API Reference¶
This reference provides detailed documentation for all modules, classes, and methods in the current release of Neurolearn.
nltools.data: Data Types¶
- class nltools.data.Brain_Data(data=None, Y=None, X=None, mask=None, **kwargs)[source]¶
Brain_Data is a class to represent neuroimaging data in python as a vector rather than a 3-dimensional matrix. This makes it easier to perform data manipulation and analyses.
- Parameters:
data – nibabel data instance or list of files
Y – Pandas DataFrame of training labels
X – Pandas DataFrame Design Matrix for running univariate models
mask – binary nifti file to mask brain data
**kwargs – Additional keyword arguments to pass to the prediction algorithm
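Examples
- A minimal construction sketch (file names here are hypothetical placeholders):
>>> import pandas as pd
>>> from nltools.data import Brain_Data
>>> data = Brain_Data(['sub1_beta.nii.gz', 'sub2_beta.nii.gz'])  # hypothetical nifti files
>>> data.Y = pd.DataFrame([0, 1])  # one training label per image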
- align(target, method='procrustes', axis=0, *args, **kwargs)[source]¶
Align Brain_Data instance to target object using functional alignment
Alignment type can be hyperalignment or Shared Response Model. When using hyperalignment, target image can be another subject or an already estimated common model. When using SRM, target must be a previously estimated common model stored as a numpy array. Transformed data can be back projected to original data using the transformation matrix.
See nltools.stats.align for aligning multiple Brain_Data instances
- Parameters:
target – (Brain_Data) object to align to.
method – (str) alignment method to use [‘probabilistic_srm’,’deterministic_srm’,’procrustes’]
axis – (int) axis to align on
- Returns:
(dict) a dictionary containing transformed object, transformation matrix, and the shared response matrix
- Return type:
out
Examples
- Hyperalign using procrustes transform:
>>> out = data.align(target, method='procrustes')
- Align using shared response model:
>>> out = data.align(target, method='probabilistic_srm', n_features=None)
- Project aligned data into original data:
>>> original_data = np.dot(out['transformed'].data, out['transformation_matrix'].T)
- append(data, **kwargs)[source]¶
Append data to Brain_Data instance
- Parameters:
data – (Brain_Data) Brain_Data instance to append
kwargs – optional inputs to Design_Matrix append
- Returns:
(Brain_Data) new appended Brain_Data instance
- Return type:
out
- apply_mask(mask, resample_mask_to_brain=False)[source]¶
Mask Brain_Data instance
Note target data will be resampled into the same space as the mask. If you would like the mask resampled into the Brain_Data space, then set resample_mask_to_brain=True.
- Parameters:
mask – (Brain_Data or nifti object) mask to apply to Brain_Data object.
resample_mask_to_brain – (bool) Will resample mask to brain space before applying mask (default=False).
- Returns:
(Brain_Data) masked Brain_Data object
- Return type:
masked
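Examples
- A minimal usage sketch, assuming data is an existing Brain_Data instance and 'mask.nii.gz' is a hypothetical binary nifti file:
>>> from nltools.data import Brain_Data
>>> mask = Brain_Data('mask.nii.gz')  # hypothetical binary mask image
>>> masked = data.apply_mask(mask)
>>> masked_native = data.apply_mask(mask, resample_mask_to_brain=True)  # keep data's native space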
- astype(dtype)[source]¶
Cast Brain_Data.data as the specified dtype.
- Parameters:
dtype – datatype to convert
- Returns:
Brain_Data instance with new datatype
- Return type:
- bootstrap(function, n_samples=5000, save_weights=False, n_jobs=-1, random_state=None, *args, **kwargs)[source]¶
Bootstrap a Brain_Data method.
- Parameters:
function – (str) method to apply to data for each bootstrap
n_samples – (int) number of samples to bootstrap with replacement
save_weights – (bool) Save each bootstrap iteration (useful for aggregating many bootstraps on a cluster)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
- Returns:
summarized studentized bootstrap output
- Return type:
output
Examples
>>> b = dat.bootstrap('mean', n_samples=5000)
>>> b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
>>> b = dat.bootstrap('predict', n_samples=5000, save_weights=True)
- decompose(algorithm='pca', axis='voxels', n_components=None, *args, **kwargs)[source]¶
Decompose Brain_Data object
- Parameters:
algorithm – (str) Algorithm to perform decomposition types=[‘pca’,’ica’,’nnmf’,’fa’,’dictionary’,’kernelpca’]
axis – dimension to decompose [‘voxels’,’images’]
n_components – (int) number of components. If None then retain as many as possible.
- Returns:
a dictionary of decomposition parameters
- Return type:
output
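Examples
- A minimal sketch, assuming data is an existing Brain_Data instance:
>>> pca_out = data.decompose(algorithm='pca', axis='voxels', n_components=5)
>>> ica_out = data.decompose(algorithm='ica', axis='images', n_components=10)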
- detrend(method='linear')[source]¶
Remove linear trend from each voxel
- Parameters:
method – (‘linear’,’constant’, optional) type of detrending
- Returns:
(Brain_Data) detrended Brain_Data instance
- Return type:
out
- distance(metric='euclidean', **kwargs)[source]¶
Calculate distance between images within a Brain_Data() instance.
- Parameters:
metric – (str) type of distance metric (can use any scikit-learn or scipy metric)
- Returns:
(Adjacency) Outputs a 2D distance matrix.
- Return type:
dist
- extract_roi(mask, metric='mean', n_components=None)[source]¶
Extract activity from mask
- Parameters:
mask – (nifti) nibabel mask can be binary or numbered for different rois
metric – type of extraction method [‘mean’, ‘median’, ‘pca’], (default=mean) NOTE: Only mean currently works!
n_components – if metric=’pca’, number of components to return (takes any input into sklearn.Decomposition.PCA)
- Returns:
mean within each ROI across images
- Return type:
out
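Examples
- A minimal sketch, assuming data is an existing Brain_Data instance and 'rois.nii.gz' is a hypothetical integer-labeled ROI atlas:
>>> import nibabel as nib
>>> atlas = nib.load('rois.nii.gz')  # hypothetical nifti with ROIs labeled 1..k
>>> roi_means = data.extract_roi(atlas, metric='mean')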
- filter(sampling_freq=None, high_pass=None, low_pass=None, **kwargs)[source]¶
Apply 5th order butterworth filter to data. Wraps nilearn functionality. Does not default to detrending and standardizing like nilearn implementation, but this can be overridden using kwargs.
- Parameters:
sampling_freq – sampling freq in hertz (i.e. 1 / TR)
high_pass – high pass cutoff frequency
low_pass – low pass cutoff frequency
kwargs – other keyword arguments to nilearn.signal.clean
- Returns:
Filtered Brain_Data instance
- Return type:
- find_spikes(global_spike_cutoff=3, diff_spike_cutoff=3)[source]¶
Function to identify spikes from Time Series Data
- Parameters:
global_spike_cutoff – (int,None) cutoff to identify spikes in global signal in standard deviations, None indicates do not calculate.
diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference in standard deviations, None indicates do not calculate.
- Returns:
pandas dataframe with spikes as indicator variables
- icc(icc_type='icc2')[source]¶
Calculate intraclass correlation coefficient for data within Brain_Data class
ICC Formulas are based on: Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological bulletin, 86(2), 420.
icc1: x_ij = mu + beta_j + w_ij
icc2/3: x_ij = mu + alpha_i + beta_j + (ab)_ij + epsilon_ij
Code modified from nipype algorithms.icc https://github.com/nipy/nipype/blob/master/nipype/algorithms/icc.py
- Parameters:
icc_type – type of icc to calculate (icc1: voxel random effect, icc2: voxel and column random effect, icc3: voxel and column fixed effect)
- Returns:
(np.array) intraclass correlation coefficient
- Return type:
ICC
- iplot(threshold=0, surface=False, anatomical=None, **kwargs)[source]¶
Create an interactive brain viewer for the current brain data instance.
- Parameters:
threshold – (float/str) two-sided threshold to initialize the visualization; may be a percentile string; default 0
surface – (bool) whether to create a surface-based plot; default False
anatomical – nifti image or filename to overlay
kwargs – optional arguments to nilearn.view_img or nilearn.view_img_on_surf
- Returns:
interactive brain viewer widget
- mean(axis=0)[source]¶
Get mean of each voxel or image
- Parameters:
axis – (int) across images=0 (default), within images=1
- Returns:
(float/np.array/Brain_Data)
- Return type:
out
- median(axis=0)[source]¶
Get median of each voxel or image
- Parameters:
axis – (int) across images=0 (default), within images=1
- Returns:
(float/np.array/Brain_Data)
- Return type:
out
- multivariate_similarity(images, method='ols')[source]¶
Predict spatial distribution of Brain_Data() instance from linear combination of other Brain_Data() instances or Nibabel images
- Parameters:
self – Brain_Data instance of data to be applied
images – Brain_Data instance of weight map
- Returns:
dictionary of regression statistics in Brain_Data instances {‘beta’,’t’,’p’,’df’,’residual’}
- Return type:
out
- plot(limit=5, anatomical=None, view='axial', colorbar=False, black_bg=True, draw_cross=False, threshold_upper=None, threshold_lower=None, figsize=(15, 2), axes=None, **kwargs)[source]¶
Create a quick plot of self.data. Will plot each image separately
- Parameters:
limit – (int) max number of images to return
anatomical – (nifti, str) nifti image or file name to overlay
view – (str) ‘axial’ for limit number of axial slices; ‘glass’ for ortho-view glass brain; ‘mni’ for multi-slice view mni brain; ‘full’ for both glass and mni views
threshold_upper – (str/float) threshold if view is ‘glass’, ‘mni’, or ‘full’
threshold_lower – (str/float) threshold if view is ‘glass’, ‘mni’, or ‘full’
save – (str/bool): optional string file name or path for saving; only applies if view is ‘mni’, ‘glass’, or ‘full’. Filenames will be appended with the orientation they belong to
- predict(algorithm=None, cv_dict=None, plot=True, verbose=True, **kwargs)[source]¶
Run prediction
- Parameters:
algorithm – Algorithm to use for prediction. Must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘ridge’, ‘ridgeClassifier’,’pcr’, or ‘lassopcr’
cv_dict – Type of cross_validation to use. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout} where ‘n’ = number of folds, and ‘holdout’ = vector of subject ids that corresponds to self.Y
plot – Boolean indicating whether or not to create plots.
verbose (bool) – print performance; Default True
**kwargs – Additional keyword arguments to pass to the prediction algorithm
- Returns:
a dictionary of prediction parameters
- Return type:
output
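Examples
- A minimal sketch, assuming data is an existing Brain_Data instance with data.Y already set to a DataFrame of labels, and holdout is a vector of subject ids aligned with data.Y:
>>> stats = data.predict(algorithm='ridge', cv_dict={'type': 'kfolds', 'n_folds': 5}, plot=False)
>>> stats = data.predict(algorithm='svm', cv_dict={'type': 'loso', 'subject_id': holdout}, kernel='linear')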
- predict_multi(algorithm=None, cv_dict=None, method='searchlight', rois=None, process_mask=None, radius=2.0, scoring=None, n_jobs=1, verbose=0, **kwargs)[source]¶
Perform multi-region prediction. This can be a searchlight analysis or multi-roi analysis if provided a Brain_Data instance with labeled non-overlapping rois.
- Parameters:
algorithm (string) – algorithm to use for prediction Must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘ridge’, ‘ridgeClassifier’,’pcr’, or ‘lassopcr’
cv_dict – Type of cross_validation to use. Default is 3-fold. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout} where ‘n’ = number of folds, and ‘holdout’ = vector of subject ids that corresponds to self.Y
method (string) – one of ‘searchlight’ or ‘roi’
rois (string/nltools.Brain_Data) – nifti file path or Brain_data instance containing non-overlapping regions-of-interest labeled by integers
process_mask (nib.Nifti1Image/nltools.Brain_Data) – mask to constrain where to perform analyses; only applied if method = ‘searchlight’
radius (float) – radius of searchlight in mm; default 2mm
scoring (function) – callable scoring function; see sklearn documentation; defaults to estimator’s default scoring function
n_jobs (int) – The number of CPUs to use to do permutation; default 1 because this can be very memory intensive
verbose (int) – whether parallelization progress should be printed; default 0
- Returns:
image of results
- Return type:
output
- randomise(n_permute=5000, threshold_dict=None, return_mask=False, **kwargs)[source]¶
Run mass-univariate regression at each voxel with inference performed via permutation testing, à la randomise in FSL. Operates just like .regress(), but intended to be used for second-level analyses.
- Parameters:
n_permute (int) – number of permutations
threshold_dict – (dict) a dictionary of threshold parameters {‘unc’:.001} or {‘fdr’:.05}
return_mask – (bool) optionally return the thresholding mask
- Returns:
dictionary of maps for betas, tstats, and pvalues
- Return type:
out
- regions(min_region_size=1350, extract_type='local_regions', smoothing_fwhm=6, is_mask=False)[source]¶
Extract brain connected regions into separate regions.
- Parameters:
min_region_size (int) – Minimum volume in mm3 for a region to be kept.
extract_type (str) – Type of extraction method [‘connected_components’, ‘local_regions’]. If ‘connected_components’, each component/region in the image is extracted automatically by labelling each region based upon the presence of unique features in their respective regions. If ‘local_regions’, each component/region is extracted based on its maximum peak value to define a seed marker and then using a random walker segmentation algorithm on these markers for region separation.
smoothing_fwhm (scalar) – Smooth an image to extract sparser regions. Only works for extract_type ‘local_regions’.
is_mask (bool) – Whether the Brain_Data instance should be treated as a boolean mask and if so, calls connected_label_regions instead.
- Returns:
Brain_Data instance with extracted ROIs as data.
- Return type:
- regress(mode='ols', **kwargs)[source]¶
Run a mass-univariate regression across voxels. Three types of regressions can be run: 1) Standard OLS (default); 2) Robust OLS (heteroscedasticity and/or auto-correlation robust errors), i.e. OLS with “sandwich estimators”; 3) ARMA (auto-regressive and moving-average lags = 1 by default; experimental)
For more information see the help for nltools.stats.regress
ARMA notes: This experimental mode is similar to AFNI’s 3dREMLFit but without spatial smoothing of voxel auto-correlation estimates. It can be very computationally intensive so parallelization is used by default to try to speed things up. Speed is limited because a unique ARMA model is fit to each voxel (like AFNI/FSL), unlike SPM, which assumes the same AR parameters (~0.2) at each voxel. While coefficient results are typically very similar to OLS, standard errors, and thus t-stats, dfs, and p-values, can differ greatly depending on how much auto-correlation is explaining the response in a voxel relative to other regressors in the design matrix.
- Parameters:
mode (str) – kind of model to fit; must be one of ‘ols’ (default), ‘robust’, or ‘arma’
kwargs (dict) – keyword arguments to nltools.stats.regress
- Returns:
dictionary of regression statistics in Brain_Data instances {‘beta’,’t’,’p’,’df’,’residual’}
- Return type:
out
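Examples
- A minimal sketch, assuming data is an existing Brain_Data instance and design is a Design_Matrix with one row per image:
>>> data.X = design  # attach the design matrix
>>> ols_stats = data.regress()  # standard OLS
>>> robust_stats = data.regress(mode='robust')  # OLS with sandwich estimators
>>> robust_stats['beta'].plot()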
- scale(scale_val=100.0)[source]¶
Scale all values such that they are on the range [0, scale_val], via grand-mean scaling. This is NOT global-scaling/intensity normalization. This is useful for ensuring that data is on a common scale (e.g. good for multiple runs, participants, etc) and if the default value of 100 is used, can be interpreted as something akin to (but not exactly) “percent signal change.” This is consistent with default behavior in AFNI and SPM. Change this value to 10000 to make consistent with FSL.
- Parameters:
scale_val – (int/float) what value to send the grand-mean to; default 100
- similarity(image, method='correlation')[source]¶
Calculate similarity of Brain_Data() instance with single Brain_Data or Nibabel image
- Parameters:
image – (Brain_Data, nifti) image to evaluate similarity
method – (str) Type of similarity [‘correlation’,’dot_product’,’cosine’]
- Returns:
(list) Outputs a vector of pattern expression values
- Return type:
pexp
- smooth(fwhm)[source]¶
Apply spatial smoothing using nilearn smooth_img()
- Parameters:
fwhm – (float) full width half maximum of gaussian spatial filter
- Returns:
Brain_Data instance
- standardize(axis=0, method='center')[source]¶
Standardize Brain_Data() instance.
- Parameters:
axis – (int) 0 for observations, 1 for voxels
method – [‘center’,’zscore’]
- Returns:
Brain_Data Instance
- std(axis=0)[source]¶
Get standard deviation of each voxel or image.
- Parameters:
axis – (int) across images=0 (default), within images=1
- Returns:
(float/np.array/Brain_Data)
- Return type:
out
- temporal_resample(sampling_freq=None, target=None, target_type='hz')[source]¶
Resample Brain_Data timeseries to a new target frequency or number of samples using Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation. This function can up- or down-sample data.
Note: this function can use quite a bit of RAM.
- Parameters:
sampling_freq – (float) sampling frequency of data in hertz
target – (float) upsampling target
target_type – (str) type of target can be [samples,seconds,hz]
- Returns:
upsampled Brain_Data instance
- threshold(upper=None, lower=None, binarize=False, coerce_nan=True)[source]¶
Threshold Brain_Data instance. Provide upper and lower values or percentages to perform two-sided thresholding. Binarize will return a mask image respecting thresholds if provided, otherwise respecting every non-zero value.
- Parameters:
upper – (float or str) Upper cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.
lower – (float or str) Lower cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.
binarize (bool) – return binarized image respecting thresholds if provided, otherwise binarize on every non-zero value; default False
coerce_nan (bool) – coerce nan values to 0s; default True
- Returns:
Thresholded Brain_Data object.
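Examples
- A minimal sketch, assuming t_map is an existing Brain_Data instance of t-statistics (the percentile-string format shown is assumed):
>>> thresholded = t_map.threshold(upper=3.1, lower=-3.1)
>>> binary_mask = t_map.threshold(upper='97.5%', lower='2.5%', binarize=True)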
- transform_pairwise()[source]¶
Transform Brain_Data instance into all pairwise comparisons between images.
- Returns:
Brain_Data instance transformed into pairwise comparisons
- Return type:
- ttest(threshold_dict=None, return_mask=False)[source]¶
Calculate one sample t-test across each voxel (two-sided)
- Parameters:
threshold_dict – (dict) a dictionary of threshold parameters {‘unc’:.001} or {‘fdr’:.05}
return_mask – (bool) if thresholding is requested, optionally return the mask of voxels that exceed threshold, e.g. for use with another map
- Returns:
(dict) dictionary of regression statistics in Brain_Data instances {‘t’,’p’}
- Return type:
out
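Examples
- A minimal sketch, assuming data is an existing Brain_Data instance of second-level images (e.g. one contrast per subject):
>>> out = data.ttest()
>>> out['t'].plot()
>>> out_fdr = data.ttest(threshold_dict={'fdr': .05})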
- upload_neurovault(access_token=None, collection_name=None, collection_id=None, img_type=None, img_modality=None, **kwargs)[source]¶
Upload Data to Neurovault. Will add any columns in self.X to image metadata. Index will be used as image name.
- Parameters:
access_token – (str, Required) Neurovault api access token
collection_name – (str, Optional) name of new collection to create
collection_id – (int, Optional) neurovault collection_id if adding images to existing collection
img_type – (str, Required) Neurovault map_type
img_modality – (str, Required) Neurovault image modality
- Returns:
(pd.DataFrame) neurovault collection information
- Return type:
collection
- class nltools.data.Adjacency(data=None, Y=None, matrix_type=None, labels=None, **kwargs)[source]¶
Adjacency is a class to represent Adjacency matrices as a vector rather than a 2-dimensional matrix. This makes it easier to perform data manipulation and analyses.
- Parameters:
data – pandas data instance or list of files
matrix_type – (str) type of matrix. Possible values include: [‘distance’,’similarity’,’directed’,’distance_flat’, ‘similarity_flat’,’directed_flat’]
Y – Pandas DataFrame of training labels
**kwargs – Additional keyword arguments
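Examples
- A minimal construction sketch, assuming a square similarity matrix computed from random data:
>>> import numpy as np
>>> from nltools.data import Adjacency
>>> sim = np.corrcoef(np.random.rand(10, 50))  # 10 x 10 similarity matrix
>>> adj = Adjacency(sim, matrix_type='similarity')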
- append(data)[source]¶
Append data to Adjacency instance
- Parameters:
data – (Adjacency) Adjacency instance to append
- Returns:
(Adjacency) new appended Adjacency instance
- Return type:
out
- bootstrap(function, n_samples=5000, save_weights=False, n_jobs=-1, random_state=None, *args, **kwargs)[source]¶
Bootstrap an Adjacency method.
- Parameters:
function – (str) method to apply to data for each bootstrap
n_samples – (int) number of samples to bootstrap with replacement
save_weights – (bool) Save each bootstrap iteration (useful for aggregating many bootstraps on a cluster)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
- Returns:
summarized studentized bootstrap output
Examples
>>> b = dat.bootstrap('mean', n_samples=5000)
>>> b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
>>> b = dat.bootstrap('predict', n_samples=5000, save_weights=True)
- cluster_summary(clusters=None, metric='mean', summary='within')[source]¶
This function provides summaries of clusters within Adjacency matrices.
It can compute mean/median of within and between cluster values. Requires a list of cluster ids indicating the row/column of each cluster.
- Parameters:
clusters – (list) list of cluster labels
metric – (str) method to summarize, mean or median. If None then return all r values
summary – (str) summarize within cluster or between clusters
- Returns:
(dict) within cluster means
- Return type:
dict
- distance(metric='correlation', **kwargs)[source]¶
Calculate distance between images within an Adjacency() instance.
- Parameters:
metric – (str) type of distance metric (can use any scikit-learn or scipy metric)
- Returns:
(Adjacency) Outputs a 2D distance matrix.
- Return type:
dist
- distance_to_similarity(metric='correlation', beta=1)[source]¶
Convert distance matrix to similarity matrix.
Note: currently only implemented for correlation and euclidean.
- Parameters:
metric – (str) Can only be correlation or euclidean
beta – (float) parameter to scale exponential function (default: 1) for euclidean
- Returns:
(Adjacency) Adjacency object
- Return type:
out
- generate_permutations(n_perm, random_state=None)[source]¶
Generate n_perm permuted versions of Adjacency in a lazy fashion. Useful for iterating against.
- Parameters:
n_perm (int) – number of permutations
random_state (int, np.random.seed, optional) – random seed for reproducibility. Defaults to None.
Examples
>>> for perm in adj.generate_permutations(1000):
>>>     out = neural_distance_mat.similarity(perm)
>>>     ...
- Yields:
Adjacency – permuted version of self
- isc(n_samples=5000, metric='median', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]¶
Compute intersubject correlation.
This implementation uses the subject-wise bootstrap method from Chen et al., 2016. Instead of recomputing the pairwise ISC using circle_shift or phase_randomization methods, this approach uses the computationally more efficient method of bootstrapping the subjects and computing a new pairwise similarity matrix with randomly selected subjects with replacement. If the same subject is selected multiple times, we set the perfect correlation to a nan with (exclude_self_corr=True). As recommended by Chen et al., 2016, we compute the median pairwise ISC by default. However, if the mean is preferred, we compute the mean correlation after performing the fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values. We compute the p-values using the percentile method using the same method in Brainiak.
Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.
- Parameters:
n_samples – (int) number of bootstrap samples
metric – (str) type of isc summary metric [‘mean’,’median’]
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_null – (bool) Return the permutation distribution along with the p-value; default False
- Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
- Return type:
stats
- isc_group(group, n_samples=5000, metric='median', method='permute', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]¶
Compute intersubject correlation differences between groups.
This function computes pairwise intersubject correlations (ISC) using the median (as recommended by Chen et al., 2016). However, if the mean is preferred, we compute the mean correlation after performing the fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values.
There are currently two different methods to compute p-values. By default, we use the subject-wise permutation method recommended by Chen et al., 2016. This method combines the two groups and computes pairwise similarity both within and between the groups. Then the group labels are permuted and the mean difference between the two groups is recomputed to generate a null distribution. The second method uses subject-wise bootstrapping, where a new pairwise similarity matrix with randomly selected subjects with replacement is created separately for each group and the ISC difference between these groups is used to generate a null distribution. If the same subject is selected multiple times, we set the perfect correlation to a nan (with exclude_self_corr=True). We compute the p-values using the percentile method (Hall & Wilson, 1991).
Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.
- Parameters:
group – (np.array) vector of group ids corresponding to subject data in Adjacency instance
n_samples – (int) number of samples for permutation or bootstrapping
metric – (str) type of isc summary metric [‘mean’,’median’]
method – (str) method to compute p-values [‘permute’, ‘bootstrap’] (default: permute)
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_null – (bool) Return the permutation distribution along with the p-value; default False
- Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
- Return type:
stats
- mean(axis=0)[source]¶
Calculate mean of Adjacency
- Parameters:
axis – (int) calculate mean over features (0) or data (1). For data it will be on upper triangle.
- Returns:
float if single, adjacency if axis=0, np.array if axis=1 and multiple
- Return type:
mean
- median(axis=0)[source]¶
Calculate median of Adjacency
- Parameters:
axis – (int) calculate median over features (0) or data (1). For data it will be on upper triangle.
- Returns:
float if single, adjacency if axis=0, np.array if axis=1 and multiple
- Return type:
median
- plot(limit=3, axes=None, *args, **kwargs)[source]¶
Create Heatmap of Adjacency Matrix
Can pass in any sns.heatmap argument
- Parameters:
limit – (int) number of heatmaps to plot if object contains multiple adjacencies (default: 3)
axes – matplotlib axis handle
- plot_label_distance(labels=None, ax=None)[source]¶
Create a violin plot indicating within and between label distance
- Parameters:
labels (np.array) – numpy array of labels to plot
- Returns:
violin plot handles
- Return type:
f
- plot_mds(n_components=2, metric=True, labels=None, labels_color=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>, n_jobs=-1, view=(30, 20), figsize=[12, 8], ax=None, *args, **kwargs)[source]¶
Plot Multidimensional Scaling
- Parameters:
n_components – (int) Number of dimensions to project (can be 2 or 3)
metric – (bool) Perform metric or non-metric dimensional scaling; default True
labels – (list) Can override labels stored in Adjacency Class
labels_color – (str) list of colors for labels, if len(1) then make all same color
n_jobs – (int) Number of parallel jobs
view – (tuple) view for 3-Dimensional plot; default (30,20)
- plot_silhouette(labels=None, ax=None, permutation_test=True, n_permute=5000, **kwargs)[source]¶
Create a silhouette plot
- regress(X, mode='ols', **kwargs)[source]¶
Run a regression on an adjacency instance. You can decompose an adjacency instance with another adjacency instance. You can also decompose each pixel by passing a design_matrix instance.
- Parameters:
X – Design matrix can be an Adjacency or Design_Matrix instance
mode – (str) type of regression (default: ols)
- Returns:
(dict) dictionary of stats outputs.
- Return type:
stats
- similarity(data, plot=False, perm_type='2d', n_permute=5000, metric='spearman', ignore_diagonal=False, **kwargs)[source]¶
Calculate similarity between two Adjacency matrices. Default is to use spearman correlation and permutation test.
- Parameters:
data (Adjacency or array) – Adjacency data, or 1-d array same size as self.data
perm_type – (str) ‘1d’,’2d’, or None
metric – (str) ‘spearman’,’pearson’,’kendall’
ignore_diagonal – (bool) only applies to ‘directed’ Adjacency types using perm_type='1d' or perm_type=None
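Examples
- A minimal sketch, assuming adj1 and adj2 are Adjacency instances of matching size:
>>> stats = adj1.similarity(adj2, metric='spearman', perm_type='2d', n_permute=5000)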
- social_relations_model(summarize_results=True, nan_replace=True)[source]¶
Estimate the social relations model from a matrix for a round-robin design.
X_ij = m + alpha_i + beta_j + g_ij + epsilon_ijl
where X_ij is the score for person i rating person j, m is the group mean, alpha_i is person i’s actor effect, beta_j is person j’s partner effect, g_ij is the relationship effect, and epsilon_ijl is the error in measure l for actor i and partner j.
This model is primarily concerned with partitioning the variance of the various effects.
Code is based on the implementation presented in Chapter 8 of Kenny, Kashy, & Cook (2006). Tests replicate examples presented in the book. Note that this method assumes that actor scores are rows (lower triangle), while partner scores are columns (upper triangle). The minimal sample size to estimate these effects is 4.
- Model Assumptions:
Social interactions are exclusively dyadic
People are randomly sampled from population
No order effects
The effects combine additively and relationships are linear
In the future we might update the formulas and standard errors based on Bond and Lashley, 1996
- Parameters:
self – (adjacency) can be a single matrix or many matrices for each group
summarize_results – (bool) will provide a formatted summary of model results
nan_replace – (bool) will replace nan values with row and column means
- Returns:
(pd.Series/pd.DataFrame) All of the effects estimated using SRM
- Return type:
estimated effects
- stats_label_distance(labels=None, n_permute=5000, n_jobs=-1)[source]¶
Calculate permutation tests on within and between label distance.
- Parameters:
labels (np.array) – numpy array of labels to plot
n_permute (int) – number of permutations to run (default=5000)
- Returns:
- dictionary of within and between group differences
and p-values
- Return type:
dict
- std(axis=0)[source]¶
Calculate standard deviation of Adjacency
- Parameters:
axis – (int) calculate std over features (0) or data (1). For data it will be on upper triangle.
- Returns:
float if single, adjacency if axis=0, np.array if axis=1 and multiple
- Return type:
std
- sum(axis=0)[source]¶
Calculate sum of Adjacency
- Parameters:
axis – (int) calculate sum over features (0) or data (1). For data it will be on upper triangle.
- Returns:
float if single, adjacency if axis=0, np.array if axis=1 and multiple
- Return type:
sum
- threshold(upper=None, lower=None, binarize=False)[source]¶
Threshold Adjacency instance. Provide upper and lower values or percentages to perform two-sided thresholding. Binarize will return a mask image respecting thresholds if provided, otherwise respecting every non-zero value.
- Parameters:
upper – (float or str) Upper cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.
lower – (float or str) Lower cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.
binarize (bool) – return binarized image respecting thresholds if provided, otherwise binarize on every non-zero value; default False
- Returns:
thresholded Adjacency instance
- Return type:
- ttest(permutation=False, **kwargs)[source]¶
Calculate ttest across samples.
- Parameters:
permutation – (bool) Run ttest as permutation. Note this can be very slow.
- Returns:
(dict) contains Adjacency instances of t values (or mean if running permutation) and Adjacency instance of p values.
- Return type:
out
- class nltools.data.Design_Matrix(*args, **kwargs)[source]¶
Design_Matrix is a class to represent design matrices with special methods for data processing (e.g. convolution, upsampling, downsampling) and also intelligent and flexible appending (e.g. automatically keep certain columns or polynomial terms separated during concatenation). It plays nicely with Brain_Data and can be used to build an experimental design to pass to Brain_Data’s X attribute. It is essentially an enhanced pandas df, with extra attributes and methods. Methods always return a new design matrix instance (copy). Column names are always string types.
- Parameters:
sampling_freq (float) – sampling rate of each row in hertz; to convert seconds to hertz (e.g. in the case of TRs for neuroimaging) use hertz = 1 / TR
convolved (list, optional) – on what columns convolution has been performed; defaults to None
polys (list, optional) – list of polynomial terms in design matrix, e.g. intercept, polynomial trends, basis functions, etc; default None
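Examples
- A minimal construction sketch, assuming a 2 s TR and two hypothetical stimulus regressors; chaining works because each method returns a new Design_Matrix:
>>> import numpy as np
>>> from nltools.data import Design_Matrix
>>> dm = Design_Matrix(np.random.rand(100, 2), columns=['stim_A', 'stim_B'], sampling_freq=1 / 2.0)
>>> dm = dm.convolve().add_dct_basis(duration=128).add_poly(order=2)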
- add_dct_basis(duration=180, drop=0)[source]¶
Adds unit scaled cosine basis functions to Design_Matrix columns, based on spm-style discrete cosine transform for use in high-pass filtering. Does not add intercept/constant. Care is recommended if using this along with .add_poly(), as some columns will be highly-correlated.
- Parameters:
duration (int) – length of filter in seconds
drop (int) – index of which early/slow bases to drop if any; will always drop constant (i.e. intercept) like SPM. Unlike SPM, retains first basis (i.e. linear/sigmoidal). Will cumulatively drop bases up to and inclusive of index provided (e.g. 2, drops bases 1 and 2); default 0
- add_poly(order=0, include_lower=True)[source]¶
Add nth order Legendre polynomial terms as columns to design matrix. Good for adding constant/intercept to model (order = 0) and accounting for slow-frequency nuisance artifacts e.g. linear, quadratic, etc drifts. Care is recommended when using this with .add_dct_basis() as some columns will be highly correlated.
- Parameters:
order (int) – what order terms to add; 0 = constant/intercept (default), 1 = linear, 2 = quadratic, etc
include_lower – (bool) whether to add lower order terms if order > 0
- append(dm, axis=0, keep_separate=True, unique_cols=None, fill_na=0, verbose=False)[source]¶
Method for concatenating another design matrix row- or column-wise. When concatenating row-wise, has the ability to keep certain columns separated if they exist in multiple design matrices (e.g. keeping separate intercepts for multiple runs). This is on by default and will automatically separate out polynomial columns (i.e. anything added with the add_poly or add_dct_basis methods). Additional columns can be separated by run using the unique_cols parameter. Can also add new polynomial terms during vertical concatenation (when axis == 0). This will by default create new polynomial terms separately for each design matrix
- Parameters:
dm (Design_Matrix or list) – design_matrix or list of design_matrices to append
axis (int) – 0 for row-wise (vert-cat), 1 for column-wise (horz-cat); default 0
keep_separate (bool, optional) – whether to try and uniquify columns; defaults to True; only applies when axis==0
unique_cols (list,optional) – what additional columns to try to keep separated by uniquifying, only applies when axis = 0; defaults to None
fill_na (str/int/float) – if provided will fill NaNs with this value during row-wise appending (when axis = 0) if separate columns are desired; default 0
verbose (bool) – print messages during append about how polynomials are going to be separated
- clean(fill_na=0, exclude_polys=False, thresh=0.95, verbose=True)[source]¶
Method to fill NaNs in Design Matrix and remove duplicate columns based on data values, NOT names. Columns are dropped if they are correlated >= the requested threshold (default = .95). In this case, only the first instance of that column will be retained and all others will be dropped.
- Parameters:
fill_na (str/int/float) – value to fill NaNs with set to None to retain NaNs; default 0
exclude_polys (bool) – whether to skip checking of polynomial terms (i.e. intercept, trends, basis functions); default False
thresh (float) – correlation threshold to use to drop redundant columns; default .95
verbose (bool) – print what column names were dropped; default True
- convolve(conv_func='hrf', columns=None)[source]¶
Perform convolution using an arbitrary function.
- Parameters:
conv_func (ndarray or string) – either a 1d numpy array containing output of a function that you want to convolve; a samples by kernel 2d array of several kernels to convolve; or the string ‘hrf’ which defaults to a glover HRF function at the Design_matrix’s sampling_freq
columns (list) – what columns to perform convolution on; defaults to all non-polynomial columns
- downsample(target, **kwargs)[source]¶
Downsample columns of design matrix. Relies on nltools.stats.downsample, but ensures that returned object is a design matrix.
- Parameters:
target (float) – desired frequency in hz
kwargs – additional inputs to nltools.stats.downsample
- heatmap(figsize=(8, 6), **kwargs)[source]¶
Visualize Design Matrix spm style. Use .plot() for typical pandas plotting functionality. Can pass optional keyword args to seaborn heatmap.
- replace_data(data, column_names=None)[source]¶
Convenient method to replace all data in Design_Matrix with new data while keeping attributes and polynomial columns untouched.
- Parameters:
column_names (list) – list of column names for new data
- upsample(target, **kwargs)[source]¶
Upsample columns of design matrix. Relies on nltools.stats.upsample, but ensures that returned object is a design matrix.
- Parameters:
target (float) – desired frequency in hz
kwargs – additional inputs to nltools.stats.upsample
- vif(exclude_polys=True)[source]¶
Compute variance inflation factor amongst columns of design matrix, ignoring polynomial terms. Much faster than statsmodels and more reliable too. Uses the same method as Matlab and R (diagonal elements of the inverted correlation matrix).
- Parameters:
exclude_polys (bool) – whether to skip checking of polynomial terms (i.e intercept, trends, basis functions); default True
- Returns:
list with length == number of columns - intercept
- Return type:
vifs (list)
nltools.analysis: Analysis Tools¶
- class nltools.analysis.Roc(input_values=None, binary_outcome=None, threshold_type='optimal_overall', forced_choice=None, **kwargs)[source]¶
Roc Class
The Roc class is based on Tor Wager’s Matlab roc_plot.m function and allows a user to easily run different types of receiver operating characteristic curves. For example, one might be interested in single interval or forced choice.
- Parameters:
input_values – nibabel data instance
binary_outcome – vector of training labels
threshold_type – [‘optimal_overall’, ‘optimal_balanced’, ‘minimum_sdt_bias’]
**kwargs – Additional keyword arguments to pass to the prediction algorithm
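Examples
- A minimal sketch, assuming stats is the output of Brain_Data.predict with cross-validated predictions stored under a 'yfit_xval' key (key name assumed here) and data.Y holds binary labels:
>>> from nltools.analysis import Roc
>>> roc = Roc(input_values=stats['yfit_xval'], binary_outcome=data.Y.values.ravel() == 1)
>>> roc.calculate()
>>> roc.plot()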
- calculate(input_values=None, binary_outcome=None, criterion_values=None, threshold_type='optimal_overall', forced_choice=None, balanced_acc=False)[source]¶
Calculate Receiver Operating Characteristic plot (ROC) for single-interval classification.
- Parameters:
input_values – nibabel data instance
binary_outcome – vector of training labels
criterion_values – (optional) criterion values for calculating fpr & tpr
threshold_type – [‘optimal_overall’, ‘optimal_balanced’, ‘minimum_sdt_bias’]
forced_choice – index indicating position for each unique subject (default=None)
balanced_acc – balanced accuracy for single-interval classification (bool). THIS IS NOT COMPLETELY IMPLEMENTED BECAUSE IT AFFECTS ACCURACY ESTIMATES, BUT NOT P-VALUES OR THRESHOLD AT WHICH TO EVALUATE SENS/SPEC
**kwargs – Additional keyword arguments to pass to the prediction algorithm
- plot(plot_method='gaussian', balanced_acc=False, **kwargs)[source]¶
Create ROC Plot
Create a specific kind of ROC curve plot, based on input values along a continuous distribution and a binary outcome variable (logical)
- Parameters:
plot_method – type of plot [‘gaussian’,’observed’]
binary_outcome – vector of training labels
**kwargs – Additional keyword arguments to pass to the prediction algorithm
- Returns:
fig
nltools.stats: Stats Tools¶
NeuroLearn Statistics Tools¶
Tools to help with statistical analyses.
- nltools.stats.align(data, method='deterministic_srm', n_features=None, axis=0, *args, **kwargs)[source]¶
Align subject data into a common response model.
Can be used to hyperalign source data to target data using Hyperalignment from Dartmouth (i.e., procrustes transformation; see nltools.stats.procrustes) or Shared Response Model from Princeton (see nltools.external.srm). (See nltools.data.Brain_Data.align for aligning a single Brain_Data object to another.) Common Model is shared response model or centered target data. Transformed data can be back projected to original data using the transformation matrix. Inputs must be a list of Brain_Data instances or numpy arrays (observations by features).
- Parameters:
data – (list) A list of Brain_Data objects
method – (str) alignment method to use [‘probabilistic_srm’,’deterministic_srm’,’procrustes’]
n_features – (int) number of features to align to common space. If None then will select number of voxels
axis – (int) axis to align on
- Returns:
(dict) a dictionary containing a list of transformed subject matrices, a list of transformation matrices, the shared response matrix, and the intersubject correlation of the shared responses
- Return type:
out
Examples
- Hyperalign using procrustes transform:
>>> out = align(data, method='procrustes')
- Align using shared response model:
>>> out = align(data, method='probabilistic_srm', n_features=None)
- Project aligned data into original data:
>>> original_data = [np.dot(t.data,tm.T) for t,tm in zip(out['transformed'], out['transformation_matrix'])]
- nltools.stats.align_states(reference, target, metric='correlation', return_index=False, replace_zero_variance=False)[source]¶
Align state weight maps using the Hungarian algorithm by minimizing pairwise distance between group states.
- Parameters:
reference – (np.array) reference pattern x state matrix
target – (np.array) target pattern x state matrix to align to reference
metric – (str) distance metric to use
return_index – (bool) return index if True, return remapped data if False
replace_zero_variance – (bool) transform a vector with zero variance to random numbers from a uniform distribution. Useful for when using correlation as a distance metric to avoid NaNs.
- Returns:
(list) a list of reordered state X pattern matrices
- Return type:
ordered_weights
- nltools.stats.calc_bpm(beat_interval, sampling_freq)[source]¶
Calculate instantaneous BPM from beat to beat interval
- Parameters:
beat_interval – (int) number of samples in between each beat (typically R-R Interval)
sampling_freq – (float) sampling frequency in Hz
- Returns:
(float) beats per minute for time interval
- Return type:
bpm
- nltools.stats.correlation(data1, data2, metric='pearson')[source]¶
This function calculates the correlation between data1 and data2
- Parameters:
data1 – (np.array) x
data2 – (np.array) y
metric – (str) type of correlation [“spearman” or “pearson” or “kendall”]
- Returns:
(np.array) correlations, p: (float) p-value
- Return type:
r
- nltools.stats.correlation_permutation(data1, data2, method='permute', n_permute=5000, metric='spearman', tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶
Compute correlation and calculate p-value using permutation methods.
‘permute’ method randomly shuffles one of the vectors. This method is recommended for independent data. For timeseries data we recommend using ‘circle_shift’ or ‘phase_randomize’ methods.
- Parameters:
data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute
data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute
n_permute – (int) number of permutations
metric – (str) type of association metric [‘spearman’,’pearson’, ‘kendall’]
method – (str) type of permutation [‘permute’, ‘circle_shift’, ‘phase_randomize’]
random_state – (int, None, or np.random.RandomState) Initial random seed (default: None)
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
- Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
- Return type:
stats
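Examples
- A minimal sketch with two random vectors of equal length:
>>> import numpy as np
>>> from nltools.stats import correlation_permutation
>>> x, y = np.random.rand(50), np.random.rand(50)
>>> stats = correlation_permutation(x, y, metric='spearman', method='permute', n_permute=5000)
>>> stats['correlation'], stats['p']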
- nltools.stats.distance_correlation(x, y, bias_corrected=True, ttest=False)[source]¶
Compute the distance correlation between 2 arrays to test for multivariate dependence (linear or non-linear). Arrays must match on their first dimension. It’s almost always preferable to compute the bias_corrected version which can also optionally perform a ttest. This ttest operates on a statistic that is ~dcorr^2 and will also be returned.
Explanation: Distance correlation involves computing the normalized covariance of two centered euclidean distance matrices. Each distance matrix is the euclidean distance between rows (if x or y are 2d) or scalars (if x or y are 1d). Each matrix is centered prior to computing the covariance either using double-centering or u-centering, which corrects for bias as the number of dimensions increases. U-centering is almost always preferred in all cases. It also permits inference of the normalized covariance between each distance matrix using a one-tailed directional t-test. (Szekely & Rizzo, 2013). While distance correlation is normally bounded between 0 and 1, u-centering can produce negative estimates, which are never significant.
Validated against the dcor and dcor.ttest functions in the ‘energy’ R package and the dcor.distance_correlation, dcor.udistance_correlation_sqr, and dcor.independence.distance_correlation_t_test functions in the dcor Python package.
- Parameters:
x (ndarray) – 1d or 2d numpy array of observations by features
y (ndarry) – 1d or 2d numpy array of observations by features
bias_corrected (bool) – if false use double-centering, which produces a biased estimate that converges to 1 as the number of dimensions increases. Otherwise use u-centering to correct this bias. Note this must be True if ttest=True; default True
ttest (bool) – perform a ttest using the bias_corrected distance correlation; default False
- Returns:
dictionary of results (correlation, t, p, and df.) Optionally, covariance, x variance, and y variance
- Return type:
results (dict)
- nltools.stats.double_center(mat)[source]¶
Double center a 2d array.
- Parameters:
mat (ndarray) – 2d numpy array
- Returns:
double-centered version of input
- Return type:
mat (ndarray)
- nltools.stats.downsample(data, sampling_freq=None, target=None, target_type='samples', method='mean')[source]¶
Downsample a pandas DataFrame or Series to a new target frequency or number of samples using averaging.
- Parameters:
data – (pd.DataFrame, pd.Series) data to downsample
sampling_freq – (float) Sampling frequency of data in hertz
target – (float) downsampling target
target_type – type of target can be [samples,seconds,hz]
method – (str) type of downsample method [‘mean’,’median’], default: mean
- Returns:
(pd.DataFrame, pd.Series) downsampled data
- Return type:
out
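Examples
- A minimal sketch, downsampling a hypothetical 10 Hz signal to 1 Hz:
>>> import numpy as np
>>> import pandas as pd
>>> from nltools.stats import downsample
>>> signal = pd.Series(np.random.rand(1000))  # 100 s of data sampled at 10 Hz
>>> low = downsample(signal, sampling_freq=10, target=1, target_type='hz')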
- nltools.stats.fdr(p, q=0.05)[source]¶
Determine FDR threshold given a p value array and desired false discovery rate q. Written by Tal Yarkoni
- Parameters:
p – (np.array) vector of p-values
q – (float) false discovery rate level
- Returns:
(float) p-value threshold based on independence or positive dependence
- Return type:
fdr_p
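Examples
- A minimal sketch, assuming a vector of p-values from a mass-univariate analysis:
>>> import numpy as np
>>> from nltools.stats import fdr
>>> p_values = np.array([0.001, 0.008, 0.02, 0.3, 0.5])
>>> p_thresh = fdr(p_values, q=0.05)  # p-values at or below this threshold survive FDR correction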
- nltools.stats.find_spikes(data, global_spike_cutoff=3, diff_spike_cutoff=3)[source]¶
Function to identify spikes from fMRI Time Series Data
- Parameters:
data – Brain_Data or nibabel instance
global_spike_cutoff – (int,None) cutoff to identify spikes in global signal in standard deviations, None indicates do not calculate.
diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference in standard deviations, None indicates do not calculate.
- Returns:
pandas dataframe with spikes as indicator variables
- nltools.stats.holm_bonf(p, alpha=0.05)[source]¶
Compute corrected p-values based on the Holm-Bonferroni method, i.e. a step-down procedure applying iteratively less correction to the highest p-values. A bit more conservative than fdr, but much more powerful than vanilla bonferroni.
- Parameters:
p – (np.array) vector of p-values
alpha – (float) alpha level
- Returns:
(float) p-value threshold based on Holm-Bonferroni step-down procedure
- Return type:
bonf_p
- nltools.stats.isc(data, n_samples=5000, metric='median', method='bootstrap', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]¶
Compute pairwise intersubject correlation from observations by subjects array.
This function computes pairwise intersubject correlations (ISC) using the median (as recommended by Chen et al., 2016). However, if the mean is preferred, we compute the mean correlation after performing the fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values.
There are currently three different methods to compute p-values. These include the classic methods for computing permuted time-series by either circle-shifting the data or phase-randomizing the data (see Lancaster et al., 2018). These methods create random surrogate data while preserving the temporal autocorrelation inherent to the signal. By default, we use the subject-wise bootstrap method from Chen et al., 2016. Instead of recomputing the pairwise ISC using circle_shift or phase_randomization methods, this approach uses the computationally more efficient method of bootstrapping the subjects and computing a new pairwise similarity matrix with randomly selected subjects with replacement. If the same subject is selected multiple times, we set the perfect correlation to a nan with (exclude_self_corr=True). We compute the p-values using the percentile method using the same method in Brainiak.
Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.
Lancaster, G., Iatsenko, D., Pidde, A., Ticcinelli, V., & Stefanovska, A. (2018). Surrogate data for hypothesis testing of physical systems. Physics Reports, 748, 1-60.
- Parameters:
data – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects
n_samples – (int) number of random samples/bootstraps
metric – (str) type of isc summary metric [‘mean’,’median’]
method – (str) method to compute p-values [‘bootstrap’, ‘circle_shift’,’phase_randomize’] (default: bootstrap)
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_null – (bool) Return the permutation distribution along with the p-value; default False
- Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
- Return type:
stats
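Examples
- A minimal sketch, assuming timeseries is an observations by subjects array (e.g. one column per subject for a single voxel/ROI):
>>> from nltools.stats import isc
>>> stats = isc(timeseries, n_samples=5000, metric='median', method='bootstrap')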
- nltools.stats.isc_group(group1, group2, n_samples=5000, metric='median', method='permute', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]¶
Compute difference in intersubject correlation between groups.
This function computes pairwise intersubject correlations (ISC) using the median (as recommended by Chen et al., 2016). However, if the mean is preferred, we compute the mean correlation after performing the fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values.
There are currently two different methods to compute p-values. By default, we use the subject-wise permutation method recommended by Chen et al., 2016. This method combines the two groups and computes pairwise similarity both within and between the groups. Then the group labels are permuted and the mean difference between the two groups is recomputed to generate a null distribution. The second method uses subject-wise bootstrapping, where a new pairwise similarity matrix with randomly selected subjects with replacement is created separately for each group and the ISC difference between these groups is used to generate a null distribution. If the same subject is selected multiple times, we set the perfect correlation to a nan (with exclude_self_corr=True). We compute the p-values using the percentile method (Hall & Wilson, 1991).
Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.
- Parameters:
group1 – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects
group2 – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects
n_samples – (int) number of samples for permutation or bootstrapping
metric – (str) type of isc summary metric [‘mean’,’median’]
method – (str) method to compute p-values [‘permute’, ‘bootstrap’] (default: permute)
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_null – (bool) Return the permutation distribution along with the p-value; default False
- Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
- Return type:
stats
- nltools.stats.isfc(data, method='average')[source]¶
Compute intersubject functional connectivity (ISFC) from a list of observation x feature matrices
This function uses the leave-one-out approach to compute ISFC (Simony et al., 2016). For each subject, compute the cross-correlation between each voxel/roi with the average of the rest of the subjects’ data. In other words, compute the mean voxel/ROI response for all participants except the target subject. Then compute the correlation between each ROI within the target subject and the mean ROI response in the group average.
Simony, E., Honey, C. J., Chen, J., Lositsky, O., Yeshurun, Y., Wiesel, A., & Hasson, U. (2016). Dynamic reconfiguration of the default mode network during narrative comprehension. Nature communications, 7, 12141.
- Parameters:
data – list of subject matrices (observations x voxels/rois)
method – approach to computing ISFC. ‘average’ uses the leave-one-out approach described above
- Returns:
list of subject ISFC matrices
- nltools.stats.isps(data, sampling_freq=0.5, low_cut=0.04, high_cut=0.07, order=5, pairwise=False)[source]¶
Compute Dynamic Intersubject Phase Synchrony (ISPS) from an observation by subject array
This function computes the instantaneous intersubject phase synchrony for a single voxel/roi timeseries. Requires multiple subjects. This method is largely based on that described by Glerean et al., 2012 and performs a hilbert transform on narrow bandpass filtered timeseries (butterworth) data to get the instantaneous phase angle. The function returns a dictionary containing the average phase angle, the average vector length, and parametric p-values computed using the rayleigh test using circular statistics (Fisher, 1993). If pairwise=True, then it will compute these on the pairwise phase angle differences, if pairwise=False, it will compute these on the actual phase angles. This is called inter-site phase coupling or inter-trial phase coupling respectively in the EEG literatures.
This function requires narrow band filtering your data. As a default we use the recommendations by (Glerean et al., 2012) of .04-.07Hz. This is similar to the “slow-4” band (0.025–0.067 Hz) described by (Zuo et al., 2010; Penttonen & Buzsáki, 2003), but excludes the .03 band, which has been demonstrated to contain aliased respiration signals (Birn, 2006).
Birn RM, Smith MA, Bandettini PA, Diamond JB. 2006. Separating respiratory-variation-related fluctuations from neuronal-activity- related fluctuations in fMRI. Neuroimage 31:1536–1548.
Buzsáki, G., & Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science, 304(5679), 1926-1929.
Fisher, N. I. (1995). Statistical analysis of circular data. cambridge university press.
Glerean, E., Salmi, J., Lahnakoski, J. M., Jääskeläinen, I. P., & Sams, M. (2012). Functional magnetic resonance imaging phase synchronization as a measure of dynamic functional connectivity. Brain connectivity, 2(2), 91-101.
- Parameters:
data – (pd.DataFrame, np.ndarray) observations x subjects data
sampling_freq – (float) sampling freqency of data in Hz
low_cut – (float) lower bound cutoff for high pass filter
high_cut – (float) upper bound cutoff for low pass filter
order – (int) filter order for butterworth bandpass
pairwise – (bool) compute phase angle coherence on pairwise phase angle differences or on raw phase angle.
- Returns:
dictionary with mean phase angle, vector length, and rayleigh statistic
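Example (a minimal sketch with simulated data for a single ROI; assumes a TR of 2 s, i.e. sampling_freq=0.5):
>>> import numpy as np
>>> from nltools.stats import isps
>>> data = np.random.randn(200, 10)  # 200 observations (TRs) x 10 subjects
>>> out = isps(data, sampling_freq=0.5, low_cut=0.04, high_cut=0.07)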
- nltools.stats.make_cosine_basis(nsamples, sampling_freq, filter_length, unit_scale=True, drop=0)[source]¶
- Create a series of cosine basis functions for a discrete cosine transform. Based on the implementation in spm_filter and spm_dctmtx, because scipy's dct can only apply transforms but not return the basis functions. Like SPM, does not add a constant (i.e. intercept), but does retain the first basis (i.e. sigmoidal/linear drift).
- Parameters:
nsamples (int) – number of observations (e.g. TRs)
sampling_freq (float) – sampling frequency in hertz (i.e. 1 / TR)
filter_length (int) – length of filter in seconds
unit_scale (bool) – ensure that the basis functions are on the normalized range [-1, 1]; default True
drop (int) – index of which early/slow bases to drop if any; default is to drop constant (i.e. intercept) like SPM. Unlike SPM, retains first basis (i.e. linear/sigmoidal). Will cumulatively drop bases up to and inclusive of index provided (e.g. 2, drops bases 1 and 2)
- Returns:
nsamples x number of basis sets numpy array
- Return type:
out (ndarray)
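Example (a minimal sketch for a 200-TR run with TR = 2 s and a 128 s filter length):
>>> from nltools.stats import make_cosine_basis
>>> basis = make_cosine_basis(nsamples=200, sampling_freq=0.5, filter_length=128)  # 200 x n_basis array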
- nltools.stats.matrix_permutation(data1, data2, n_permute=5000, metric='spearman', how='upper', include_diag=False, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶
Permute 2-dimensional matrix correlation (mantel test).
Chen, G. et al. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. Neuroimage, 142, 248-259.
- Parameters:
data1 – (pd.DataFrame, np.array) square matrix
data2 – (pd.DataFrame, np.array) square matrix
n_permute – (int) number of permutations
metric – (str) type of association metric [‘spearman’,’pearson’, ‘kendall’]
how – (str) whether to use the ‘upper’ (default), ‘lower’, or ‘full’ matrix. The default of ‘upper’ assumes both matrices are symmetric
include_diag (bool) – only applies if how=’full’. Whether to include the diagonal elements in the comparison
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
- Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
- Return type:
stats
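Example (a minimal sketch comparing two simulated symmetric 10 x 10 matrices):
>>> import numpy as np
>>> from nltools.stats import matrix_permutation
>>> a = np.random.randn(10, 10); a = (a + a.T) / 2
>>> b = np.random.randn(10, 10); b = (b + b.T) / 2
>>> stats = matrix_permutation(a, b, n_permute=1000)  # stats['correlation'], stats['p']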
- nltools.stats.multi_threshold(t_map, p_map, thresh)[source]¶
Threshold a test image using multiple p-value thresholds from a p image
- Parameters:
t_map – (Brain_Data) Brain_Data instance of arbitrary statistic metric (e.g., beta, t, etc)
p_map – (Brain_Data) Brain_Data instance of p-values
thresh – (list) list of p-values to threshold stat image
- Returns:
Thresholded Brain_Data instance
- Return type:
out
- nltools.stats.one_sample_permutation(data, n_permute=5000, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶
One sample permutation test using randomization.
- Parameters:
data – (pd.DataFrame, pd.Series, np.array) data to permute
n_permute – (int) number of permutations
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
random_state – (int, None, or np.random.RandomState) Initial random seed (default: None)
- Returns:
(dict) dictionary of permutation results [‘mean’,’p’]
- Return type:
stats
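Example (a minimal sketch on simulated data with a positive shift):
>>> import numpy as np
>>> from nltools.stats import one_sample_permutation
>>> data = np.random.randn(30) + 0.5
>>> stats = one_sample_permutation(data, n_permute=1000)  # stats['mean'], stats['p']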
- nltools.stats.pearson(x, y)[source]¶
Correlates row vector x with each row vector in 2D array y. From neurosynth.stats.py - author: Tal Yarkoni
- nltools.stats.procrustes(data1, data2)[source]¶
Procrustes analysis, a similarity test for two data sets.
Each input matrix is a set of points or vectors (the rows of the matrix). The dimension of the space is the number of columns of each matrix. Given two identically sized matrices, procrustes standardizes both such that \(tr(AA^{T}) = 1\) and both sets of points are centered around the origin. Procrustes ([1], [2]) then applies the optimal transform to the second matrix (including scaling/dilation, rotations, and reflections) to minimize \(M^{2}=\sum(data1-data2)^{2}\), the sum of the squares of the pointwise differences between the two input datasets. This function was not designed to handle datasets with different numbers of datapoints (rows). If the two data sets have different dimensionality (different number of columns), this function will add columns of zeros to the smaller of the two.
- Parameters:
data1 – array_like. Matrix of n rows representing points in k (columns) space. data1 is the reference data; after it is standardized, the data from data2 will be transformed to fit the pattern in data1 (must have >1 unique points).
data2 – array_like. n rows of data in k space to be fit to data1. Must be the same shape (numrows, numcols) as data1 (must have >1 unique points).
- Returns:
- mtx1: array_like. A standardized version of data1.
- mtx2: array_like. The orientation of data2 that best fits data1. Centered, but not necessarily \(tr(AA^{T}) = 1\).
- disparity: float. \(M^{2}\) as defined above.
- R: (N, N) ndarray. The matrix solution of the orthogonal Procrustes problem. Minimizes the Frobenius norm of dot(data1, R) - data2, subject to dot(R.T, R) == I.
- scale: float. Sum of the singular values of dot(data1.T, data2).
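Example (a minimal sketch on random data; the unpacking follows the return values listed above):
>>> import numpy as np
>>> from nltools.stats import procrustes
>>> data1 = np.random.randn(20, 3)
>>> data2 = np.random.randn(20, 3)
>>> mtx1, mtx2, disparity, R, scale = procrustes(data1, data2)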
- nltools.stats.procrustes_distance(mat1, mat2, n_permute=5000, tail=2, n_jobs=-1, random_state=None)[source]¶
Use procrustes super-position to perform a similarity test between 2 matrices. Matrices need to match in size on their first dimension only, as the smaller matrix on the second dimension will be padded with zeros. After aligning two matrices using the procrustes transformation, use the computed disparity between them (sum of squared error of elements) as a similarity metric. Shuffle the rows of one of the matrices and recompute the disparity to perform inference (Peres-Neto & Jackson, 2001).
- Parameters:
mat1 (ndarray) – 2d numpy array; must have same number of rows as mat2
mat2 (ndarray) – 1d or 2d numpy array; must have same number of rows as mat1
n_permute (int) – number of permutation iterations to perform
tail (int) – either 1 for one-tailed or 2 for two-tailed test; default 2
n_jobs (int) – The number of CPUs to use to do permutation; default -1 (all)
- Returns:
similarity (float): similarity between matrices bounded between 0 and 1; pval (float): permuted p-value
- nltools.stats.regress(X, Y, mode='ols', stats='full', **kwargs)[source]¶
This is a flexible function to run several types of regression models provided X and Y numpy arrays. Y can be a 1d numpy array or 2d numpy array. In the latter case, results will be output with shape 1 x Y.shape[1], in other words fitting a separate regression model to each column of Y.
Does NOT add an intercept automatically to the X matrix before fitting like some other software packages. This is left up to the user.
This function can compute regression in 3 ways:
Standard OLS
OLS with robust sandwich estimators for standard errors. 3 robust types of estimators exist:
‘hc0’ - classic huber-white estimator robust to heteroscedasticity (default)
‘hc3’ - a variant on huber-white estimator slightly more conservative when sample sizes are small
‘hac’ - an estimator robust to both heteroscedasticity and auto-correlation; auto-correlation lag can be controlled with the nlags keyword argument; default is 1
ARMA (auto-regressive moving-average) model (experimental). This model is fit through statsmodels.tsa.arima_model.ARMA, so more information about options can be found there. Any settings can be passed in as kwargs. By default fits a (1,1) model with starting lags of 2. This mode is computationally intensive and can take quite a while if Y has many columns. If Y is a 2d array joblib.Parallel is used for faster fitting by parallelizing fits across columns of Y. Parallelization can be controlled by passing in kwargs. Defaults to multi-threading using 10 separate threads, as threads don’t require large arrays to be duplicated in memory. Defaults are also set to enable memory-mapping for very large arrays if backend=’multiprocessing’ to prevent crashes and hangs. Various levels of progress can be monitored using the ‘disp’ (statsmodels) and ‘verbose’ (joblib) keyword arguments with integer values > 0.
- Parameters:
X (ndarray) – design matrix; assumes intercept is included
Y (ndarray) – dependent variable array; if 2d, a model is fit to each column of Y separately
mode (str) – kind of model to fit; must be one of 'ols' (default), 'robust', or 'arma'
stats (str) – one of 'full', 'betas', 'tstats'. Useful to speed up calculation if you know you only need some statistics and not others. Defaults to 'full'.
robust_estimator (str,optional) – kind of robust estimator to use if mode = ‘robust’; default ‘hc0’
nlags (int,optional) – auto-correlation lag correction if mode = ‘robust’ and robust_estimator = ‘hac’; default 1
order (tuple,optional) – auto-regressive and moving-average orders for mode = ‘arma’; default (1,1)
kwargs (dict) – additional keyword arguments to statsmodels.tsa.arima_model.ARMA and joblib.Parallel
- Returns:
b: coefficients; se: standard error of coefficients; t: t-statistics (coef/se); p: p-values; df: degrees of freedom; res: residuals
- Return type:
b
Examples
Standard OLS
>>> results = regress(X,Y,mode='ols')
Robust OLS with heteroscedasticity (hc0) robust standard errors
>>> results = regress(X,Y,mode='robust')
Robust OLS with heteroscedasticity and auto-correlation (with lag 2) robust standard errors
>>> results = regress(X,Y,mode='robust',robust_estimator='hac',nlags=2)
Auto-regressive mode with auto-regressive and moving-average lags = 1
>>> results = regress(X,Y,mode='arma',order=(1,1))
Auto-regressive model with auto-regressive lag = 2, moving-average lag = 3, and multi-processing instead of multi-threading using 8 cores (this can use a lot of memory if input arrays are very large!).
>>> results = regress(X,Y,mode='arma',order=(2,3),backend='multiprocessing',n_jobs=8)
- nltools.stats.summarize_bootstrap(data, save_weights=False)[source]¶
Calculate summary of bootstrap samples
- Parameters:
data – (Brain_Data) Brain_Data instance of bootstrap samples
save_weights – (bool) save bootstrap weights
- Returns:
(dict) dictionary of Brain_Data summary images
- Return type:
output
- nltools.stats.threshold(stat, p, thr=0.05, return_mask=False)[source]¶
Threshold a test image by p-values from a p image
- Parameters:
stat – (Brain_Data) Brain_Data instance of arbitrary statistic metric (e.g., beta, t, etc)
p – (Brain_Data) Brain_data instance of p-values
thr – (float) p-value to threshold stat image (default: 0.05)
return_mask – (bool) optionally return the thresholding mask; default False
- Returns:
Thresholded Brain_Data instance
- Return type:
out
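Example (a minimal sketch; t_stats and p_vals are assumed Brain_Data instances from a group analysis, and the tuple unpacking with return_mask=True is an assumption based on the parameter description):
>>> from nltools.stats import threshold
>>> thresholded = threshold(t_stats, p_vals, thr=0.001)
>>> thresholded, mask = threshold(t_stats, p_vals, thr=0.05, return_mask=True)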
- nltools.stats.transform_pairwise(X, y)[source]¶
Transforms data into pairs with balanced labels for ranking. Transforms an n-class ranking problem into a two-class classification problem. Subclasses implementing particular strategies for choosing pairs should override this method. In this method, all pairs are chosen, except for those that have the same target value. The output is an array of balanced classes, i.e. there are the same number of -1 as +1.
Reference: “Large Margin Rank Boundaries for Ordinal Regression”, R. Herbrich, T. Graepel, K. Obermayer. Authors: Fabian Pedregosa <fabian@fseoane.net> Alexandre Gramfort <alexandre.gramfort@inria.fr>
- Parameters:
X – (np.array), shape (n_samples, n_features) The data
y – (np.array), shape (n_samples,) or (n_samples, 2) Target labels. If it’s a 2D array, the second column represents the grouping of samples, i.e., samples with different groups will not be considered.
- Returns:
- X_trans: (np.array), shape (k, n_features). Data as pairs, where k = n_samples * (n_samples - 1) / 2 if grouping values were not passed. If grouping variables exist, then returns values computed for each group.
- y_trans: (np.array), shape (k,). Output class labels, where classes have values {-1, +1}. If y was shape (n_samples, 2), then returns (k, 2) with groups on the second dimension.
- nltools.stats.trim(data, cutoff=None)[source]¶
Trim a Pandas DataFrame or Series by replacing outlier values with NaNs
- Parameters:
data – (pd.DataFrame, pd.Series) data to trim
cutoff – (dict) a dictionary with keys {‘std’:[low,high]} or {‘quantile’:[low,high]}
- Returns:
(pd.DataFrame, pd.Series) trimmed data
- Return type:
out
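Example (a minimal sketch; assumes the [low, high] bounds for 'std' are expressed in standard deviations):
>>> import pandas as pd
>>> from nltools.stats import trim
>>> df = pd.DataFrame({'x': [1, 2, 3, 2, 100]})
>>> trimmed = trim(df, cutoff={'std': [-3, 3]})  # extreme values are replaced with NaN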
- nltools.stats.two_sample_permutation(data1, data2, n_permute=5000, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶
Independent sample permutation test.
- Parameters:
data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute
data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute
n_permute – (int) number of permutations
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
- Returns:
(dict) dictionary of permutation results [‘mean’,’p’]
- Return type:
stats
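Example (a minimal sketch comparing two simulated groups):
>>> import numpy as np
>>> from nltools.stats import two_sample_permutation
>>> group1 = np.random.randn(25)
>>> group2 = np.random.randn(25) + 0.5
>>> stats = two_sample_permutation(group1, group2, n_permute=1000)  # stats['mean'], stats['p']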
- nltools.stats.u_center(mat)[source]¶
U-center a 2d array. U-centering is a bias-corrected form of double-centering
- Parameters:
mat (ndarray) – 2d numpy array
- Returns:
u-centered version of input
- Return type:
mat (ndarray)
- nltools.stats.upsample(data, sampling_freq=None, target=None, target_type='samples', method='linear')[source]¶
Upsample a pandas DataFrame or Series to a new target frequency or number of samples using interpolation.
- Parameters:
data – (pd.DataFrame, pd.Series) data to upsample (Note: will drop non-numeric columns from DataFrame)
sampling_freq – Sampling frequency of data in hertz
target – (float) upsampling target
target_type – (str) type of target can be [samples,seconds,hz]
method – (str) [‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’] where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order (default: linear)
- Returns:
upsampled pandas object
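Example (a minimal sketch; upsamples a 0.5 Hz series to 1 Hz):
>>> import pandas as pd
>>> from nltools.stats import upsample
>>> ts = pd.DataFrame({'signal': range(10)})
>>> up = upsample(ts, sampling_freq=0.5, target=1, target_type='hz')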
- nltools.stats.winsorize(data, cutoff=None, replace_with_cutoff=True)[source]¶
Winsorize a Pandas DataFrame or Series by replacing outlier values with the largest/smallest values not considered outliers
- Parameters:
data – (pd.DataFrame, pd.Series) data to winsorize
cutoff – (dict) a dictionary with keys {‘std’:[low,high]} or {‘quantile’:[low,high]}
replace_with_cutoff – (bool) If True, replace outliers with the cutoff value. If False, replace outliers with the closest existing values (default: True)
- Returns:
(pd.DataFrame, pd.Series) winsorized data
- Return type:
out
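Example (a minimal sketch; assumes the 'quantile' bounds are given as [low, high] quantiles):
>>> import pandas as pd
>>> from nltools.stats import winsorize
>>> df = pd.DataFrame({'x': [1, 2, 3, 2, 100]})
>>> w = winsorize(df, cutoff={'quantile': [0.05, 0.95]})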
nltools.datasets
: Dataset Tools¶
NeuroLearn datasets¶
functions to help download datasets
- nltools.datasets.download_collection(collection=None, data_dir=None, overwrite=False, resume=True, verbose=1)[source]¶
Download images and metadata from Neurovault collection
- Parameters:
collection (int, optional) – collection id. Defaults to None.
data_dir (str, optional) – data directory. Defaults to None.
overwrite (bool, optional) – overwrite data directory. Defaults to False.
resume (bool, optional) – resume download. Defaults to True.
verbose (int, optional) – print diagnostic messages. Defaults to 1.
- Returns:
(DataFrame of image metadata, list of files from downloaded collection)
- Return type:
(pd.DataFrame, list)
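Example (a minimal sketch; the collection id and data directory are purely illustrative):
>>> from nltools.datasets import download_collection
>>> metadata, files = download_collection(collection=1964, data_dir='/tmp/neurovault')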
- nltools.datasets.fetch_emotion_ratings(data_dir=None, resume=True, verbose=1)[source]¶
Download and load the emotion rating dataset from Neurovault
- Parameters:
data_dir – (string, optional). Path of the data directory. Used to force data storage in a specified location. Default: None
- Returns:
(Brain_Data) Brain_Data object with downloaded data. X=metadata
- Return type:
out
- nltools.datasets.fetch_pain(data_dir=None, resume=True, verbose=1)[source]¶
Download and load the pain dataset from Neurovault
- Parameters:
data_dir – (string, optional) Path of the data directory. Used to force data storage in a specified location. Default: None
- Returns:
(Brain_Data) Brain_Data object with downloaded data. X=metadata
- Return type:
out
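Example (a minimal sketch; downloads to the default data directory on first use):
>>> from nltools.datasets import fetch_pain
>>> data = fetch_pain()   # Brain_Data instance
>>> data.X.head()         # image metadata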
- nltools.datasets.get_collection_image_metadata(collection=None, data_dir=None, limit=10)[source]¶
Get image metadata associated with collection
- Parameters:
collection (int, optional) – collection id. Defaults to None.
data_dir (str, optional) – data directory. Defaults to None.
limit (int, optional) – number of images to request per pagination increment. Defaults to 10.
- Returns:
Dataframe with full image metadata from collection
- Return type:
pd.DataFrame
nltools.cross_validation
: Cross-Validation Tools¶
Cross-Validation Data Classes¶
Scikit-learn compatible classes for performing various types of cross-validation
- class nltools.cross_validation.KFoldStratified(n_splits=3, shuffle=False, random_state=None)[source]¶
K-Folds cross validation iterator which stratifies continuous data (unlike scikit-learn equivalent).
Provides train/test indices to split data into train/test sets. Splits the dataset into k consecutive folds while ensuring that the same subject is held out within each fold. Each fold is then used as a validation set once while the k - 1 remaining folds form the training set. Extension of KFold from scikit-learn's cross-validation module.
- Parameters:
n_splits – int, default=3 Number of folds. Must be at least 2.
shuffle – boolean, optional Whether to shuffle the data before splitting into batches.
random_state – None, int or RandomState Pseudo-random number generator state used for random sampling. If None, use default numpy RNG for shuffling
- split(X, y, groups=None)[source]¶
Generate indices to split data into training and test set.
- Parameters:
X – array-like, shape (n_samples, n_features). Training data, where n_samples is the number of samples and n_features is the number of features. Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.
y – array-like, shape (n_samples,). The target variable for supervised learning problems. Stratification is done based on the y labels.
groups – (object) Always ignored, exists for compatibility.
- Returns:
(ndarray) The training set indices for that split. test : (ndarray) The testing set indices for that split.
- Return type:
train
- nltools.cross_validation.set_cv(Y=None, cv_dict=None, return_generator=True)[source]¶
Helper function to create a scikit-learn compatible cv object using common parameters for prediction analyses.
- Parameters:
Y – (pd.DataFrame) Pandas Dataframe of Y labels
cv_dict – (dict) Type of cross_validation to use. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout}
return_generator (bool) – return a cv generator instead of an instance; default True
- Returns:
a scikit-learn model-selection generator
- Return type:
cv
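Example (a minimal sketch using 5-fold cross-validation; Y is assumed to be a DataFrame of labels):
>>> import pandas as pd
>>> from nltools.cross_validation import set_cv
>>> Y = pd.DataFrame({'Y': range(20)})
>>> cv = set_cv(Y=Y, cv_dict={'type': 'kfolds', 'n_folds': 5})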
nltools.mask
: Mask Tools¶
NeuroLearn Mask Classes¶
Classes to represent masks
- nltools.mask.collapse_mask(mask, auto_label=True, custom_mask=None)[source]¶
- Collapse separate masks into one mask with multiple integers. Overlapping areas are ignored.
- Parameters:
mask – nibabel or Brain_Data instance
custom_mask – nibabel instance or string to file path; optional
- Returns:
- Brain_Data instance of a mask with different integers indicating
different masks
- Return type:
out
- nltools.mask.create_sphere(coordinates, radius=5, mask=None)[source]¶
Generate a set of spheres in the brain mask space
- Parameters:
coordinates – a vector of sphere centers of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
radius – vector of radius. Will create multiple spheres if len(radius) > 1
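Example (a minimal sketch; coordinates are MNI coordinates under the default template and are purely illustrative):
>>> from nltools.mask import create_sphere
>>> sphere = create_sphere([0, -52, 26], radius=10)                           # one 10 mm sphere
>>> spheres = create_sphere([[0, -52, 26], [-26, -70, -8]], radius=[10, 8])   # two spheres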
- nltools.mask.expand_mask(mask, custom_mask=None)[source]¶
expand a mask with multiple integers into separate binary masks
- Parameters:
mask – nibabel or Brain_Data instance
custom_mask – nibabel instance or string to file path; optional
- Returns:
Brain_Data instance of multiple binary masks
- Return type:
out
- nltools.mask.roi_to_brain(data, mask_x)[source]¶
This function converts a vector of values into a Brain_Data image using an expanded binary mask of ROIs (see expand_mask). The values must correspond to ROI numbers.
This is useful for populating a parcellation scheme with a vector of values.
- Parameters:
data – Pandas series, dataframe, list, np.array of ROI by observation
mask_x – an expanded binary mask
- Returns:
- (Brain_Data) Brain_Data instance where each ROI is now populated
with a value
- Return type:
out
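Example (a minimal sketch; parcellation is an assumed Brain_Data instance containing integer ROI labels):
>>> import pandas as pd
>>> from nltools.mask import expand_mask, roi_to_brain
>>> rois = expand_mask(parcellation)      # one binary mask per ROI
>>> values = pd.Series(range(len(rois)))  # one value per ROI
>>> img = roi_to_brain(values, rois)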
nltools.file_reader
: File Reading¶
NeuroLearn File Reading Tools¶
- nltools.file_reader.onsets_to_dm(F, sampling_freq, run_length, header='infer', sort=False, keep_separate=True, add_poly=None, unique_cols=None, fill_na=None, **kwargs)[source]¶
This function assists in reading one or several 2-3 column onsets files, specified in seconds, and converting them to a Design Matrix organized as samples x Stimulus Classes. sampling_freq should be specified in hertz; for TRs use hertz = 1/TR. Onsets files must be organized with columns in one of the following 4 formats:
‘Stim, Onset’
‘Onset, Stim’
‘Stim, Onset, Duration’
‘Onset, Duration, Stim’
No other file organizations are currently supported. Note: Stimulus offsets (onset + duration) that fall into an adjacent TR include that full TR. E.g. offset of 10.16s with TR = 2 has an offset of TR 5, which spans 10-12s, rather than an offset of TR 4, which spans 8-10s.
- Parameters:
F (str/Path/pd.DataFrame) – filepath or pandas dataframe
sampling_freq (float) – sampling frequency in hertz, i.e. 1 / TR
run_length (int) – run length in number of TRs
header (str/None, optional) – whether there's an additional header row in the supplied file/dataframe. See pd.read_csv for more details. Defaults to "infer".
sort (bool, optional) – whether to sort dataframe columns alphabetically. Defaults to False.
keep_separate (bool, optional) – if a list of files or dataframes is supplied, whether to create separate polynomial columns per file. Defaults to True.
add_poly (bool/int, optional) – whether to add Nth order polynomials to the design matrix. Defaults to None.
unique_cols (list/None, optional) – if a list of files or dataframes is supplied, what additional columns to keep separate per file.
fill_na (Any, optional) – what to replace NaNs with. Defaults to None (no filling).
- Returns:
design matrix organized as TRs x Stims
- Return type:
Design_Matrix
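Example (a minimal sketch; 'onsets.csv' is a hypothetical Stim, Onset, Duration file for a 200-TR run with TR = 2 s):
>>> from nltools.file_reader import onsets_to_dm
>>> dm = onsets_to_dm('onsets.csv', sampling_freq=1/2.0, run_length=200)
>>> dm = onsets_to_dm('onsets.csv', sampling_freq=1/2.0, run_length=200, add_poly=2)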
nltools.utils
: Utilities¶
NeuroLearn Utilities¶
handy utilities.
- nltools.utils.get_anatomical()[source]¶
Get nltools default anatomical image. DEPRECATED. See MNI_Template and resolve_mni_path from nltools.prefs
- nltools.utils.set_algorithm(algorithm, *args, **kwargs)[source]¶
Setup the algorithm to use in subsequent prediction analyses.
- Parameters:
algorithm – The prediction algorithm to use. Either a string or an (uninitialized) scikit-learn prediction object. If string, must be one of 'svm', 'svr', 'linear', 'logistic', 'lasso', 'lassopcr', 'lassoCV', 'ridge', 'ridgeCV', 'ridgeClassifier', 'randomforest', or 'randomforestClassifier'
kwargs – Additional keyword arguments to pass onto the scikit-learn prediction object.
- Returns:
dictionary of settings for prediction
- Return type:
predictor_settings
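Example (a minimal sketch; keyword arguments such as alpha or kernel are assumed to be forwarded to the underlying scikit-learn estimator):
>>> from nltools.utils import set_algorithm
>>> settings = set_algorithm('ridge', alpha=0.5)
>>> settings = set_algorithm('svm', kernel='linear')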
- nltools.utils.set_decomposition_algorithm(algorithm, n_components=None, *args, **kwargs)[source]¶
Setup the algorithm to use in subsequent decomposition analyses.
- Parameters:
algorithm – The decomposition algorithm to use. Either a string or an (uninitialized) scikit-learn decomposition object. If string, must be one of 'pca', 'nnmf', 'ica', 'fa', 'dictionary', 'kernelpca'.
kwargs – Additional keyword arguments to pass onto the scikit-learn decomposition object.
- Returns:
dictionary of settings for prediction
- Return type:
predictor_settings
nltools.prefs
: Preferences¶
This module can be used to adjust the default MNI template settings that are used internally by all Brain_Data operations. By default all operations are performed in MNI152 2mm space, so any files loaded will be resampled to this space by default. You can control this on a per-file basis using the mask argument of Brain_Data, e.g.
from nltools.data import Brain_Data
# my_brain will be resampled to 2mm
brain = Brain_Data('my_brain.nii.gz')
# my_brain will now be resampled to the same space as my_mask
brain = Brain_Data('my_brain.nii.gz', mask='my_mask.nii.gz') # will be resampled
Alternatively this module can be used to switch between 2mm or 3mm MNI spaces with and without ventricles:
from nltools.prefs import MNI_Template, resolve_mni_path
from nltools.data import Brain_Data
# Update the resolution globally
MNI_Template['resolution'] = '3mm'
# This works too:
MNI_Template.resolution = 3
# my_brain will be resampled to 3mm and future operation will be in 3mm space
brain = Brain_Data('my_brain.nii.gz')
# get the template nifti files
resolve_mni_path(MNI_Template)
# will print like:
{
'resolution': '3mm',
'mask_type': 'with_ventricles',
'mask': '/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm_brain_mask.nii.gz',
'plot': '/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm.nii.gz',
'brain':
'/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm_brain.nii.gz'
}
nltools.plotting
: Plotting Tools¶
NeuroLearn Plotting Tools¶
Numerous functions to plot data
- nltools.plotting.dist_from_hyperplane_plot(stats_output)[source]¶
Plot SVM Classification Distance from Hyperplane
- Parameters:
stats_output – a pandas dataframe with prediction output
- Returns:
Will return a seaborn plot of distance from hyperplane
- Return type:
fig
- nltools.plotting.plot_between_label_distance(distance, labels, ax=None, permutation_test=True, n_permute=5000, fontsize=18, **kwargs)[source]¶
Create a heatmap indicating average between label distance
- Parameters:
distance – (pandas dataframe) brain_distance matrix
labels – (pandas dataframe) group labels
ax – axis to plot (default=None)
permutation_test – (boolean)
n_permute – (int) number of samples for permutation test
fontsize – (int) size of font for plot
- Returns:
heatmap; out: pandas dataframe of pairwise distance between conditions; within_dist_out: average pairwise distance matrix; mn_dist_out: (optional if permutation_test=True) average difference in distance between conditions; p_dist_out: (optional if permutation_test=True) p-value for difference in distance between conditions
- Return type:
f
- nltools.plotting.plot_brain(objIn, how='full', thr_upper=None, thr_lower=None, save=False, **kwargs)[source]¶
More complete brain plotting of a Brain_Data instance
- Parameters:
objIn (Brain_Data) – object to plot
how (str) – whether to plot a glass brain ‘glass’, 3 view-multi-slice mni ‘mni’, or both ‘full’
thr_upper (str/float) – thresholding of image. Can be a string for percentage or a float for data units (see Brain_Data.threshold())
thr_lower (str/float) – thresholding of image. Can be a string for percentage or a float for data units (see Brain_Data.threshold())
save (str) – if a string file name or path is provided plots will be saved into this directory appended with the orientation they belong to
kwargs – optionals args to nilearn plot functions (e.g. vmax)
- nltools.plotting.plot_interactive_brain(brain, threshold=1e-06, surface=False, percentile_threshold=False, anatomical=None, **kwargs)[source]¶
This function leverages nilearn’s new javascript based brain viewer functions to create interactive plotting functionality.
- Parameters:
brain (nltools.Brain_Data) – a Brain_Data instance of 1d or 2d shape (i.e. 3d or 4d volume)
threshold (float/str) – threshold to initialize the visualization, may be a percentile string; default 1e-06
surface (bool) – whether to create a surface-based plot; default False
percentile_threshold (bool) – whether to interpret threshold values as percentiles
kwargs – optional arguments to nilearn.view_img or nilearn.view_img_on_surf
- Returns:
interactive brain viewer widget
- nltools.plotting.plot_mean_label_distance(distance, labels, ax=None, permutation_test=False, n_permute=5000, fontsize=18, **kwargs)[source]¶
Create a violin plot indicating within and between label distance.
- Parameters:
distance – pandas dataframe of distance
labels – labels indicating columns and rows to group
ax – matplotlib axis to plot on
permutation_test – (bool) indicates whether to run a permutation test or not
n_permute – (int) number of permutations to run
fontsize – (int) fontsize for plot labels
- Returns:
heatmap; stats: (optional if permutation_test=True) permutation results
- Return type:
f
- nltools.plotting.plot_silhouette(distance, labels, ax=None, permutation_test=True, n_permute=5000, **kwargs)[source]¶
Create a silhouette plot indicating between relative to within label distance
- Parameters:
distance – (pandas dataframe) brain_distance matrix
labels – (pandas dataframe) group labels
ax – axis to plot (default=None)
permutation_test – (boolean)
n_permute – (int) number of samples for permutation test
- Optional keyword args:
figsize: (list) dimensions of silhouette plot colors: (list) color triplets for silhouettes. Length must equal number of unique labels
- Returns:
heatmap; out: pandas dataframe of pairwise distance between conditions; within_dist_out: average pairwise distance matrix; mn_dist_out: (optional if permutation_test=True) average difference in distance between conditions; p_dist_out: (optional if permutation_test=True) p-value for difference in distance between conditions
- Return type:
f
- nltools.plotting.plot_stacked_adjacency(adjacency1, adjacency2, normalize=True, **kwargs)[source]¶
Create stacked adjacency to illustrate similarity.
- Parameters:
adjacency1 – Adjacency instance 1
adjacency2 – Adjacency instance 2
normalize – (boolean) Normalize matrices.
- Returns:
matplotlib figure
- nltools.plotting.plot_t_brain(objIn, how='full', thr='unc', alpha=None, nperm=None, cut_coords=[], **kwargs)[source]¶
Takes a brain data object and computes a 1-sample t-test across its first axis. If a list is provided, will compute the difference between the brain data objects in the list (i.e. paired samples t-test).
- Parameters:
objIn – (list/Brain_Data) object to plot; if a list, will compute the difference map first
how – (str) whether to plot a glass brain 'glass', 3 view-multi-slice mni 'mni', or both 'full'
thr – (str) what method to use for multiple comparisons correction: unc, fdr, or tfce
alpha – (float) p-value threshold
nperm – (int) number of permutations for tfce; default 1000
cut_coords – (list) x,y,z coords to plot brain slice
kwargs – optional args to nilearn plot functions (e.g. vmax)
- nltools.plotting.probability_plot(stats_output)[source]¶
Plot Classification Probability
- Parameters:
stats_output – a pandas dataframe with prediction output
- Returns:
Will return a seaborn scatterplot
- Return type:
fig
nltools.simulator
: Simulator Tools¶
NeuroLearn Simulator Tools¶
Tools to simulate multivariate data.
- class nltools.simulator.Simulator(brain_mask=None, output_dir=None, random_state=None)[source]¶
- create_cov_data(cor, cov, sigma, mask=None, reps=1, n_sub=1, output_dir=None)[source]¶
create continuous simulated data with covariance
- Parameters:
cor – amount of covariance between each voxel and Y variable
cov – amount of covariance between voxels
sigma – amount of noise to add
radius – vector of radius. Will create multiple spheres if len(radius) > 1
center – center(s) of sphere(s) of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
reps – number of data repetitions
n_sub – number of subjects to simulate
output_dir – string path of directory to output data. If None, no data will be written
**kwargs – Additional keyword arguments to pass to the prediction algorithm
- create_data(levels, sigma, radius=5, center=None, reps=1, output_dir=None)[source]¶
create simulated data with integers
- Parameters:
levels – vector of intensities or class labels
sigma – amount of noise to add
radius – vector of radius. Will create multiple spheres if len(radius) > 1
center – center(s) of sphere(s) of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
reps – number of data repetitions useful for trials or subjects
output_dir – string path of directory to output data. If None, no data will be written
**kwargs – Additional keyword arguments to pass to the prediction algorithm
- create_ncov_data(cor, cov, sigma, masks=None, reps=1, n_sub=1, output_dir=None)[source]¶
create continuous simulated data with covariance
- Parameters:
cor – amount of covariance between each voxel and Y variable (an int or a vector)
cov – amount of covariance between voxels (an int or a matrix)
sigma – amount of noise to add
masks – region(s) where we will have activations (list if more than one)
reps – number of data repetitions
n_sub – number of subjects to simulate
output_dir – string path of directory to output data. If None, no data will be written
**kwargs – Additional keyword arguments to pass to the prediction algorithm
- gaussian(mu, sigma, i_tot)[source]¶
create a 3D gaussian signal normalized to a given intensity
- Parameters:
mu – average value of the gaussian signal (usually set to 0)
sigma – standard deviation
i_tot – sum total of activation (numerical integral over the gaussian returns this value)
- n_spheres(radius, center)[source]¶
generate a set of spheres in the brain mask space
- Parameters:
radius – vector of radius. Will create multiple spheres if len(radius) > 1
center – a vector of sphere centers of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
- normal_noise(mu, sigma)[source]¶
produce a normal noise distribution for all points in the brain mask
- Parameters:
mu – average value of the gaussian signal (usually set to 0)
sigma – standard deviation
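Example (a minimal sketch following the documented signatures; the parameter values are purely illustrative and no files are written because output_dir is left as None):
>>> from nltools.simulator import Simulator
>>> sim = Simulator()
>>> sim.create_data(levels=[1, 2], sigma=1, radius=10, reps=5, output_dir=None)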