API Reference
This reference provides detailed documentation for all modules, classes, and methods in the current release of Neurolearn.
nltools.data: Data Types

class nltools.data.Brain_Data(data=None, Y=None, X=None, mask=None, output_file=None, **kwargs)
Brain_Data is a class to represent neuroimaging data in Python as a vector rather than a 3-dimensional matrix. This makes it easier to perform data manipulation and analyses.
 Parameters
data – nibabel data instance or list of files
Y – Pandas DataFrame of training labels
X – Pandas DataFrame Design Matrix for running univariate models
mask – binary nifti file to mask brain data
output_file – Name to write out to nifti file
**kwargs – Additional keyword arguments to pass to the prediction algorithm

align(target, method='procrustes', axis=0, *args, **kwargs)
Align Brain_Data instance to target object using functional alignment.
Alignment type can be hyperalignment or Shared Response Model (SRM). When using hyperalignment, the target image can be another subject or an already estimated common model. When using SRM, the target must be a previously estimated common model stored as a numpy array. Transformed data can be back-projected to the original data using the transformation matrix.
See nltools.stats.align for aligning multiple Brain_Data instances.
Examples
 Hyperalign using procrustes transform:
out = data.align(target, method='procrustes')
 Align using shared response model:
out = data.align(target, method='probabilistic_srm', n_features=None)
 Project aligned data into original data:
original_data = np.dot(out['transformed'].data, out['transformation_matrix'].T)
 Parameters
target – (Brain_Data) object to align to.
method – (str) alignment method to use [‘probabilistic_srm’,’deterministic_srm’,’procrustes’]
axis – (int) axis to align on
 Returns
 (dict) a dictionary containing transformed object, transformation matrix, and the shared response matrix
 Return type
out
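
The back-projection step in the example above works because a Procrustes transform is orthogonal, so its transpose undoes it. The following is a minimal numpy sketch of that idea only. It is not the nltools implementation: procrustes_align is a hypothetical helper, and centering and scaling are left out.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate `source` onto `target` with an orthogonal matrix (hypothetical helper)."""
    # The SVD of the cross-product matrix yields the optimal orthogonal transform.
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation, rotation


rng = np.random.default_rng(0)
source = rng.standard_normal((20, 5))  # e.g. 20 observations x 5 features
true_rotation, _ = np.linalg.qr(rng.standard_normal((5, 5)))
target = source @ true_rotation  # target is a rotated copy of source

transformed, rotation = procrustes_align(source, target)

# Back-projection: the transpose of an orthogonal matrix is its inverse.
recovered = np.dot(transformed, rotation.T)
```

Here recovered matches source up to floating-point error, which mirrors the np.dot(..., .T) pattern shown in the example.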

append(data, **kwargs)
Append data to Brain_Data instance
 Parameters
data – (Brain_Data) Brain_Data instance to append
kwargs – optional inputs to Design_Matrix append
 Returns
(Brain_Data) new appended Brain_Data instance
 Return type
out

apply_mask(mask, resample_mask_to_brain=False)
Mask Brain_Data instance
Note: target data will be resampled into the same space as the mask. To resample the mask into the Brain_Data space instead, set resample_mask_to_brain=True.
 Parameters
mask – (Brain_Data or nifti object) mask to apply to Brain_Data object.
resample_mask_to_brain – (bool) Will resample mask to brain space before applying mask (default=False).
 Returns
(Brain_Data) masked Brain_Data object
 Return type
masked

astype(dtype)
Cast Brain_Data.data as type.
 Parameters
dtype – datatype to convert
 Returns
Brain_Data instance with new datatype
 Return type

bootstrap(function, n_samples=5000, save_weights=False, n_jobs=-1, random_state=None, *args, **kwargs)
Bootstrap a Brain_Data method.
Example usage:
b = dat.bootstrap('mean', n_samples=5000)
b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
b = dat.bootstrap('predict', n_samples=5000, save_weights=True)
 Parameters
function – (str) method to apply to data for each bootstrap
n_samples – (int) number of samples to bootstrap with replacement
save_weights – (bool) Save each bootstrap iteration (useful for aggregating many bootstraps on a cluster)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
 Returns
summarized studentized bootstrap output
 Return type
output
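
As a rough illustration of what bootstrapping a method such as 'mean' involves, the sketch below resamples observations with replacement and summarizes the resulting distribution. It uses plain numpy. The function name bootstrap_mean and the output keys 'mean' and 'Z' are assumptions made for this example, not the library's actual output format.

```python
import numpy as np


def bootstrap_mean(data, n_samples=1000, random_state=None):
    """Bootstrap the column-wise mean of an (observations x voxels) array."""
    rng = np.random.default_rng(random_state)
    n_obs = data.shape[0]
    boot_means = np.empty((n_samples, data.shape[1]))
    for i in range(n_samples):
        # Draw observations with replacement, then apply the statistic.
        idx = rng.integers(0, n_obs, size=n_obs)
        boot_means[i] = data[idx].mean(axis=0)
    mean = boot_means.mean(axis=0)
    std = boot_means.std(axis=0, ddof=1)
    # Dividing by the bootstrap standard deviation studentizes the estimate.
    return {"mean": mean, "Z": mean / std}


rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=(50, 4))  # 50 images x 4 voxels
summary = bootstrap_mean(data, n_samples=200, random_state=0)
```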

decompose(algorithm='pca', axis='voxels', n_components=None, *args, **kwargs)
Decompose Brain_Data object
 Parameters
algorithm – (str) Algorithm to perform decomposition types=[‘pca’,’ica’,’nnmf’,’fa’,’dictionary’,’kernelpca’]
axis – dimension to decompose [‘voxels’,’images’]
n_components – (int) number of components. If None then retain as many as possible.
 Returns
a dictionary of decomposition parameters
 Return type
output

detrend(method='linear')
Remove linear trend from each voxel
 Parameters
method – (‘linear’, ‘constant’, optional) type of detrending
 Returns
(Brain_Data) detrended Brain_Data instance
 Return type
out

distance(metric='euclidean', **kwargs)
Calculate distance between images within a Brain_Data() instance.
 Parameters
metric – (str) type of distance metric (can use any scikit-learn or scipy metric)
 Returns
(Adjacency) Outputs a 2D distance matrix.
 Return type
dist

extract_roi(mask, metric='mean', n_components=None)
Extract activity from mask
 Parameters
mask – (nifti) nibabel mask can be binary or numbered for different rois
metric – type of extraction method [‘mean’, ‘median’, ‘pca’], (default=mean) NOTE: Only mean currently works!
n_components – if metric=‘pca’, number of components to return (takes any input into sklearn.decomposition.PCA)
 Returns
mean within each ROI across images
 Return type
out

filter(sampling_freq=None, high_pass=None, low_pass=None, **kwargs)
Apply 5th-order Butterworth filter to data. Wraps nilearn functionality. Does not default to detrending and standardizing like the nilearn implementation, but this can be overridden using kwargs.
 Parameters
sampling_freq – sampling freq in hertz (i.e. 1 / TR)
high_pass – high pass cutoff frequency
low_pass – low pass cutoff frequency
kwargs – other keyword arguments to nilearn.signal.clean
 Returns
Filtered Brain_Data instance
 Return type

find_spikes(global_spike_cutoff=3, diff_spike_cutoff=3)
Function to identify spikes from time series data
 Parameters
global_spike_cutoff – (int,None) cutoff to identify spikes in global signal in standard deviations, None indicates do not calculate.
diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference in standard deviations, None indicates do not calculate.
 Returns
pandas dataframe with spikes as indicator variables

icc(icc_type='icc2')
Calculate intraclass correlation coefficient for data within Brain_Data class
ICC formulas are based on: Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2), 420.
icc1: x_ij = mu + beta_j + w_ij
icc2/3: x_ij = mu + alpha_i + beta_j + (ab)_ij + epsilon_ij
Code modified from nipype algorithms.icc https://github.com/nipy/nipype/blob/master/nipype/algorithms/icc.py
 Parameters
icc_type – type of icc to calculate (icc: voxel random effect, icc2: voxel and column random effect, icc3: voxel and column fixed effect)
 Returns
(np.array) intraclass correlation coefficient
 Return type
ICC
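
For readers who want to see how the Shrout and Fleiss mean squares combine, here is a small numpy sketch of the single-measure icc2 and icc3 formulas from a two-way ANOVA decomposition. The function icc_from_anova is a hypothetical helper written for this example and is not the code used by the icc method. The ratings are the worked example from the 1979 paper (6 targets by 4 judges), for which the paper reports 0.29 and 0.71.

```python
import numpy as np


def icc_from_anova(ratings, icc_type="icc2"):
    """Single-measure ICC from a (targets x judges) table (hypothetical helper)."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()  # between targets
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()  # between judges
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    if icc_type == "icc2":  # judges treated as random effects
        return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    if icc_type == "icc3":  # judges treated as fixed effects
        return (bms - ems) / (bms + (k - 1) * ems)
    raise ValueError("icc_type must be 'icc2' or 'icc3'")


ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc2 = icc_from_anova(ratings, "icc2")  # about 0.29
icc3 = icc_from_anova(ratings, "icc3")  # about 0.71
```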

iplot(threshold=0, surface=False, anatomical=None, **kwargs)
Create an interactive brain viewer for the current brain data instance.
 Parameters
threshold – (float/str) two-sided threshold to initialize the visualization; may be a percentile string; default 0
surface – (bool) whether to create a surface-based plot; default False
anatomical – nifti image or filename to overlay
kwargs – optional arguments to nilearn.view_img or nilearn.view_img_on_surf
 Returns
interactive brain viewer widget

mean(axis=0)
Get mean of each voxel or image
 Parameters
axis – (int) across images=0 (default), within images=1
 Returns
(float/np.array/Brain_Data)
 Return type
out

median(axis=0)
Get median of each voxel or image
 Parameters
axis – (int) across images=0 (default), within images=1
 Returns
(float/np.array/Brain_Data)
 Return type
out

multivariate_similarity(images, method='ols')
Predict spatial distribution of Brain_Data() instance from linear combination of other Brain_Data() instances or Nibabel images
 Parameters
self – Brain_Data instance of data to be applied
images – Brain_Data instance of weight map
 Returns
 dictionary of regression statistics in Brain_Data instances {‘beta’, ‘t’, ‘p’, ‘df’, ‘residual’}
 Return type
out

plot(limit=5, anatomical=None, view='axial', colorbar=False, black_bg=True, draw_cross=False, threshold_upper=None, threshold_lower=None, figsize=(15, 2), axes=None, **kwargs)
Create a quick plot of self.data. Will plot each image separately
 Parameters
limit – (int) max number of images to return
anatomical – (nifti, str) nifti image or file name to overlay
view – (str) ‘axial’ for limit number of axial slices; ‘glass’ for ortho-view glass brain; ‘mni’ for multi-slice view mni brain; ‘full’ for both glass and mni views
threshold_upper – (str/float) threshold if view is ‘glass’, ‘mni’, or ‘full’
threshold_lower – (str/float) threshold if view is ‘glass’, ‘mni’, or ‘full’
save – (str/bool) optional string file name or path for saving; only applies if view is ‘mni’, ‘glass’, or ‘full’. Filenames will be appended with the orientation they belong to

predict(algorithm=None, cv_dict=None, plot=True, **kwargs)
Run prediction
 Parameters
algorithm – Algorithm to use for prediction. Must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘ridge’, ‘ridgeClassifier’,’pcr’, or ‘lassopcr’
cv_dict – Type of cross_validation to use. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout} where ‘n’ = number of folds, and ‘holdout’ = vector of subject ids that corresponds to self.Y
plot – Boolean indicating whether or not to create plots.
**kwargs – Additional keyword arguments to pass to the prediction algorithm
 Returns
a dictionary of prediction parameters
 Return type
output
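
The cv_dict forms listed above can be written out as ordinary Python dictionaries. The sketch below only builds the dictionaries. The arrays subject_id and labels are made-up placeholders for a dataset of 40 images, and the commented call at the end assumes a Brain_Data instance named dat with Y already set.

```python
import numpy as np

n_folds = 5
subject_id = np.repeat(np.arange(10), 4)  # hypothetical: 10 subjects x 4 images
labels = np.tile([0, 1], 20)              # hypothetical binary training labels

# Plain k-fold cross-validation.
cv_kfold = {'type': 'kfolds', 'n_folds': n_folds}

# k-fold stratified by the training labels.
cv_stratified = {'type': 'kfolds', 'n_folds': n_folds, 'stratified': labels}

# k-fold that keeps all images from one subject in the same fold.
cv_grouped = {'type': 'kfolds', 'n_folds': n_folds, 'subject_id': subject_id}

# Leave-one-subject-out.
cv_loso = {'type': 'loso', 'subject_id': subject_id}

# Assumed usage, given a Brain_Data instance `dat`:
# stats = dat.predict(algorithm='ridge', cv_dict=cv_kfold, plot=False)
```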

predict_multi(algorithm=None, cv_dict=None, method='searchlight', rois=None, process_mask=None, radius=2.0, scoring=None, n_jobs=1, verbose=0, **kwargs)
Perform multi-region prediction. This can be a searchlight analysis or multi-ROI analysis if provided a Brain_Data instance with labeled non-overlapping ROIs.
 Parameters
algorithm (string) – algorithm to use for prediction Must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘ridge’, ‘ridgeClassifier’,’pcr’, or ‘lassopcr’
cv_dict – Type of cross_validation to use. Default is 3-fold. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout} where ‘n’ = number of folds, and ‘holdout’ = vector of subject ids that corresponds to self.Y
method (string) – one of ‘searchlight’ or ‘roi’
rois (string/nltools.Brain_Data) – nifti file path or Brain_Data instance containing non-overlapping regions-of-interest labeled by integers
process_mask (nib.Nifti1Image/nltools.Brain_Data) – mask to constrain where to perform analyses; only applied if method = ‘searchlight’
radius (float) – radius of searchlight in mm; default 2mm
scoring (function) – callable scoring function; see sklearn documentation; defaults to estimator’s default scoring function
n_jobs (int) – The number of CPUs to use to do permutation; default 1 because this can be very memory intensive
verbose (int) – whether parallelization progress should be printed; default 0
 Returns
image of results
 Return type
output

randomise(n_permute=5000, threshold_dict=None, return_mask=False, **kwargs)
Run mass-univariate regression at each voxel with inference performed via permutation testing à la randomise in FSL. Operates just like .regress(), but intended to be used for second-level analyses.
 Parameters
n_permute (int) – number of permutations
threshold_dict – (dict) a dictionary of threshold parameters {‘unc’:.001} or {‘fdr’:.05}
return_mask – (bool) optionally return the thresholding mask
 Returns
dictionary of maps for betas, t-stats, and p-values
 Return type
out

regions(min_region_size=1350, extract_type='local_regions', smoothing_fwhm=6, is_mask=False)
Extract brain connected regions into separate regions.
 Parameters
min_region_size (int) – Minimum volume in mm^3 for a region to be kept.
extract_type (str) – Type of extraction method [‘connected_components’, ‘local_regions’]. If ‘connected_components’, each component/region in the image is extracted automatically by labelling each region based upon the presence of unique features in their respective regions. If ‘local_regions’, each component/region is extracted based on their maximum peak value to define a seed marker and then using a random walker segmentation algorithm on these markers for region separation.
smoothing_fwhm (scalar) – Smooth an image to extract sparser regions. Only works for extract_type ‘local_regions’.
is_mask (bool) – Whether the Brain_Data instance should be treated as a boolean mask and if so, calls connected_label_regions instead.
 Returns
Brain_Data instance with extracted ROIs as data.
 Return type

regress(mode='ols', **kwargs)
Run a mass-univariate regression across voxels. Three types of regressions can be run: 1) Standard OLS (default); 2) Robust OLS (heteroscedasticity and/or autocorrelation robust errors), i.e. OLS with “sandwich estimators”; 3) ARMA (autoregressive and moving-average lags = 1 by default; experimental).
For more information see the help for nltools.stats.regress
ARMA notes: This experimental mode is similar to AFNI’s 3dREMLfit but without spatial smoothing of voxel autocorrelation estimates. It can be very computationally intensive, so parallelization is used by default to try to speed things up. Speed is limited because a unique ARMA model is fit to each voxel (like AFNI/FSL), unlike SPM, which assumes the same AR parameters (~0.2) at each voxel. While coefficient results are typically very similar to OLS, standard errors, and therefore t-stats, dfs, and p-values, can differ greatly depending on how much autocorrelation is explaining the response in a voxel relative to other regressors in the design matrix.
 Parameters
mode (str) – kind of model to fit; must be one of ‘ols’ (default), ‘robust’, or ‘arma’
kwargs (dict) – keyword arguments to nltools.stats.regress
 Returns
 dictionary of regression statistics in Brain_Data instances {‘beta’, ‘t’, ‘p’, ‘df’, ‘residual’}
 Return type
out

scale(scale_val=100.0)
Scale all values such that they are on the range [0, scale_val], via grand-mean scaling. This is NOT global-scaling/intensity normalization. This is useful for ensuring that data is on a common scale (e.g. good for multiple runs, participants, etc.) and, if the default value of 100 is used, can be interpreted as something akin to (but not exactly) “percent signal change.” This is consistent with default behavior in AFNI and SPM. Change this value to 10000 to make it consistent with FSL.
 Parameters
scale_val – (int/float) what value to send the grand mean to; default 100
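
A minimal sketch of grand-mean scaling, assuming it means dividing every value by the single mean over all images and voxels and multiplying by scale_val. The helper grand_mean_scale is hypothetical, and the library's own method may handle masks or edge cases differently.

```python
import numpy as np


def grand_mean_scale(data, scale_val=100.0):
    """Rescale so the mean over all images and voxels equals scale_val."""
    return data / data.mean() * scale_val


data = np.array([[50.0, 150.0], [100.0, 300.0]])  # 2 images x 2 voxels
scaled = grand_mean_scale(data)
```

Because one scalar is applied to the whole dataset, the ratios between voxels and between images are unchanged, which is what distinguishes this from per-image intensity normalization.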

similarity(image, method='correlation')
Calculate similarity of Brain_Data() instance with single Brain_Data or Nibabel image
 Parameters
image – (Brain_Data, nifti) image to evaluate similarity
method – (str) Type of similarity [‘correlation’,’dot_product’,’cosine’]
 Returns
(list) Outputs a vector of pattern expression values
 Return type
pexp

smooth(fwhm)
Apply spatial smoothing using nilearn smooth_img()
 Parameters
fwhm – (float) full width at half maximum of Gaussian spatial filter
 Returns
Brain_Data instance

standardize(axis=0, method='center')
Standardize Brain_Data() instance.
 Parameters
axis – 0 for observations, 1 for voxels
method – [‘center’, ‘zscore’]
 Returns
Brain_Data Instance

std(axis=0)
Get standard deviation of each voxel or image.
 Parameters
axis – (int) across images=0 (default), within images=1
 Returns
(float/np.array/Brain_Data)
 Return type
out

temporal_resample(sampling_freq=None, target=None, target_type='hz')
Resample Brain_Data timeseries to a new target frequency or number of samples using Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation. This function can up- or down-sample data.
Note: this function can use quite a bit of RAM.
 Parameters
sampling_freq – (float) sampling frequency of data in hertz
target – (float) resampling target
target_type – (str) type of target; can be one of [samples, seconds, hz]
 Returns
resampled Brain_Data instance

threshold(upper=None, lower=None, binarize=False, coerce_nan=True)
Threshold Brain_Data instance. Provide upper and lower values or percentages to perform two-sided thresholding. Binarize will return a mask image respecting thresholds if provided, otherwise respecting every non-zero value.
 Parameters
upper – (float or str) Upper cutoff for thresholding. If string, will interpret as percentile; can be None for one-sided thresholding.
lower – (float or str) Lower cutoff for thresholding. If string, will interpret as percentile; can be None for one-sided thresholding.
binarize (bool) – return binarized image respecting thresholds if provided, otherwise binarize on every non-zero value; default False
coerce_nan (bool) – coerce nan values to 0s; default True
 Returns
Thresholded Brain_Data object.
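
The two-sided behavior described above can be sketched with numpy as follows: values at or above upper, or at or below lower, are kept and everything in between is set to zero. This is an illustration under assumptions, not the method's source. In particular, the helper name two_sided_threshold and the '95%' style of percentile string are choices made for this example.

```python
import numpy as np


def two_sided_threshold(values, upper=None, lower=None, binarize=False):
    """Zero out values strictly between `lower` and `upper` (hypothetical helper)."""
    values = np.asarray(values, dtype=float)

    def resolve(cutoff):
        # Strings such as '95%' are read as percentiles of the data.
        if isinstance(cutoff, str):
            return np.percentile(values, float(cutoff.rstrip('%')))
        return cutoff

    upper, lower = resolve(upper), resolve(lower)
    if upper is None and lower is None:
        keep = values != 0  # no cutoffs: respect every non-zero value
    else:
        keep = np.zeros(values.shape, dtype=bool)
        if upper is not None:
            keep |= values >= upper
        if lower is not None:
            keep |= values <= lower
    if binarize:
        return keep.astype(int)
    return np.where(keep, values, 0.0)


values = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
thresholded = two_sided_threshold(values, upper=2, lower=-2)
mask = two_sided_threshold(values, upper=2, lower=-2, binarize=True)
one_sided = two_sided_threshold(values, upper=2)
```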

transform_pairwise()
Transform Brain_Data instance into pairwise comparisons.
 Returns
Brain_Data instance transformed into pairwise comparisons
 Return type

ttest(threshold_dict=None, return_mask=False)
Calculate one-sample t-test across each voxel (two-sided)
 Parameters
threshold_dict – (dict) a dictionary of threshold parameters {‘unc’:.001} or {‘fdr’:.05}
return_mask – (bool) if thresholding is requested, optionally return the mask of voxels that exceed threshold, e.g. for use with another map
 Returns
 (dict) dictionary of regression statistics in Brain_Data instances {‘t’, ‘p’}
 Return type
out

upload_neurovault(access_token=None, collection_name=None, collection_id=None, img_type=None, img_modality=None, **kwargs)
Upload data to Neurovault. Will add any columns in self.X to image metadata. Index will be used as image name.
 Parameters
access_token – (str, Required) Neurovault api access token
collection_name – (str, Optional) name of new collection to create
collection_id – (int, Optional) neurovault collection_id if adding images to existing collection
img_type – (str, Required) Neurovault map_type
img_modality – (str, Required) Neurovault image modality
 Returns
(pd.DataFrame) neurovault collection information
 Return type
collection

class nltools.data.Adjacency(data=None, Y=None, matrix_type=None, labels=[], **kwargs)
Adjacency is a class to represent Adjacency matrices as a vector rather than a 2-dimensional matrix. This makes it easier to perform data manipulation and analyses.
 Parameters
data – pandas data instance or list of files
matrix_type – (str) type of matrix. Possible values include: [‘distance’,’similarity’,’directed’,’distance_flat’, ‘similarity_flat’,’directed_flat’]
Y – Pandas DataFrame of training labels
**kwargs – Additional keyword arguments

append(data)
Append data to Adjacency instance
 Parameters
data – (Adjacency) Adjacency instance to append
 Returns
(Adjacency) new appended Adjacency instance
 Return type
out

bootstrap(function, n_samples=5000, save_weights=False, n_jobs=-1, random_state=None, *args, **kwargs)
Bootstrap an Adjacency method.
Example usage:
b = dat.bootstrap('mean', n_samples=5000)
b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
b = dat.bootstrap('predict', n_samples=5000, save_weights=True)
 Parameters
function – (str) method to apply to data for each bootstrap
n_samples – (int) number of samples to bootstrap with replacement
save_weights – (bool) Save each bootstrap iteration (useful for aggregating many bootstraps on a cluster)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
 Returns
summarized studentized bootstrap output
 Return type
output

cluster_summary(clusters=None, metric='mean', summary='within')
This function provides summaries of clusters within Adjacency matrices.
It can compute mean/median of within and between cluster values. Requires a list of cluster ids indicating the row/column of each cluster.
 Parameters
clusters – (list) list of cluster labels
metric – (str) method to summarize: mean or median. If None, then return all r values
summary – (str) summarize within cluster or between clusters
 Returns
(dict) within cluster means
 Return type
dict

distance(metric='correlation', **kwargs)
Calculate distance between images within an Adjacency() instance.
 Parameters
metric – (str) type of distance metric (can use any scikit-learn or scipy metric)
 Returns
(Adjacency) Outputs a 2D distance matrix.
 Return type
dist

distance_to_similarity(metric='correlation', beta=1)
Convert distance matrix to similarity matrix.
Note: currently only implemented for correlation and euclidean.
 Parameters
metric – (str) Can only be correlation or euclidean
beta – (float) parameter to scale exponential function (default: 1) for euclidean
 Returns
(Adjacency) Adjacency object
 Return type
out

generate_permutations(n_perm, random_state=None)
Generate n_perm permuted versions of Adjacency in a lazy fashion. Useful for iterating against.
 Parameters
n_perm (int) – number of permutations
random_state (int, np.random.seed, optional) – random seed for reproducibility. Defaults to None.
Examples
>>> for perm in adj.generate_permutations(1000):
...     out = neural_distance_mat.similarity(perm)
...     ...
 Yields
Adjacency – permuted version of self

isc(n_bootstraps=5000, metric='median', ci_percentile=95, exclude_self_corr=True, return_bootstraps=False, tail=2, n_jobs=-1, random_state=None)
Compute intersubject correlation.
This implementation uses the subject-wise bootstrap method from Chen et al., 2016. Instead of recomputing the pairwise ISC using circle_shift or phase_randomization methods, this approach uses the computationally more efficient method of bootstrapping the subjects and computing a new pairwise similarity matrix with randomly selected subjects with replacement. If the same subject is selected multiple times, we set the perfect correlation to NaN (exclude_self_corr=True). As recommended by Chen et al., 2016, we compute the median pairwise ISC by default. However, if the mean is preferred, we compute the mean correlation after performing the Fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values. We compute the p-values with the percentile method, using the same approach as Brainiak.
Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.
 Parameters
n_bootstraps – (int) number of bootstraps
metric – (str) summary statistic for the pairwise ISC [‘median’, ‘mean’]; default ‘median’
tail – (int) either 1 for one-tailed or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_bootstraps – (bool) Return the bootstrap distribution along with the p-value; default False
 Returns
(dict) dictionary of permutation results [‘correlation’,’p’]
 Return type
stats
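
Two pieces of the description above lend themselves to a short numpy sketch: resampling subjects with replacement while masking repeated subjects with NaN, and averaging correlations through the Fisher r-to-z transform. The functions bootstrap_isc and fisher_mean are hypothetical helpers written for this illustration. They do not reproduce the method's p-value or confidence-interval logic.

```python
import numpy as np


def fisher_mean(r):
    """Average correlations in z-space, then convert back to r."""
    return np.tanh(np.mean(np.arctanh(r)))


def bootstrap_isc(similarity, n_bootstraps=1000, random_state=None):
    """Median pairwise ISC for subject-wise bootstrap samples (hypothetical helper)."""
    rng = np.random.default_rng(random_state)
    n = similarity.shape[0]
    upper = np.triu_indices(n, k=1)
    stats = np.empty(n_bootstraps)
    for b in range(n_bootstraps):
        subj = rng.integers(0, n, size=n)  # subjects drawn with replacement
        boot = similarity[np.ix_(subj, subj)].astype(float)
        # A subject paired with itself would give a perfect correlation; mask it.
        boot[subj[:, None] == subj[None, :]] = np.nan
        stats[b] = np.nanmedian(boot[upper])
    return stats


rng = np.random.default_rng(0)
shared = rng.standard_normal((50, 1))              # signal common to all subjects
data = shared + rng.standard_normal((50, 10))      # 50 time points x 10 subjects
similarity = np.corrcoef(data, rowvar=False)       # 10 x 10 pairwise correlations

boot_stats = bootstrap_isc(similarity, n_bootstraps=200, random_state=1)
mean_r = fisher_mean(np.array([0.2, 0.8]))
```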

mean(axis=0)
Calculate mean of Adjacency
 Parameters
axis – (int) calculate mean over features (0) or data (1). For data it will be on upper triangle.
 Returns
 float if single, adjacency if axis=0, np.array if axis=1 and multiple
 Return type
mean

median(axis=0)
Calculate median of Adjacency
 Parameters
axis – (int) calculate median over features (0) or data (1). For data it will be on upper triangle.
 Returns
 float if single, adjacency if axis=0, np.array if axis=1 and multiple
 Return type
mean

plot(limit=3, axes=None, *args, **kwargs)
Create heatmap of Adjacency matrix
Can pass in any sns.heatmap argument
 Parameters
limit – (int) number of heatmaps to plot if object contains multiple adjacencies (default: 3)
axes – matplotlib axis handle

plot_label_distance(labels=None, ax=None)
Create a violin plot indicating within and between label distance
 Parameters
labels (np.array) – numpy array of labels to plot
 Returns
violin plot handles
 Return type
f

plot_mds(n_components=2, metric=True, labels=None, labels_color=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>, n_jobs=1, view=(30, 20), figsize=[12, 8], ax=None, *args, **kwargs)
Plot Multidimensional Scaling
 Parameters
n_components – (int) Number of dimensions to project (can be 2 or 3)
metric – (bool) Perform metric or non-metric dimensional scaling; default True
labels – (list) Can override labels stored in Adjacency Class
labels_color – (str) list of colors for labels, if len(1) then make all same color
n_jobs – (int) Number of parallel jobs
view – (tuple) view for 3-dimensional plot; default (30,20)

plot_silhouette(labels=None, ax=None, permutation_test=True, n_permute=5000, **kwargs)
Create a silhouette plot

regress(X, mode='ols', **kwargs)
Run a regression on an adjacency instance. You can decompose an adjacency instance with another adjacency instance. You can also decompose each pixel by passing a Design_Matrix instance.
 Parameters
X – Design matrix can be an Adjacency or Design_Matrix instance
mode – type of regression (default: ols)
 Returns
(dict) dictionary of stats outputs.
 Return type
stats

similarity(data, plot=False, perm_type='2d', n_permute=5000, metric='spearman', ignore_diagonal=False, **kwargs)
Calculate similarity between two Adjacency matrices. Default is to use spearman correlation and permutation test.
 Parameters
data – Adjacency data, or 1d array same size as self.data
perm_type – (str) ‘1d’, ‘2d’, or None
metric – (str) ‘spearman’, ‘pearson’, or ‘kendall’
ignore_diagonal – (bool) only applies to ‘directed’ Adjacency types using perm_type=None or perm_type=‘1d’

Estimate the social relations model from a matrix for a round-robin design.
X_{ij} = m + \alpha_i + \beta_j + g_{ij} + \epsilon_{ijl}
where X_{ij} is the score for person i rating person j, m is the group mean, \alpha_i is person i’s actor effect, \beta_j is person j’s partner effect, g_{ij} is the relationship effect and \epsilon_{ijl} is the error in measure l for actor i and partner j.
This model is primarily concerned with partitioning the variance of the various effects.
Code is based on implementation presented in Chapter 8 of Kenny, Kashy, & Cook (2006). Tests replicate examples presented in the book. Note that this method assumes that actor scores are rows (lower triangle), while partner scores are columns (upper triangle). The minimal sample size to estimate these effects is 4.
 Model Assumptions:
Social interactions are exclusively dyadic
People are randomly sampled from population
No order effects
The effects combine additively and relationships are linear
In the future we might update the formulas and standard errors based on Bond and Lashley, 1996
 Parameters
self – (adjacency) can be a single matrix or many matrices for each group
summarize_results – (bool) will provide a formatted summary of model results
nan_replace – (bool) will replace nan values with row and column means
 Returns
(pd.Series/pd.DataFrame) All of the effects estimated using SRM
 Return type
estimated effects

stats_label_distance(labels=None, n_permute=5000, n_jobs=-1)
Calculate permutation tests on within and between label distance.
 Parameters
labels (np.array) – numpy array of labels to plot
n_permute (int) – number of permutations to run (default=5000)
 Returns
 dictionary of within and between group differences and p-values
 Return type
dict

std(axis=0)
Calculate standard deviation of Adjacency
 Parameters
axis – (int) calculate std over features (0) or data (1). For data it will be on upper triangle.
 Returns
 float if single, adjacency if axis=0, np.array if axis=1 and multiple
 Return type
std

sum(axis=0)
Calculate sum of Adjacency
 Parameters
axis – (int) calculate sum over features (0) or data (1). For data it will be on upper triangle.
 Returns
 float if single, adjacency if axis=0, np.array if axis=1 and multiple
 Return type
mean

threshold(upper=None, lower=None, binarize=False)
Threshold Adjacency instance. Provide upper and lower values or percentages to perform two-sided thresholding. Binarize will return a mask image respecting thresholds if provided, otherwise respecting every non-zero value.
 Parameters
upper – (float or str) Upper cutoff for thresholding. If string, will interpret as percentile; can be None for one-sided thresholding.
lower – (float or str) Lower cutoff for thresholding. If string, will interpret as percentile; can be None for one-sided thresholding.
binarize (bool) – return binarized image respecting thresholds if provided, otherwise binarize on every non-zero value; default False
 Returns
thresholded Adjacency instance
 Return type

ttest(permutation=False, **kwargs)
Calculate t-test across samples.
 Parameters
permutation – (bool) Run t-test as permutation. Note this can be very slow.
 Returns
 (dict) contains Adjacency instances of t values (or mean if running permutation) and Adjacency instance of p values.
 Return type
out

class nltools.data.Design_Matrix(*args, **kwargs)
Design_Matrix is a class to represent design matrices with special methods for data processing (e.g. convolution, upsampling, downsampling) and intelligent, flexible appending (e.g. automatically keeping certain columns or polynomial terms separated during concatenation). It plays nicely with Brain_Data and can be used to build an experimental design to pass to Brain_Data’s X attribute. It is essentially an enhanced pandas DataFrame, with extra attributes and methods. Methods always return a new design matrix instance (copy). Column names are always string types.
 Parameters
sampling_freq (float) – sampling rate of each row in hertz; to convert seconds to hertz (e.g. in the case of TRs for neuroimaging), use hertz = 1 / TR
convolved (list, optional) – on what columns convolution has been performed; defaults to None
polys (list, optional) – list of polynomial terms in design matrix, e.g. intercept, polynomial trends, basis functions, etc; default None

add_dct_basis(duration=180, drop=0)
Adds unit-scaled cosine basis functions to Design_Matrix columns, based on SPM-style discrete cosine transform for use in high-pass filtering. Does not add intercept/constant. Care is recommended if using this along with .add_poly(), as some columns will be highly correlated.
 Parameters
duration (int) – length of filter in seconds
drop (int) – index of which early/slow bases to drop, if any; will always drop constant (i.e. intercept) like SPM. Unlike SPM, retains first basis (i.e. linear/sigmoidal). Will cumulatively drop bases up to and inclusive of index provided (e.g. 2, drops bases 1 and 2); default 0
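
To make the idea of a discrete cosine basis for high-pass filtering concrete, here is a numpy sketch that builds DCT-II columns for a run of a given length. It rests on assumptions: the number of bases follows what is, to the best of our understanding, the SPM convention floor(2 * run_length / duration + 1); the columns are scaled to unit norm; and dct_basis is a hypothetical helper, not the library function.

```python
import numpy as np


def dct_basis(n_samples, sampling_freq, duration=180, drop=0):
    """Cosine basis columns slower than `duration` seconds, without the constant."""
    run_length = n_samples / sampling_freq  # run length in seconds
    n_basis = int(np.floor(2 * run_length / duration + 1))
    n = np.arange(n_samples)
    k = np.arange(1, n_basis)  # k = 0 would be the constant term, so skip it
    basis = np.sqrt(2.0 / n_samples) * np.cos(
        np.pi * (2 * n[:, None] + 1) * k[None, :] / (2 * n_samples)
    )
    # Optionally drop the slowest remaining bases.
    return basis[:, drop:]


# 200 samples at 0.5 Hz (TR = 2 s) with a 180 s filter.
basis = dct_basis(n_samples=200, sampling_freq=0.5, duration=180)
```

Each column has zero mean, so none of them duplicates an intercept. The slowest columns do resemble low-order polynomial trends, which is why combining this with .add_poly() can produce highly correlated columns.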

add_poly(order=0, include_lower=True)
Add nth order Legendre polynomial terms as columns to design matrix. Good for adding constant/intercept to model (order = 0) and accounting for slow-frequency nuisance artifacts, e.g. linear, quadratic, etc. drifts. Care is recommended when using this with .add_dct_basis(), as some columns will be highly correlated.
 Parameters
order (int) – what order terms to add; 0 = constant/intercept (default), 1 = linear, 2 = quadratic, etc
include_lower – (bool) whether to add lower order terms if order > 0
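
The polynomial terms described above can be illustrated with numpy's Legendre utilities. This is a sketch under the assumption that the terms are Legendre polynomials evaluated on an evenly spaced grid over [-1, 1]. The helper legendre_columns is hypothetical, and the library may scale or name its columns differently.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_columns(n_samples, order=2, include_lower=True):
    """Legendre polynomial regressors, one column per order (hypothetical helper)."""
    x = np.linspace(-1, 1, n_samples)
    # legvander returns columns for orders 0..order evaluated at x.
    all_terms = legendre.legvander(x, order)
    return all_terms if include_lower else all_terms[:, [order]]


polys = legendre_columns(100, order=2)                       # constant, linear, quadratic
quad_only = legendre_columns(100, order=2, include_lower=False)
```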

append
(dm, axis=0, keep_separate=True, unique_cols=None, fill_na=0, verbose=False)[source]¶ Method for concatenating another design matrix row or columnwise. When concatenating rowwise, has the ability to keep certain columns separated if they exist in multiple design matrices (e.g. keeping separate intercepts for multiple runs). This is on by default and will automatically separate out polynomial columns (i.e. anything added with the add_poly or add_dct_basis methods). Additional columns can be separate by run using the unique_cols parameter. Can also add new polynomial terms during vertical concatentation (when axis == 0). This will by default create new polynomial terms separately for each design matrix
 Parameters
dm (Design_Matrix or list) – design_matrix or list of design_matrices to append
axis (int) – 0 for row-wise (vertcat), 1 for column-wise (horzcat); default 0
keep_separate (bool,optional) – whether to try to uniquify columns; defaults to True; only applies when axis==0
unique_cols (list,optional) – what additional columns to try to keep separated by uniquifying, only applies when axis = 0; defaults to None
fill_na (str/int/float) – if provided will fill NaNs with this value during row-wise appending (when axis = 0) if separate columns are desired; default 0
verbose (bool) – print messages during append about how polynomials are going to be separated

clean
(fill_na=0, exclude_polys=False, thresh=0.95, verbose=True)[source]¶ Method to fill NaNs in Design Matrix and remove duplicate columns based on data values, NOT names. Columns are dropped if they are correlated >= the requested threshold (default = .95). In this case, only the first instance of that column will be retained and all others will be dropped.
 Parameters
fill_na (str/int/float) – value to fill NaNs with; set to None to retain NaNs; default 0
exclude_polys (bool) – whether to skip checking of polynomial terms (i.e. intercept, trends, basis functions); default False
thresh (float) – correlation threshold to use to drop redundant columns; default .95
verbose (bool) – print what column names were dropped; default True

convolve
(conv_func='hrf', columns=None)[source]¶ Perform convolution using an arbitrary function.
 Parameters
conv_func (ndarray or string) – either a 1d numpy array containing output of a function that you want to convolve; a samples by kernel 2d array of several kernels to convolve; or the string ‘hrf’ which defaults to a Glover HRF function at the Design_Matrix’s sampling_freq
columns (list) – what columns to perform convolution on; defaults to all non-polynomial columns

downsample
(target, **kwargs)[source]¶ Downsample columns of design matrix. Relies on nltools.stats.downsample, but ensures that returned object is a design matrix.
 Parameters
target (float) – desired frequency in Hz
kwargs – additional inputs to nltools.stats.downsample

heatmap
(figsize=(8, 6), **kwargs)[source]¶ Visualize Design Matrix SPM style. Use .plot() for typical pandas plotting functionality. Can pass optional keyword args to seaborn heatmap.

replace_data
(data, column_names=None)[source]¶ Convenient method to replace all data in Design_Matrix with new data while keeping attributes and polynomial columns untouched.
 Parameters
column_names (list) – list of column names for new data

upsample
(target, **kwargs)[source]¶ Upsample columns of design matrix. Relies on nltools.stats.upsample, but ensures that returned object is a design matrix.
 Parameters
target (float) – desired frequency in Hz
kwargs – additional inputs to nltools.stats.upsample

vif
(exclude_polys=True)[source]¶ Compute variance inflation factor amongst columns of design matrix, ignoring polynomial terms. Much faster than statsmodels and more reliable too. Uses the same method as Matlab and R (diagonal elements of the inverted correlation matrix).
 Parameters
exclude_polys (bool) – whether to skip checking of polynomial terms (i.e. intercept, trends, basis functions); default True
 Returns
list with length == number of columns - intercept
 Return type
vifs (list)
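The "diagonal elements of the inverted correlation matrix" description can be sketched with NumPy as follows. This is an illustration of the technique only, not the nltools implementation; `vif_from_columns` is a hypothetical helper name and the data are synthetic.

```python
import numpy as np

def vif_from_columns(X):
    """Variance inflation factors: diagonal of the inverted correlation matrix."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)   # treat columns as regressors
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)                  # unrelated to a
c = a + 0.1 * rng.normal(size=200)        # nearly collinear with a
vifs = vif_from_columns(np.column_stack([a, b, c]))
```

Columns `a` and `c` should receive large values, while the unrelated column `b` should stay close to 1.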
nltools.analysis
: Analysis Tools¶

class
nltools.analysis.
Roc
(input_values=None, binary_outcome=None, threshold_type='optimal_overall', forced_choice=None, **kwargs)[source]¶ Roc Class
The Roc class is based on Tor Wager’s Matlab roc_plot.m function and allows a user to easily run different types of receiver operating characteristic (ROC) curves. For example, one might be interested in single-interval or forced-choice classification.
 Parameters
input_values – nibabel data instance
binary_outcome – vector of training labels
threshold_type – [‘optimal_overall’, ‘optimal_balanced’, ‘minimum_sdt_bias’]
**kwargs – Additional keyword arguments to pass to the prediction algorithm

calculate
(input_values=None, binary_outcome=None, criterion_values=None, threshold_type='optimal_overall', forced_choice=None, balanced_acc=False)[source]¶ Calculate Receiver Operating Characteristic plot (ROC) for single-interval classification.
 Parameters
input_values – nibabel data instance
binary_outcome – vector of training labels
criterion_values – (optional) criterion values for calculating fpr & tpr
threshold_type – [‘optimal_overall’, ‘optimal_balanced’, ‘minimum_sdt_bias’]
forced_choice – index indicating position for each unique subject (default=None)
balanced_acc – balanced accuracy for single-interval classification (bool). THIS IS NOT COMPLETELY IMPLEMENTED BECAUSE IT AFFECTS ACCURACY ESTIMATES, BUT NOT P-VALUES OR THRESHOLD AT WHICH TO EVALUATE SENS/SPEC
**kwargs – Additional keyword arguments to pass to the prediction algorithm

plot
(plot_method='gaussian', balanced_acc=False, **kwargs)[source]¶ Create ROC Plot
Create a specific kind of ROC curve plot, based on input values along a continuous distribution and a binary outcome variable (logical)
 Parameters
plot_method – type of plot [‘gaussian’,’observed’]
binary_outcome – vector of training labels
**kwargs – Additional keyword arguments to pass to the prediction algorithm
 Returns
fig
nltools.stats
: Stats Tools¶
NeuroLearn Statistics Tools¶
Tools to help with statistical analyses.

nltools.stats.
align
(data, method='deterministic_srm', n_features=None, axis=0, *args, **kwargs)[source]¶ Align subject data into a common response model.
Can be used to hyperalign source data to target data using Hyperalignment from Dartmouth (i.e., procrustes transformation; see nltools.stats.procrustes) or Shared Response Model from Princeton (see nltools.external.srm). See nltools.data.Brain_Data.align for aligning a single Brain_Data object to another. Common Model is shared response model or centered target data. Transformed data can be back projected to original data using the transformation matrix. Inputs must be a list of Brain_Data instances or numpy arrays (observations by features).
Examples
 Hyperalign using procrustes transform:
out = align(data, method='procrustes')
 Align using shared response model:
out = align(data, method='probabilistic_srm', n_features=None)
 Project aligned data into original data:
original_data = [np.dot(t.data, tm.T) for t, tm in zip(out['transformed'], out['transformation_matrix'])]
 Parameters
data – (list) A list of Brain_Data objects
method – (str) alignment method to use [‘probabilistic_srm’,’deterministic_srm’,’procrustes’]
n_features – (int) number of features to align to common space. If None then will select number of voxels
axis – (int) axis to align on
 Returns
(dict) a dictionary containing a list of transformed subject matrices, a list of transformation matrices, the shared response matrix, and the intersubject correlation of the shared responses
 Return type
out

nltools.stats.
align_states
(reference, target, metric='correlation', return_index=False, replace_zero_variance=False)[source]¶ Align state weight maps using the Hungarian algorithm by minimizing pairwise distance between group states.
 Parameters
reference – (np.array) reference pattern x state matrix
target – (np.array) target pattern x state matrix to align to reference
metric – (str) distance metric to use
return_index – (bool) return index if True, return remapped data if False
replace_zero_variance – (bool) transform a vector with zero variance to random numbers from a uniform distribution. Useful for when using correlation as a distance metric to avoid NaNs.
 Returns
(list) a list of reordered state X pattern matrices
 Return type
ordered_weights

nltools.stats.
calc_bpm
(beat_interval, sampling_freq)[source]¶ Calculate instantaneous BPM from beat-to-beat interval
 Parameters
beat_interval – (int) number of samples in between each beat (typically RR Interval)
sampling_freq – (float) sampling frequency in Hz
 Returns
(float) beats per minute for time interval
 Return type
bpm
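The conversion from a beat-to-beat interval in samples to beats per minute is simple arithmetic: divide by the sampling frequency to get seconds per beat, then divide 60 by that. A minimal sketch, with `beats_per_minute` as a hypothetical stand-in for the documented function:

```python
def beats_per_minute(beat_interval, sampling_freq):
    """Convert a beat-to-beat interval (in samples) into beats per minute."""
    seconds_per_beat = beat_interval / sampling_freq
    return 60.0 / seconds_per_beat

# 500 samples between beats at 500 Hz is one beat per second.
bpm = beats_per_minute(beat_interval=500, sampling_freq=500.0)
```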

nltools.stats.
correlation
(data1, data2, metric='pearson')[source]¶ This function calculates the correlation between data1 and data2
 Parameters
data1 – (np.array) x
data2 – (np.array) y
metric – (str) type of correlation [“spearman” or “pearson” or “kendall”]
 Returns
r – (np.array) correlations
p – (float) p-value

nltools.stats.
correlation_permutation
(data1, data2, method='permute', n_permute=5000, metric='spearman', tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶ Compute correlation and calculate p-value using permutation methods.
‘permute’ method randomly shuffles one of the vectors. This method is recommended for independent data. For timeseries data we recommend using ‘circle_shift’ or ‘phase_randomize’ methods.
 Parameters
data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute
data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute
n_permute – (int) number of permutations
metric – (str) type of association metric [‘spearman’,’pearson’, ‘kendall’]
method – (str) type of permutation [‘permute’, ‘circle_shift’, ‘phase_randomize’]
random_state – (int, None, or np.random.RandomState) Initial random seed (default: None)
tail – (int) either 1 for one-tailed or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
 Returns
(dict) dictionary of permutation results [‘correlation’,’p’]
 Return type
stats
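The ‘permute’ strategy described above can be sketched in plain Python: shuffle one vector many times and count how often the shuffled correlation is at least as extreme as the observed one. This is a dependency-free illustration using Pearson correlation, not the nltools code; the helper names and the add-one correction in the p-value are assumptions.

```python
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p(x, y, n_permute=2000, seed=0):
    """Two-tailed p-value obtained by shuffling y."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permute):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= abs(observed):
            hits += 1
    # Assumed convention: add one so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_permute + 1)

x = [float(i) for i in range(20)]
y = [2.0 * v + (-1) ** i for i, v in enumerate(x)]   # strongly related to x
r, p = permutation_p(x, y)
```

Shuffling breaks temporal structure, which is why the text recommends ‘circle_shift’ or ‘phase_randomize’ for time-series data instead.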

nltools.stats.
distance_correlation
(x, y, bias_corrected=True, ttest=False)[source]¶ Compute the distance correlation between 2 arrays to test for multivariate dependence (linear or nonlinear). Arrays must match on their first dimension. It’s almost always preferable to compute the bias_corrected version which can also optionally perform a t-test. This t-test operates on a statistic that’s ~dcorr^2 and will also be returned.
Explanation: Distance correlation involves computing the normalized covariance of two centered euclidean distance matrices. Each distance matrix is the euclidean distance between rows (if x or y are 2d) or scalars (if x or y are 1d). Each matrix is centered prior to computing the covariance either using double-centering or U-centering, which corrects for bias as the number of dimensions increases. U-centering is almost always preferred. It also permits inference of the normalized covariance between each distance matrix using a one-tailed directional t-test (Szekely & Rizzo, 2013). While distance correlation is normally bounded between 0 and 1, U-centering can produce negative estimates, which are never significant.
Validated against the dcor and dcor.ttest functions in the ‘energy’ R package and the dcor.distance_correlation, dcor.udistance_correlation_sqr, and dcor.independence.distance_correlation_t_test functions in the dcor Python package.
 Parameters
x (ndarray) – 1d or 2d numpy array of observations by features
y (ndarray) – 1d or 2d numpy array of observations by features
bias_corrected (bool) – if False, use double-centering, which produces a biased estimate that converges to 1 as the number of dimensions increases. Otherwise use U-centering to correct this bias. Note this must be True if ttest=True; default True
ttest (bool) – perform a t-test using the bias_corrected distance correlation; default False
 Returns
dictionary of results (correlation, t, p, and df.) Optionally, covariance, x variance, and y variance
 Return type
results (dict)

nltools.stats.
double_center
(mat)[source]¶ Double center a 2d array.
 Parameters
mat (ndarray) – 2d numpy array
 Returns
double-centered version of input
 Return type
mat (ndarray)
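Double-centering subtracts each row mean and each column mean from every entry and adds back the grand mean, so all rows and columns of the result average to zero. A minimal pure-Python sketch on a small distance matrix (`double_center_rows` is a hypothetical name, and nested lists stand in for a 2d numpy array):

```python
def double_center_rows(mat):
    """Subtract row and column means from each entry and add back the grand mean."""
    n_rows, n_cols = len(mat), len(mat[0])
    row_means = [sum(row) / n_cols for row in mat]
    col_means = [sum(mat[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    grand = sum(row_means) / n_rows
    return [[mat[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)]
            for i in range(n_rows)]

dist = [[0.0, 1.0, 3.0],
        [1.0, 0.0, 2.0],
        [3.0, 2.0, 0.0]]
centered = double_center_rows(dist)
```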

nltools.stats.
downsample
(data, sampling_freq=None, target=None, target_type='samples', method='mean')[source]¶ Downsample a pandas object to a new target frequency or number of samples using averaging.
 Parameters
data – (pd.DataFrame, pd.Series) data to downsample
sampling_freq – (float) Sampling frequency of data in hertz
target – (float) downsampling target
target_type – type of target can be [samples,seconds,hz]
method – (str) type of downsample method [‘mean’,’median’], default: mean
 Returns
(pd.DataFrame, pd.Series) downsampled data
 Return type
out
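Downsampling by averaging amounts to grouping consecutive samples into bins and taking the mean of each bin. The sketch below assumes the original rate is an integer multiple of the target rate and works on a plain list rather than a pandas object; `downsample_mean` is a hypothetical helper.

```python
def downsample_mean(values, sampling_freq, target_hz):
    """Average consecutive samples to go from sampling_freq down to target_hz."""
    step = int(round(sampling_freq / target_hz))   # samples per output bin
    bins = [values[i:i + step] for i in range(0, len(values), step)]
    return [sum(chunk) / len(chunk) for chunk in bins]

signal = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]           # sampled at 2 Hz
slow = downsample_mean(signal, sampling_freq=2.0, target_hz=1.0)
```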

nltools.stats.
fdr
(p, q=0.05)[source]¶ Determine FDR threshold given a p value array and desired false discovery rate q. Written by Tal Yarkoni
 Parameters
p – (np.array) vector of p-values
q – (float) false discovery rate level
 Returns
(float) p-value threshold based on independence or positive dependence
 Return type
fdr_p
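A threshold "based on independence or positive dependence" is commonly obtained with the Benjamini-Hochberg rule: sort the p-values and keep the largest one that satisfies p_(k) <= k * q / n. The sketch below illustrates that rule in plain Python. It is not the nltools code, and returning `None` when nothing passes is an assumption of this example.

```python
def fdr_threshold(p_values, q=0.05):
    """Largest sorted p-value p_(k) with p_(k) <= k * q / n, or None if none pass."""
    ordered = sorted(p_values)
    n = len(ordered)
    passing = [p for k, p in enumerate(ordered, start=1) if p <= k * q / n]
    return max(passing) if passing else None

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
threshold = fdr_threshold(pvals, q=0.05)
```

Tests with p-values at or below the returned threshold would be declared significant at false discovery rate q.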

nltools.stats.
find_spikes
(data, global_spike_cutoff=3, diff_spike_cutoff=3)[source]¶ Function to identify spikes from fMRI Time Series Data
 Parameters
data – Brain_Data or nibabel instance
global_spike_cutoff – (int,None) cutoff to identify spikes in global signal in standard deviations, None indicates do not calculate.
diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference in standard deviations, None indicates do not calculate.
 Returns
pandas dataframe with spikes as indicator variables

nltools.stats.
holm_bonf
(p, alpha=0.05)[source]¶ Compute corrected p-values based on the Holm-Bonferroni method, i.e. step-down procedure applying iteratively less correction to highest p-values. A bit more conservative than fdr, but much more powerful than vanilla Bonferroni.
 Parameters
p – (np.array) vector of p-values
alpha – (float) alpha level
 Returns
(float) p-value threshold based on Bonferroni step-down procedure
 Return type
bonf_p
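The step-down idea can be sketched as follows: the smallest p-value is compared against alpha / n, the next against alpha / (n - 1), and so on, stopping at the first failure. This plain-Python illustration returns a reject flag per test rather than a single threshold, which is a choice made for this example and not the documented return value.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a reject flag per p-value using Holm's step-down rule."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    reject = [False] * n
    for rank, idx in enumerate(order):
        # Smallest p-value is tested at alpha / n, the next at alpha / (n - 1), ...
        if p_values[idx] <= alpha / (n - rank):
            reject[idx] = True
        else:
            break   # first failure: every larger p-value is retained
    return reject

flags = holm_reject([0.01, 0.04, 0.03, 0.005], alpha=0.05)
```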

nltools.stats.
isc
(data, n_bootstraps=5000, metric='median', method='bootstrap', ci_percentile=95, exclude_self_corr=True, return_bootstraps=False, tail=2, n_jobs=-1, random_state=None)[source]¶ Compute pairwise intersubject correlation from observations by subjects array.
This function computes pairwise intersubject correlations (ISC) using the median as recommended by Chen et al. (2016). However, if the mean is preferred, we compute the mean correlation after performing the Fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values.
There are currently three different methods to compute p-values. These include the classic methods for computing permuted time-series by either circle-shifting the data or phase-randomizing the data (see Lancaster et al., 2018). These methods create random surrogate data while preserving the temporal autocorrelation inherent to the signal. By default, we use the subject-wise bootstrap method from Chen et al., 2016. Instead of recomputing the pairwise ISC using circle_shift or phase_randomization methods, this approach uses the computationally more efficient method of bootstrapping the subjects and computing a new pairwise similarity matrix with randomly selected subjects with replacement. If the same subject is selected multiple times, we set the perfect correlation to a NaN (with exclude_self_corr=True). We compute the p-values using the percentile method, the same method used in Brainiak.
Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.
Lancaster, G., Iatsenko, D., Pidde, A., Ticcinelli, V., & Stefanovska, A. (2018). Surrogate data for hypothesis testing of physical systems. Physics Reports, 748, 1-60.
 Parameters
data – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects
n_bootstraps – (int) number of bootstraps
metric – (str) type of ISC summary metric [‘mean’, ‘median’] (default: median)
method – (str) method to compute p-values [‘bootstrap’, ‘circle_shift’, ‘phase_randomize’] (default: bootstrap)
tail – (int) either 1 for one-tailed or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_bootstraps – (bool) Return the bootstrap distribution along with the p-value; default False
 Returns
(dict) dictionary of permutation results [‘correlation’,’p’]
 Return type
stats
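The mean-based summary mentioned above averages correlations in Fisher z space (arctanh) and maps the result back with tanh. A short standard-library sketch of just that step, with `fisher_mean` as a hypothetical helper name:

```python
import math

def fisher_mean(correlations):
    """Average correlations in Fisher z space, then map back to r."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))

pairwise_r = [0.2, 0.5, 0.9]
mean_r = fisher_mean(pairwise_r)                  # roughly 0.63
plain_mean = sum(pairwise_r) / len(pairwise_r)    # roughly 0.53, for comparison
```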

nltools.stats.
isfc
(data, method='average')[source]¶ Compute intersubject functional connectivity (ISFC) from a list of observation x feature matrices
This function uses the leave-one-out approach to compute ISFC (Simony et al., 2016). For each subject, compute the cross-correlation between each voxel/roi with the average of the rest of the subjects’ data. In other words, compute the mean voxel/ROI response for all participants except the target subject. Then compute the correlation between each ROI within the target subject with the mean ROI response in the group average.
Simony, E., Honey, C. J., Chen, J., Lositsky, O., Yeshurun, Y., Wiesel, A., & Hasson, U. (2016). Dynamic reconfiguration of the default mode network during narrative comprehension. Nature communications, 7, 12141.
 Parameters
data – list of subject matrices (observations x voxels/rois)
method – approach to computing ISFC. ‘average’ uses the leave-one-out approach
 Returns
list of subject ISFC matrices

nltools.stats.
isps
(data, sampling_freq=0.5, low_cut=0.04, high_cut=0.07, order=5, pairwise=False)[source]¶ Compute Dynamic Intersubject Phase Synchrony (ISPS) from an observation by subject array.
This function computes the instantaneous intersubject phase synchrony for a single voxel/roi time-series. Requires multiple subjects. This method is largely based on that described by Glerean et al., 2012 and performs a Hilbert transform on narrow bandpass filtered time-series (Butterworth) data to get the instantaneous phase angle. The function returns a dictionary containing the average phase angle, the average vector length, and parametric p-values computed using the Rayleigh test using circular statistics (Fisher, 1993). If pairwise=True, then it will compute these on the pairwise phase angle differences; if pairwise=False, it will compute these on the actual phase angles. These are called intersite phase coupling and intertrial phase coupling, respectively, in the EEG literature.
This function requires narrow band filtering your data. As a default we use the recommendations by Glerean et al. (2012) of .04-.07 Hz. This is similar to the “slow-4” band (0.025–0.067 Hz) described by Zuo et al. (2010) and Penttonen & Buzsáki (2003), but excludes the .03 band, which has been demonstrated to contain aliased respiration signals (Birn, 2006).
Birn RM, Smith MA, Bandettini PA, Diamond JB. 2006. Separating respiratory-variation-related fluctuations from neuronal-activity-related fluctuations in fMRI. Neuroimage 31:1536–1548.
Buzsáki, G., & Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science, 304(5679), 1926-1929.
Fisher, N. I. (1995). Statistical analysis of circular data. Cambridge University Press.
Glerean, E., Salmi, J., Lahnakoski, J. M., Jääskeläinen, I. P., & Sams, M. (2012). Functional magnetic resonance imaging phase synchronization as a measure of dynamic functional connectivity. Brain connectivity, 2(2), 91-101.
 Parameters
data – (pd.DataFrame, np.ndarray) observations x subjects data
sampling_freq – (float) sampling frequency of data in Hz
low_cut – (float) lower bound cutoff for high pass filter
high_cut – (float) upper bound cutoff for low pass filter
order – (int) filter order for butterworth bandpass
pairwise – (bool) compute phase angle coherence on pairwise phase angle differences or on raw phase angle.
 Returns
dictionary with mean phase angle, vector length, and rayleigh statistic

nltools.stats.
make_cosine_basis
(nsamples, sampling_freq, filter_length, unit_scale=True, drop=0)[source]¶ Create a series of cosine basis functions for a discrete cosine transform. Based off of implementation in spm_filter and spm_dctmtx because scipy dct can only apply transforms but not return the basis functions. Like SPM, does not add constant (i.e. intercept), but does retain first basis (i.e. sigmoidal/linear drift).
 Parameters
nsamples (int) – number of observations (e.g. TRs)
sampling_freq (float) – sampling frequency in hertz (i.e. 1 / TR)
filter_length (int) – length of filter in seconds
unit_scale (bool) – ensure that the basis functions are on the normalized range [-1, 1]; default True
drop (int) – index of which early/slow bases to drop if any; default is to drop constant (i.e. intercept) like SPM. Unlike SPM, retains first basis (i.e. linear/sigmoidal). Will cumulatively drop bases up to and inclusive of index provided (e.g. 2, drops bases 1 and 2)
 Returns
nsamples x number of basis sets numpy array
 Return type
out (ndarray)
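To give a sense of what such basis functions look like, the sketch below builds the non-constant columns of a standard DCT-II basis with the standard library. It does not reproduce SPM's scaling or the rule that derives the number of bases from `filter_length` and `sampling_freq`. The function name and the `n_bases` argument are assumptions made for this example.

```python
import math

def cosine_basis(n_samples, n_bases):
    """Columns k = 1..n_bases of a DCT-II basis; k = 0 (the constant) is left out."""
    return [[math.cos(math.pi * (2 * t + 1) * k / (2 * n_samples))
             for k in range(1, n_bases + 1)]
            for t in range(n_samples)]

basis = cosine_basis(n_samples=100, n_bases=3)
first = [row[0] for row in basis]     # slowest component: half a cosine cycle
second = [row[1] for row in basis]    # next component: one full cycle
```

Each column lies in [-1, 1], sums to zero (so it is orthogonal to an intercept), and is orthogonal to the other columns.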

nltools.stats.
matrix_permutation
(data1, data2, n_permute=5000, metric='spearman', tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶ Permute 2-dimensional matrix correlation (Mantel test).
Chen, G. et al. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. Neuroimage, 142, 248-259.
 Parameters
data1 – (pd.DataFrame, np.array) square matrix
data2 – (pd.DataFrame, np.array) square matrix
n_permute – (int) number of permutations
metric – (str) type of association metric [‘spearman’,’pearson’, ‘kendall’]
tail – (int) either 1 for one-tailed or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
 Returns
(dict) dictionary of permutation results [‘correlation’,’p’]
 Return type
stats

nltools.stats.
multi_threshold
(t_map, p_map, thresh)[source]¶ Threshold test image by multiple p-values from p image
 Parameters
t_map – (Brain_Data) Brain_Data instance of arbitrary statistic metric (e.g., beta, t, etc)
p_map – (Brain_Data) Brain_Data instance of p-values
thresh – (list) list of p-values to threshold stat image
 Returns
Thresholded Brain_Data instance
 Return type
out

nltools.stats.
one_sample_permutation
(data, n_permute=5000, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶ One sample permutation test using randomization.
 Parameters
data – (pd.DataFrame, pd.Series, np.array) data to permute
n_permute – (int) number of permutations
tail – (int) either 1 for one-tailed or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
random_state – (int, None, or np.random.RandomState) Initial random seed (default: None)
 Returns
(dict) dictionary of permutation results [‘mean’,’p’]
 Return type
stats
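One common way to run a one-sample randomization test is to flip the sign of each observation at random and compare the resulting means with the observed mean. The sketch below illustrates that idea in plain Python. Whether it matches the library's randomization scheme exactly is an assumption, as are the helper name and the add-one correction.

```python
import random

def sign_flip_p(data, n_permute=2000, seed=0):
    """Two-tailed p-value for the mean, from random sign flips of each value."""
    rng = random.Random(seed)
    observed = sum(data) / len(data)
    hits = 0
    for _ in range(n_permute):
        flipped = [v * rng.choice((-1, 1)) for v in data]
        if abs(sum(flipped) / len(flipped)) >= abs(observed):
            hits += 1
    # Assumed convention: add one so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_permute + 1)

sample = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.4, 1.0, 1.6]
mean, p = sign_flip_p(sample)
```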

nltools.stats.
pearson
(x, y)[source]¶ Correlates row vector x with each row vector in 2D array y. From neurosynth.stats.py - author: Tal Yarkoni

nltools.stats.
procrustes
(data1, data2)[source]¶ Procrustes analysis, a similarity test for two data sets.
Each input matrix is a set of points or vectors (the rows of the matrix). The dimension of the space is the number of columns of each matrix. Given two identically sized matrices, procrustes standardizes both such that:
 \(tr(AA^{T}) = 1\).
 Both sets of points are centered around the origin.
Procrustes then applies the optimal transform to the second matrix (including scaling/dilation, rotations, and reflections) to minimize \(M^{2}=\sum(data1-data2)^{2}\), or the sum of the squares of the pointwise differences between the two input datasets. This function was not designed to handle datasets with different numbers of datapoints (rows). If two data sets have different dimensionality (different number of columns), this function will add columns of zeros to the smaller of the two.
 Parameters
data1 – array_like Matrix, n rows represent points in k (columns) space. data1 is the reference data; after it is standardised, the data from data2 will be transformed to fit the pattern in data1 (must have >1 unique points).
data2 – array_like n rows of data in k space to be fit to data1. Must be the same shape (numrows, numcols) as data1 (must have >1 unique points).
 Returns
mtx1 (array_like) – A standardized version of data1.
mtx2 (array_like) – The orientation of data2 that best fits data1. Centered, but not necessarily \(tr(AA^{T}) = 1\).
disparity (float) – \(M^{2}\) as defined above.
R ((N, N) ndarray) – The matrix solution of the orthogonal Procrustes problem. Minimizes the Frobenius norm of dot(data1, R) - data2, subject to dot(R.T, R) == I.
scale (float) – Sum of the singular values of dot(data1.T, data2).

nltools.stats.
procrustes_distance
(mat1, mat2, n_permute=5000, tail=2, n_jobs=-1, random_state=None)[source]¶ Use procrustes superposition to perform a similarity test between 2 matrices. Matrices need to match in size on their first dimension only, as the smaller matrix on the second dimension will be padded with zeros. After aligning two matrices using the procrustes transformation, use the computed disparity between them (sum of squared error of elements) as a similarity metric. Shuffle the rows of one of the matrices and recompute the disparity to perform inference (Peres-Neto & Jackson, 2001).
 Parameters
mat1 (ndarray) – 2d numpy array; must have same number of rows as mat2
mat2 (ndarray) – 1d or 2d numpy array; must have same number of rows as mat1
n_permute (int) – number of permutation iterations to perform
tail (int) – either 1 for one-tailed or 2 for two-tailed test; default 2
n_jobs (int) – The number of CPUs to use to do permutation; default -1 (all)
 Returns
similarity (float) – similarity between matrices bounded between 0 and 1
pval (float) – permuted p-value

nltools.stats.
regress
(X, Y, mode='ols', stats='full', **kwargs)[source]¶ This is a flexible function to run several types of regression models provided X and Y numpy arrays. Y can be a 1d numpy array or 2d numpy array. In the latter case, results will be output with shape 1 x Y.shape[1], in other words fitting a separate regression model to each column of Y.
Does NOT add an intercept automatically to the X matrix before fitting like some other software packages. This is left up to the user.
This function can compute regression in 3 ways:
1) Standard OLS
2) OLS with robust sandwich estimators for standard errors. 3 robust types of estimators exist:
‘hc0’ - classic Huber-White estimator robust to heteroscedasticity (default)
‘hc3’ - a variant on Huber-White estimator slightly more conservative when sample sizes are small
‘hac’ - an estimator robust to both heteroscedasticity and autocorrelation; autocorrelation lag can be controlled with the ‘nlags’ keyword argument; default is 1
3) ARMA (autoregressive moving-average) model (experimental). This model is fit through statsmodels.tsa.arima_model.ARMA, so more information about options can be found there. Any settings can be passed in as kwargs. By default fits a (1,1) model with starting lags of 2. This mode is computationally intensive and can take quite a while if Y has many columns. If Y is a 2d array joblib.Parallel is used for faster fitting by parallelizing fits across columns of Y. Parallelization can be controlled by passing in kwargs. Defaults to multi-threading using 10 separate threads, as threads don’t require large arrays to be duplicated in memory. Defaults are also set to enable memory-mapping for very large arrays if backend=’multiprocessing’ to prevent crashes and hangs. Various levels of progress can be monitored using the ‘disp’ (statsmodels) and ‘verbose’ (joblib) keyword arguments with integer values > 0.
Examples
Standard OLS
>>> results = regress(X,Y,mode='ols')
Robust OLS with heteroscedasticity (hc0) robust standard errors
>>> results = regress(X,Y,mode='robust')
Robust OLS with heteroscedasticity and autocorrelation (with lag 2) robust standard errors
>>> results = regress(X,Y,mode='robust',robust_estimator='hac',nlags=2)
Autoregressive mode with autoregressive and moving-average lags = 1
>>> results = regress(X,Y,mode='arma',order=(1,1))
Autoregressive model with autoregressive lag = 2, movingaverage lag = 3, and multiprocessing instead of multithreading using 8 cores (this can use a lot of memory if input arrays are very large!).
>>> results = regress(X,Y,mode='arma',order=(2,3),backend='multiprocessing',n_jobs=8)
 Parameters
X (ndarray) – design matrix; assumes intercept is included
Y (ndarray) – dependent variable array; if 2d, a model is fit to each column of Y separately
mode (str) – kind of model to fit; must be one of ‘ols’ (default), ‘robust’, or ‘arma’
robust_estimator (str,optional) – kind of robust estimator to use if mode = ‘robust’; default ‘hc0’
nlags (int,optional) – autocorrelation lag correction if mode = ‘robust’ and robust_estimator = ‘hac’; default 1
order (tuple,optional) – autoregressive and moving-average orders for mode = ‘arma’; default (1,1)
kwargs (dict) – additional keyword arguments to statsmodels.tsa.arima_model.ARMA and joblib.Parallel
 Returns
b – coefficients
t – t-statistics (coef/sterr)
p – p-values
df – degrees of freedom
res – residuals
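Because no intercept is added automatically, the design matrix needs an explicit column of ones when an intercept is wanted. The sketch below shows this with a plain NumPy least-squares fit on synthetic data and derives t-statistics as coefficient divided by standard error. It illustrates ordinary OLS only and does not call the documented function.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 + 2.0 * x + 0.1 * rng.normal(size=100)   # true intercept 3, slope 2

# An explicit column of ones models the intercept.
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ b
df = X.shape[0] - X.shape[1]                     # observations minus regressors
sigma2 = residuals @ residuals / df              # residual variance estimate
stderr = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t = b / stderr
```

Leaving out the column of ones would force the fitted line through the origin and bias the slope whenever the true intercept is non-zero.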

nltools.stats.
summarize_bootstrap
(data, save_weights=False)[source]¶ Calculate summary of bootstrap samples
 Parameters
data – (Brain_Data) Brain_Data instance of samples
save_weights – (bool) save bootstrap weights
 Returns
(dict) dictionary of Brain_Data summary images
 Return type
output

nltools.stats.
threshold
(stat, p, thr=0.05, return_mask=False)[source]¶ Threshold test image by p-value from p image
 Parameters
stat – (Brain_Data) Brain_Data instance of arbitrary statistic metric (e.g., beta, t, etc)
p – (Brain_Data) Brain_Data instance of p-values
thr – (float) p-value to threshold stat image
return_mask – (bool) optionally return the thresholding mask; default False
 Returns
Thresholded Brain_Data instance
 Return type
out

nltools.stats.
transform_pairwise
(X, y)[source]¶ Transforms data into pairs with balanced labels for ranking. Transforms an n-class ranking problem into a two-class classification problem. Subclasses implementing particular strategies for choosing pairs should override this method. In this method, all pairs are chosen, except for those that have the same target value. The output is an array of balanced classes, i.e. there are the same number of -1 as +1.
Reference: “Large Margin Rank Boundaries for Ordinal Regression”, R. Herbrich, T. Graepel, K. Obermayer.
Authors: Fabian Pedregosa <fabian@fseoane.net>, Alexandre Gramfort <alexandre.gramfort@inria.fr>
 Parameters
X – (np.array), shape (n_samples, n_features) The data
y – (np.array), shape (n_samples,) or (n_samples, 2) Target labels. If it’s a 2D array, the second column represents the grouping of samples, i.e., samples with different groups will not be considered.
 Returns
X_trans – (np.array), shape (k, n_features) Data as pairs, where k = n_samples * (n_samples - 1) / 2 if grouping values were not passed. If grouping variables exist, then returns values computed for each group.
y_trans – (np.array), shape (k,) Output class labels, where classes have values {-1, +1}. If y was shape (n_samples, 2), then returns (k, 2) with groups on the second dimension.

nltools.stats.
trim
(data, cutoff=None)[source]¶ Trim a Pandas DataFrame or Series by replacing outlier values with NaNs
 Parameters
data – (pd.DataFrame, pd.Series) data to trim
cutoff – (dict) a dictionary with keys {‘std’:[low,high]} or {‘quantile’:[low,high]}
 Returns
(pd.DataFrame, pd.Series) trimmed data
 Return type
out

nltools.stats.
two_sample_permutation
(data1, data2, n_permute=5000, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶ Independent sample permutation test.
 Parameters
data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute
data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute
n_permute – (int) number of permutations
tail – (int) either 1 for one-tailed or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
 Returns
(dict) dictionary of permutation results [‘mean’,’p’]
 Return type
stats
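The logic of a two-tailed independent-sample permutation test can be sketched with the standard library alone. This is not the nltools code: the function name, the fixed seed, and the add-one p-value correction are assumptions chosen for the illustration.

```python
import random
import statistics


def two_sample_permutation_sketch(data1, data2, n_permute=1000, seed=0):
    """Two-tailed permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = statistics.mean(data1) - statistics.mean(data2)
    pooled = list(data1) + list(data2)
    n1 = len(data1)
    extreme = 0
    for _ in range(n_permute):
        rng.shuffle(pooled)  # reassign observations to groups at random
        diff = statistics.mean(pooled[:n1]) - statistics.mean(pooled[n1:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so p is never exactly zero.
    return {"mean": observed, "p": (extreme + 1) / (n_permute + 1)}


result = two_sample_permutation_sketch(
    [5.1, 4.9, 5.3, 5.0, 5.2, 5.4],
    [1.0, 1.2, 0.9, 1.1, 1.3, 0.8],
)
```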

nltools.stats.
u_center
(mat)[source]¶ U-center a 2d array. U-centering is a bias-corrected form of double-centering
 Parameters
mat (ndarray) – 2d numpy array
 Returns
u-centered version of input
 Return type
mat (ndarray)
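For reference, a plain-Python sketch of U-centering as it is usually defined for a symmetric distance matrix with a zero diagonal: off-diagonal entries have the row and column sums removed with an (n - 2) divisor, the grand sum added back with an (n - 1)(n - 2) divisor, and the diagonal is set to zero. The function name is hypothetical and the code is not taken from nltools.

```python
def u_center_sketch(mat):
    """U-center a square matrix given as a list of lists."""
    n = len(mat)
    if n < 3:
        raise ValueError("U-centering needs at least a 3 x 3 matrix")
    row_sums = [sum(row) for row in mat]
    col_sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_sums)
    out = [[0.0] * n for _ in range(n)]  # diagonal stays at zero
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i][j] = (mat[i][j]
                             - row_sums[i] / (n - 2)
                             - col_sums[j] / (n - 2)
                             + total / ((n - 1) * (n - 2)))
    return out


points = [0, 1, 3, 6]
dist = [[abs(a - b) for b in points] for a in points]
centered = u_center_sketch(dist)
```

A quick sanity check on the result is that every row of the U-centered matrix sums to zero.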

nltools.stats.
upsample
(data, sampling_freq=None, target=None, target_type='samples', method='linear')[source]¶ Upsample a pandas object to a new target frequency or number of samples using interpolation.
 Parameters
data – (pd.DataFrame, pd.Series) data to upsample (Note: will drop non-numeric columns from DataFrame)
sampling_freq – Sampling frequency of data in hertz
target – (float) upsampling target
target_type – (str) type of target can be [samples,seconds,hz]
method – (str) [‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’] where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order (default: linear)
 Returns
upsampled pandas object
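The 'linear' method with a target given in hertz amounts to evaluating the signal on a denser time grid. The sketch below shows that idea on a plain list; `upsample_linear_sketch` is a hypothetical helper, and the real function operates on pandas objects and supports other target types and interpolation methods.

```python
def upsample_linear_sketch(values, sampling_freq, target_hz):
    """Linearly interpolate a list sampled at sampling_freq onto a target_hz grid."""
    duration = (len(values) - 1) / sampling_freq  # time of the last sample
    n_out = int(round(duration * target_hz)) + 1
    out = []
    for k in range(n_out):
        pos = (k / target_hz) * sampling_freq  # position in original sample units
        lo = min(int(pos), len(values) - 2)    # left neighbour, clamped at the end
        frac = pos - lo
        out.append(values[lo] + frac * (values[lo + 1] - values[lo]))
    return out


upsampled = upsample_linear_sketch([0.0, 2.0, 4.0], sampling_freq=1, target_hz=2)
```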

nltools.stats.
winsorize
(data, cutoff=None, replace_with_cutoff=True)[source]¶ Winsorize a Pandas DataFrame or Series with the largest/lowest value not considered an outlier
 Parameters
data – (pd.DataFrame, pd.Series) data to winsorize
cutoff – (dict) a dictionary with keys {‘std’:[low,high]} or {‘quantile’:[low,high]}
replace_with_cutoff – (bool) If True, replace outliers with cutoff. If False, replaces outliers with closest existing values; (default: True)
 Returns
(pd.DataFrame, pd.Series) winsorized data
 Return type
out
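The difference between the two replace_with_cutoff settings can be sketched as follows, again assuming that 'std' cutoffs are counted in standard deviations from the mean. `winsorize_sketch` is a hypothetical list-based helper, not the library function.

```python
import statistics


def winsorize_sketch(values, low_sd, high_sd, replace_with_cutoff=True):
    """Clamp outliers to the cutoff, or to the nearest value inside the cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    lower, upper = mean - low_sd * sd, mean + high_sd * sd
    if not replace_with_cutoff:
        # Use the most extreme observed values that are not outliers.
        inliers = [v for v in values if lower <= v <= upper]
        lower, upper = min(inliers), max(inliers)
    return [min(max(v, lower), upper) for v in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0]
to_cutoff = winsorize_sketch(data, 1, 1, replace_with_cutoff=True)
to_existing = winsorize_sketch(data, 1, 1, replace_with_cutoff=False)
```

With the first setting the outlier becomes the cutoff itself (a value that may not occur in the data); with the second it becomes 4.0, the largest observed value inside the cutoff.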
nltools.datasets
: Dataset Tools¶
NeuroLearn datasets¶
functions to help download datasets

nltools.datasets.
download_collection
(collection=None, data_dir=None, overwrite=False, resume=True, verbose=1)[source]¶ Download images and metadata from Neurovault collection
 Parameters
collection (int, optional) – collection id. Defaults to None.
data_dir (str, optional) – data directory. Defaults to None.
overwrite (bool, optional) – overwrite data directory. Defaults to False.
resume (bool, optional) – resume download. Defaults to True.
verbose (int, optional) – print diagnostic messages. Defaults to 1.
 Returns
(DataFrame of image metadata, list of files from downloaded collection)
 Return type
(pd.DataFrame, list)

nltools.datasets.
fetch_emotion_ratings
(data_dir=None, resume=True, verbose=1)[source]¶ Download and load emotion rating dataset from Neurovault
 Parameters
data_dir – (string, optional). Path of the data directory. Used to force data storage in a specified location. Default: None
 Returns
(Brain_Data) Brain_Data object with downloaded data. X=metadata
 Return type
out

nltools.datasets.
fetch_pain
(data_dir=None, resume=True, verbose=1)[source]¶ Download and load pain dataset from Neurovault
 Parameters
data_dir – (string, optional) Path of the data directory. Used to force data storage in a specified location. Default: None
 Returns
(Brain_Data) Brain_Data object with downloaded data. X=metadata
 Return type
out

nltools.datasets.
get_collection_image_metadata
(collection=None, data_dir=None, limit=10)[source]¶ Get image metadata associated with collection
 Parameters
collection (int, optional) – collection id. Defaults to None.
data_dir (str, optional) – data directory. Defaults to None.
limit (int, optional) – number of images to increment. Defaults to 10.
 Returns
Dataframe with full image metadata from collection
 Return type
pd.DataFrame
nltools.cross_validation
: Cross-Validation Tools¶
Cross-Validation Data Classes¶
Scikit-learn compatible classes for performing various types of cross-validation

class
nltools.cross_validation.
KFoldStratified
(n_splits=3, shuffle=False, random_state=None)[source]¶ K-Folds cross-validation iterator which stratifies continuous data (unlike the scikit-learn equivalent).
Provides train/test indices to split data into train and test sets. Split dataset into k consecutive folds while ensuring that same subject is held out within each fold. Each fold is then used as a validation set once while the k - 1 remaining folds form the training set. Extension of KFold from scikit-learn cross_validation model
 Parameters
n_splits – int, default=3 Number of folds. Must be at least 2.
shuffle – boolean, optional Whether to shuffle the data before splitting into batches.
random_state – None, int or RandomState Pseudorandom number generator state used for random sampling. If None, use default numpy RNG for shuffling

split
(X, y, groups=None)[source]¶ Generate indices to split data into training and test set.
 Parameters
X – array-like, shape (n_samples, n_features) Training data, where n_samples is the number of samples and n_features is the number of features. Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.
y – array-like, shape (n_samples,) The target variable for supervised learning problems. Stratification is done based on the y labels.
groups – (object) Always ignored, exists for compatibility.
 Returns
(ndarray) The training set indices for that split. test : (ndarray) The testing set indices for that split.
 Return type
train
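One common way to stratify a continuous target is to sort the samples by y and deal them into folds in turn, so each fold covers the full range of values. The sketch below illustrates that approach; it is an assumption about the general technique rather than a copy of the class's internals, and `kfold_stratified_sketch` is a hypothetical name.

```python
def kfold_stratified_sketch(y, n_splits=3):
    """Yield (train, test) index lists with folds balanced over a continuous y."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    fold_of = [0] * len(y)
    for rank, idx in enumerate(order):
        fold_of[idx] = rank % n_splits  # deal sorted samples round-robin
    for k in range(n_splits):
        test = [i for i in range(len(y)) if fold_of[i] == k]
        train = [i for i in range(len(y)) if fold_of[i] != k]
        yield train, test


splits = list(kfold_stratified_sketch([0.9, 0.1, 0.5, 0.3, 0.7, 0.2], n_splits=3))
```

In this toy example each test fold pairs one low and one high value of y, which is the property stratification is meant to provide.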

nltools.cross_validation.
set_cv
(Y=None, cv_dict=None, return_generator=True)[source]¶ Helper function to create a scikit-learn compatible cv object using common parameters for prediction analyses.
 Parameters
Y – (pd.DataFrame) Pandas Dataframe of Y labels
cv_dict – (dict) Type of cross_validation to use. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout}
return_generator (bool) – return a cv generator instead of an instance; default True
 Returns
a scikit-learn model-selection generator
 Return type
cv

nltools.mask
: Mask Tools¶
NeuroLearn Mask Classes¶
Classes to represent masks

nltools.mask.
collapse_mask
(mask, auto_label=True, custom_mask=None)[source]¶ Collapse separate masks into one mask with multiple integers; overlapping areas are ignored
 Parameters
mask – nibabel or Brain_Data instance
custom_mask – nibabel instance or string to file path; optional
 Returns
 Brain_Data instance of a mask with different integers indicating
different masks
 Return type
out
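The collapsing step can be pictured on flattened voxel vectors. In this sketch "ignored" is read as "left unlabeled (0)", which is an assumption; `collapse_mask_sketch` is a hypothetical helper working on lists of 0/1 values rather than nibabel or Brain_Data objects.

```python
def collapse_mask_sketch(masks):
    """Merge binary masks into one integer-labeled mask; overlaps stay 0."""
    n_vox = len(masks[0])
    collapsed = [0] * n_vox
    for v in range(n_vox):
        hits = [label for label, m in enumerate(masks, start=1) if m[v]]
        if len(hits) == 1:  # voxels claimed by several masks are skipped
            collapsed[v] = hits[0]
    return collapsed


collapsed = collapse_mask_sketch([[1, 1, 0, 0], [0, 1, 1, 0]])
```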

nltools.mask.
create_sphere
(coordinates, radius=5, mask=None)[source]¶ Generate a set of spheres in the brain mask space
 Parameters
radius – vector of radius. Will create multiple spheres if len(radius) > 1
coordinates – a vector of sphere centers of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
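Geometrically, a sphere mask marks every voxel whose distance from the center is at most the radius. The sketch below works in voxel-index units on a small nested list; the real function operates in the space of a brain mask, and `sphere_mask_sketch` is a hypothetical name.

```python
def sphere_mask_sketch(shape, center, radius):
    """Return a nested 0/1 list marking voxels within radius of center."""
    cx, cy, cz = center
    return [[[1 if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2 else 0
              for z in range(shape[2])]
             for y in range(shape[1])]
            for x in range(shape[0])]


mask = sphere_mask_sketch((5, 5, 5), center=(2, 2, 2), radius=1)
n_inside = sum(v for plane in mask for row in plane for v in row)
```

With a radius of one voxel the sphere contains the center and its six face neighbours.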

nltools.mask.
expand_mask
(mask, custom_mask=None)[source]¶ expand a mask with multiple integers into separate binary masks
 Parameters
mask – nibabel or Brain_Data instance
custom_mask – nibabel instance or string to file path; optional
 Returns
Brain_Data instance of multiple binary masks
 Return type
out
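Expanding is the inverse of collapsing: each nonzero integer label becomes its own binary mask. A minimal list-based sketch, with the hypothetical helper `expand_mask_sketch`:

```python
def expand_mask_sketch(labels):
    """Split an integer-labeled mask into one binary mask per nonzero label."""
    values = sorted({v for v in labels if v != 0})
    return [[1 if v == label else 0 for v in labels] for label in values]


expanded = expand_mask_sketch([1, 0, 2, 2])
```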

nltools.mask.
roi_to_brain
(data, mask_x)[source]¶ This function will convert an expanded binary mask of ROIs (see expand_mask) into a brain image based on a vector of values. The dataframe of values must correspond to ROI numbers.
This is useful for populating a parcellation scheme with a vector of values
 Parameters
data – Pandas series, dataframe, list, np.array of ROI by observation
mask_x – an expanded binary mask
 Returns
 (Brain_Data) Brain_Data instance where each ROI is now populated
with a value
 Return type
out
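Conceptually, each ROI's binary mask is filled with the value at the matching position in the data vector. The sketch below shows this for a single observation on flattened masks; `roi_to_brain_sketch` is a hypothetical helper and does not handle the ROI-by-observation case.

```python
def roi_to_brain_sketch(values, expanded_masks):
    """Fill every voxel of ROI k with values[k]; voxels outside all ROIs stay 0."""
    if len(values) != len(expanded_masks):
        raise ValueError("need exactly one value per ROI mask")
    brain = [0.0] * len(expanded_masks[0])
    for value, mask in zip(values, expanded_masks):
        for v, inside in enumerate(mask):
            if inside:
                brain[v] = value
    return brain


brain = roi_to_brain_sketch([0.5, -1.2], [[1, 0, 0, 0], [0, 0, 1, 1]])
```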
nltools.file_reader
: File Reading¶
NeuroLearn File Reading Tools¶

nltools.file_reader.
onsets_to_dm
(F, sampling_freq, run_length, header='infer', sort=False, keep_separate=True, add_poly=None, unique_cols=None, fill_na=None, **kwargs)[source]¶ This function can assist in reading in one or several 2-3 column onsets files, specified in seconds, and converting them to a Design Matrix organized as samples X Stimulus Classes. sampling_freq should be specified in hertz; for TRs use hertz = 1/TR. Onsets files must be organized with columns in one of the following 4 formats:
‘Stim, Onset’
‘Onset, Stim’
‘Stim, Onset, Duration’
‘Onset, Duration, Stim’
No other file organizations are currently supported. Note: Stimulus offsets (onset + duration) that fall into an adjacent TR include that full TR. E.g. an offset of 10.16s with TR = 2 has an offset of TR 5, which spans 10-12s, rather than an offset of TR 4, which spans 8-10s.
 Parameters
F (filepath/DataFrame/list) – path to file, pandas dataframe, or list of files or pandas dataframes
sampling_freq (float) – sampling frequency in hertz; for TRs use (1 / TR)
run_length (int) – number of TRs in the run these onsets came from
sort (bool, optional) – whether to sort the columns of the resulting design matrix alphabetically; defaults to False
add_poly (int, optional) – what order polynomial terms to add as new columns (e.g. 0 for intercept, 1 for linear trend and intercept, etc); defaults to None
header (str, optional) – None if missing header, otherwise pandas header keyword; defaults to ‘infer’
keep_separate (bool) – whether to separate polynomial columns if reading a list of files and using the add_poly option; defaults to True
unique_cols (list, optional) – additional columns to keep separate across files (e.g. spikes); defaults to []
fill_na (str/int/float, optional) – what value to fill NaNs with if reading in a list of files; defaults to None
kwargs – additional inputs to pandas.read_csv
Returns – Design_Matrix class
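The onset-to-sample conversion, including the rule that an offset falling partway into a TR marks that whole TR, can be sketched as below. This is a simplified illustration with a hypothetical helper name; it takes (stimulus, onset, duration) tuples directly and returns a dict of 0/1 columns instead of a Design_Matrix.

```python
import math


def onsets_to_dm_sketch(events, sampling_freq, run_length):
    """events: list of (stim, onset_seconds, duration_seconds) tuples."""
    names = sorted({stim for stim, _, _ in events})
    dm = {name: [0] * run_length for name in names}
    for stim, onset, duration in events:
        start = math.floor(onset * sampling_freq)
        # A stimulus ending partway through a sample still marks that sample.
        stop = math.ceil((onset + duration) * sampling_freq)
        for t in range(start, min(stop, run_length)):
            dm[stim][t] = 1
    return dm


# TR = 2 s, so sampling_freq = 0.5 Hz; "face" ends at 10.16 s, inside TR 5.
dm = onsets_to_dm_sketch(
    [("face", 4.0, 6.16), ("house", 0.0, 2.0)],
    sampling_freq=0.5,
    run_length=8,
)
```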
nltools.utils
: Utilities¶
NeuroLearn Utilities¶
handy utilities.

nltools.utils.
get_anatomical
()[source]¶ Get nltools default anatomical image. DEPRECATED. See MNI_Template and resolve_mni_path from nltools.prefs

nltools.utils.
set_algorithm
(algorithm, *args, **kwargs)[source]¶ Setup the algorithm to use in subsequent prediction analyses.
 Parameters
algorithm – The prediction algorithm to use. Either a string or an (uninitialized) scikit-learn prediction object. If string, must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘lassopcr’, ‘lassoCV’, ‘ridge’, ‘ridgeCV’, ‘ridgeClassifier’, ‘randomforest’, or ‘randomforestClassifier’
kwargs – Additional keyword arguments to pass onto the scikit-learn prediction object.
 Returns
dictionary of settings for prediction
 Return type
predictor_settings

nltools.utils.
set_decomposition_algorithm
(algorithm, n_components=None, *args, **kwargs)[source]¶ Setup the algorithm to use in subsequent decomposition analyses.
 Parameters
algorithm – The decomposition algorithm to use. Either a string or an (uninitialized) scikit-learn decomposition object. If string, must be one of ‘pca’, ‘nnmf’, ‘ica’, ‘fa’, ‘dictionary’, ‘kernelpca’.
kwargs – Additional keyword arguments to pass onto the scikit-learn decomposition object.
 Returns
dictionary of settings for prediction
 Return type
predictor_settings
nltools.plotting
: Plotting Tools¶
NeuroLearn Plotting Tools¶
Numerous functions to plot data

nltools.plotting.
dist_from_hyperplane_plot
(stats_output)[source]¶ Plot SVM Classification Distance from Hyperplane
 Parameters
stats_output – a pandas file with prediction output
 Returns
Will return a seaborn plot of distance from hyperplane
 Return type
fig

nltools.plotting.
plot_between_label_distance
(distance, labels, ax=None, permutation_test=True, n_permute=5000, fontsize=18, **kwargs)[source]¶ Create a heatmap indicating average between label distance
 Parameters
distance – (pandas dataframe) brain_distance matrix
labels – (pandas dataframe) group labels
ax – axis to plot (default=None)
permutation_test – (boolean)
n_permute – (int) number of samples for permutation test
fontsize – (int) size of font for plot
 Returns
heatmap
out: pandas dataframe of pairwise distance between conditions
within_dist_out: average pairwise distance matrix
mn_dist_out: (optional if permutation_test=True) average difference in distance between conditions
p_dist_out: (optional if permutation_test=True) p-value for difference in distance between conditions
 Return type
f

nltools.plotting.
plot_brain
(objIn, how='full', thr_upper=None, thr_lower=None, save=False, **kwargs)[source]¶ More complete brain plotting of a Brain_Data instance
 Parameters
objIn – (Brain_Data) object to plot
how – (str) whether to plot a glass brain ‘glass’, 3 view-multi-slice mni ‘mni’, or both ‘full’
thr_upper – (str/float) thresholding of image. Can be string for percentage, or float for data units (see Brain_Data.threshold())
thr_lower – (str/float) thresholding of image. Can be string for percentage, or float for data units (see Brain_Data.threshold())
save – (str) if a string file name or path is provided plots will be saved into this directory appended with the orientation they belong to
kwargs – optional args to nilearn plot functions (e.g. vmax)

nltools.plotting.
plot_interactive_brain
(brain, threshold=1e-06, surface=False, percentile_threshold=False, anatomical=None, **kwargs)[source]¶ This function leverages nilearn’s new javascript based brain viewer functions to create interactive plotting functionality.
 Parameters
brain (nltools.Brain_Data) – a Brain_Data instance of 1d or 2d shape (i.e. 3d or 4d volume)
threshold (float/str) – threshold to initialize the visualization, may be a percentile string; default 1e-06
surface (bool) – whether to create a surface-based plot; default False
percentile_threshold (bool) – whether to interpret threshold values as percentiles
kwargs – optional arguments to nilearn.view_img or nilearn.view_img_on_surf
 Returns
interactive brain viewer widget

nltools.plotting.
plot_mean_label_distance
(distance, labels, ax=None, permutation_test=False, n_permute=5000, fontsize=18, **kwargs)[source]¶ Create a violin plot indicating within and between label distance.
 Parameters
distance – pandas dataframe of distance
labels – labels indicating columns and rows to group
ax – matplotlib axis to plot on
permutation_test – (bool) indicates whether to run permutation test or not
n_permute – (int) number of permutations to run
fontsize – (int) fontsize for plot labels
 Returns
heatmap
stats: (optional if permutation_test=True) permutation results
 Return type
f
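The quantities being compared in such a plot are the distances between observations that share a label and the distances between observations that do not. A minimal sketch of that grouping, using a nested list as the distance matrix and a hypothetical helper name:

```python
from itertools import combinations
from statistics import mean


def mean_label_distance_sketch(distance, labels):
    """Return (mean within-label distance, mean between-label distance)."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(distance[i][j])
    return mean(within), mean(between)


points = [0, 1, 10, 11]
distance = [[abs(a - b) for b in points] for a in points]
within_mean, between_mean = mean_label_distance_sketch(distance, ["A", "A", "B", "B"])
```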

nltools.plotting.
plot_silhouette
(distance, labels, ax=None, permutation_test=True, n_permute=5000, **kwargs)[source]¶ Create a silhouette plot indicating between relative to within label distance
 Parameters
distance – (pandas dataframe) brain_distance matrix
labels – (pandas dataframe) group labels
ax – axis to plot (default=None)
permutation_test – (boolean)
n_permute – (int) number of samples for permutation test
 Optional keyword args:
figsize: (list) dimensions of silhouette plot
colors: (list) color triplets for silhouettes. Length must equal number of unique labels
 Returns
heatmap
out: pandas dataframe of pairwise distance between conditions
within_dist_out: average pairwise distance matrix
mn_dist_out: (optional if permutation_test=True) average difference in distance between conditions
p_dist_out: (optional if permutation_test=True) p-value for difference in distance between conditions
 Return type
f
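For orientation, the standard silhouette coefficient of an observation is (b - a) / max(a, b), where a is its mean distance to observations with the same label and b is its smallest mean distance to any other label. The sketch below computes that textbook quantity from a distance matrix; whether plot_silhouette uses exactly this formula is not stated in this reference, and the helper name is hypothetical. It expects at least two distinct labels.

```python
from statistics import mean


def silhouette_sketch(distance, labels):
    """Textbook silhouette score per observation from a distance matrix."""
    scores = []
    for i, own in enumerate(labels):
        same = [distance[i][j] for j, lab in enumerate(labels) if lab == own and j != i]
        if not same:  # singleton label: score conventionally set to 0
            scores.append(0.0)
            continue
        a = mean(same)
        b = min(
            mean([distance[i][j] for j, lab in enumerate(labels) if lab == other])
            for other in set(labels) if other != own
        )
        scores.append((b - a) / max(a, b))
    return scores


points = [0, 1, 10, 11]
distance = [[abs(p - q) for q in points] for p in points]
scores = silhouette_sketch(distance, ["A", "A", "B", "B"])
```

Scores close to 1 indicate that an observation sits much nearer to its own label than to any other.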

nltools.plotting.
plot_stacked_adjacency
(adjacency1, adjacency2, normalize=True, **kwargs)[source]¶ Create stacked adjacency to illustrate similarity.
 Parameters
adjacency1 – Adjacency instance 1
adjacency2 – Adjacency instance 2
normalize – (boolean) Normalize matrices.
 Returns
matplotlib figure

nltools.plotting.
plot_t_brain
(objIn, how='full', thr='unc', alpha=None, nperm=None, cut_coords=[], **kwargs)[source]¶ Takes a brain data object and computes a one-sample t-test across its first axis. If a list is provided, will compute the difference between the brain data objects in the list (i.e. a paired-samples t-test).
 Parameters
objIn – (list/Brain_Data) if list will compute difference map first
how – (list) whether to plot a glass brain ‘glass’, 3 view-multi-slice mni ‘mni’, or both ‘full’
thr – (str) what method to use for multiple comparisons correction: unc, fdr, or tfce
alpha – (float) p-value threshold
nperm – (int) number of permutations for tfce; default 1000
cut_coords – (list) x,y,z coords to plot brain slice
kwargs – optional args to nilearn plot functions (e.g. vmax)

nltools.plotting.
probability_plot
(stats_output)[source]¶ Plot Classification Probability
 Parameters
stats_output – a pandas file with prediction output
 Returns
Will return a seaborn scatterplot
 Return type
fig
nltools.simulator
: Simulator Tools¶
NeuroLearn Simulator Tools¶
Tools to simulate multivariate data.

class
nltools.simulator.
Simulator
(brain_mask=None, output_dir=None, random_state=None)[source]¶ 
create_cov_data
(cor, cov, sigma, mask=None, reps=1, n_sub=1, output_dir=None)[source]¶ create continuous simulated data with covariance
 Parameters
cor – amount of covariance between each voxel and Y variable
cov – amount of covariance between voxels
sigma – amount of noise to add
radius – vector of radius. Will create multiple spheres if len(radius) > 1
center – center(s) of sphere(s) of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
reps – number of data repetitions
n_sub – number of subjects to simulate
output_dir – string path of directory to output data. If None, no data will be written
**kwargs – Additional keyword arguments to pass to the prediction algorithm

create_data
(levels, sigma, radius=5, center=None, reps=1, output_dir=None)[source]¶ create simulated data with integers
 Parameters
levels – vector of intensities or class labels
sigma – amount of noise to add
radius – vector of radius. Will create multiple spheres if len(radius) > 1
center – center(s) of sphere(s) of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
reps – number of data repetitions useful for trials or subjects
output_dir – string path of directory to output data. If None, no data will be written
**kwargs – Additional keyword arguments to pass to the prediction algorithm

create_ncov_data
(cor, cov, sigma, masks=None, reps=1, n_sub=1, output_dir=None)[source]¶ create continuous simulated data with covariance
 Parameters
cor – amount of covariance between each voxel and Y variable (an int or a vector)
cov – amount of covariance between voxels (an int or a matrix)
sigma – amount of noise to add
masks – region(s) where we will have activations (list if more than one)
reps – number of data repetitions
n_sub – number of subjects to simulate
output_dir – string path of directory to output data. If None, no data will be written
**kwargs – Additional keyword arguments to pass to the prediction algorithm

gaussian
(mu, sigma, i_tot)[source]¶ create a 3D gaussian signal normalized to a given intensity
 Parameters
mu – average value of the gaussian signal (usually set to 0)
sigma – standard deviation
i_tot – sum total of activation (numerical integral over the gaussian returns this value)
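Normalizing to a total intensity means rescaling the Gaussian so that its values sum to i_tot over the grid. The sketch below shows that step on a small voxel grid; the `shape` and `center` arguments and the helper name are assumptions made for the illustration, since the method itself works in the brain mask space and takes mu rather than an explicit center.

```python
import math


def gaussian_3d_sketch(shape, center, sigma, i_tot):
    """Isotropic 3D Gaussian on a voxel grid, rescaled to sum to i_tot."""
    raw = {}
    for x in range(shape[0]):
        for y in range(shape[1]):
            for z in range(shape[2]):
                d2 = (x - center[0]) ** 2 + (y - center[1]) ** 2 + (z - center[2]) ** 2
                raw[(x, y, z)] = math.exp(-d2 / (2 * sigma ** 2))
    scale = i_tot / sum(raw.values())  # numerical integral becomes i_tot
    return {voxel: value * scale for voxel, value in raw.items()}


signal = gaussian_3d_sketch(shape=(7, 7, 7), center=(3, 3, 3), sigma=1.5, i_tot=10.0)
```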

n_spheres
(radius, center)[source]¶ generate a set of spheres in the brain mask space
 Parameters
radius – vector of radius. Will create multiple spheres if len(radius) > 1
center – a vector of sphere centers of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]

normal_noise
(mu, sigma)[source]¶ produce a normal noise distribution for all points in the brain mask
 Parameters
mu – average value of the gaussian signal (usually set to 0)
sigma – standard deviation
