API Reference

This reference provides detailed documentation for all modules, classes, and methods in the current release of Neurolearn.

nltools.data: Data Types

class nltools.data.Brain_Data(data=None, Y=None, X=None, mask=None, **kwargs)[source]

Brain_Data is a class to represent neuroimaging data in Python as a vector rather than a 3-dimensional matrix. This makes it easier to perform data manipulation and analyses.

Parameters:
  • data – nibabel data instance or list of files

  • Y – Pandas DataFrame of training labels

  • X – Pandas DataFrame Design Matrix for running univariate models

  • mask – binary nifti file to mask brain data

  • **kwargs – Additional keyword arguments to pass to the prediction algorithm
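
Examples

A minimal construction sketch; the file names and labels below are hypothetical placeholders, not part of nltools:

>>> from nltools.data import Brain_Data
>>> import pandas as pd
>>> dat = Brain_Data(['sub-01_beta.nii.gz', 'sub-02_beta.nii.gz'])  # hypothetical nifti files
>>> dat.Y = pd.DataFrame([0, 1])  # optional training labels, one row per image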

aggregate(mask, func)[source]

Create new Brain_Data instance that aggregates func over mask

align(target, method='procrustes', axis=0, *args, **kwargs)[source]

Align Brain_Data instance to target object using functional alignment

Alignment type can be hyperalignment or Shared Response Model. When using hyperalignment, target image can be another subject or an already estimated common model. When using SRM, target must be a previously estimated common model stored as a numpy array. Transformed data can be back projected to original data using the transformation matrix.

See nltools.stats.align for aligning multiple Brain_Data instances

Parameters:
  • target – (Brain_Data) object to align to.

  • method – (str) alignment method to use [‘probabilistic_srm’,’deterministic_srm’,’procrustes’]

  • axis – (int) axis to align on

Returns:

(dict) a dictionary containing the transformed object, transformation matrix, and the shared response matrix

Return type:

out

Examples

  • Hyperalign using procrustes transform:
    >>> out = data.align(target, method='procrustes')
    
  • Align using shared response model:
    >>> out = data.align(target, method='probabilistic_srm', n_features=None)
    
  • Project aligned data into original data:
    >>> original_data = np.dot(out['transformed'].data, out['transformation_matrix'].T)
    
append(data, **kwargs)[source]

Append data to Brain_Data instance

Parameters:
  • data – (Brain_Data) Brain_Data instance to append

  • kwargs – optional inputs to Design_Matrix append

Returns:

(Brain_Data) new appended Brain_Data instance

Return type:

out

apply_mask(mask, resample_mask_to_brain=False)[source]

Mask Brain_Data instance

Note target data will be resampled into the same space as the mask. If you would like the mask resampled into the Brain_Data space, then set resample_mask_to_brain=True.

Parameters:
  • mask – (Brain_Data or nifti object) mask to apply to Brain_Data object.

  • resample_mask_to_brain – (bool) Will resample mask to brain space before applying mask (default=False).

Returns:

(Brain_Data) masked Brain_Data object

Return type:

masked
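
Examples

A minimal sketch, assuming dat is a Brain_Data instance and that a binary mask file exists (hypothetical file name):

>>> from nltools.data import Brain_Data
>>> roi = Brain_Data('amygdala_mask.nii.gz')  # hypothetical binary mask file
>>> masked = dat.apply_mask(roi)
>>> masked_native = dat.apply_mask(roi, resample_mask_to_brain=True)  # keep data in its own space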

astype(dtype)[source]

Cast Brain_Data.data as type.

Parameters:

dtype – datatype to convert

Returns:

Brain_Data instance with new datatype

Return type:

Brain_Data

bootstrap(function, n_samples=5000, save_weights=False, n_jobs=-1, random_state=None, *args, **kwargs)[source]

Bootstrap a Brain_Data method.

Parameters:
  • function – (str) method to apply to data for each bootstrap

  • n_samples – (int) number of samples to bootstrap with replacement

  • save_weights – (bool) Save each bootstrap iteration (useful for aggregating many bootstraps on a cluster)

  • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

Returns:

summarized studentized bootstrap output

Return type:

output

Examples

>>> b = dat.bootstrap('mean', n_samples=5000)
>>> b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
>>> b = dat.bootstrap('predict', n_samples=5000, save_weights=True)

copy()[source]

Create a copy of a Brain_Data instance.

decompose(algorithm='pca', axis='voxels', n_components=None, *args, **kwargs)[source]

Decompose Brain_Data object

Parameters:
  • algorithm – (str) Algorithm to perform decomposition types=[‘pca’,’ica’,’nnmf’,’fa’,’dictionary’,’kernelpca’]

  • axis – dimension to decompose [‘voxels’,’images’]

  • n_components – (int) number of components. If None then retain as many as possible.

Returns:

a dictionary of decomposition parameters

Return type:

output

detrend(method='linear')[source]

Remove linear trend from each voxel

Parameters:

method – (str) type of detrending [‘linear’, ’constant’]; default ‘linear’

Returns:

(Brain_Data) detrended Brain_Data instance

Return type:

out

distance(metric='euclidean', **kwargs)[source]

Calculate distance between images within a Brain_Data() instance.

Parameters:

metric – (str) type of distance metric (can use any scikit-learn or scipy metric)

Returns:

(Adjacency) Outputs a 2D distance matrix.

Return type:

dist

dtype()[source]

Get data type of Brain_Data.data.

empty(data=True, Y=True, X=True)[source]

Initialize Brain_Data.data as empty

extract_roi(mask, metric='mean', n_components=None)[source]

Extract activity from mask

Parameters:
  • mask – (nifti) nibabel mask can be binary or numbered for different rois

  • metric – type of extraction method [‘mean’, ‘median’, ‘pca’], (default=mean) NOTE: Only mean currently works!

  • n_components – if metric=’pca’, number of components to return (takes any input into sklearn.decomposition.PCA)

Returns:

mean within each ROI across images

Return type:

out
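
Examples

A minimal sketch, assuming dat is a Brain_Data instance and that a labeled atlas file exists (hypothetical file name):

>>> import nibabel as nib
>>> atlas = nib.load('atlas_labels.nii.gz')  # hypothetical image with integer-labeled ROIs
>>> roi_activity = dat.extract_roi(atlas, metric='mean')  # mean activity within each ROI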

filter(sampling_freq=None, high_pass=None, low_pass=None, **kwargs)[source]

Apply 5th order butterworth filter to data. Wraps nilearn functionality. Does not default to detrending and standardizing like nilearn implementation, but this can be overridden using kwargs.

Parameters:
  • sampling_freq – sampling freq in hertz (i.e. 1 / TR)

  • high_pass – high pass cutoff frequency

  • low_pass – low pass cutoff frequency

  • kwargs – other keyword arguments to nilearn.signal.clean

Returns:

Filtered Brain_Data instance

Return type:

Brain_Data
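
Examples

A minimal sketch for band-pass filtering a time series, assuming dat is a Brain_Data time series and a TR of 2 seconds (hypothetical value):

>>> tr = 2.0  # hypothetical repetition time in seconds
>>> filtered = dat.filter(sampling_freq=1 / tr, high_pass=0.008, low_pass=0.1)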

find_spikes(global_spike_cutoff=3, diff_spike_cutoff=3)[source]

Function to identify spikes from Time Series Data

Parameters:
  • global_spike_cutoff – (int,None) cutoff to identify spikes in global signal in standard deviations, None indicates do not calculate.

  • diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference in standard deviations, None indicates do not calculate.

Returns:

pandas dataframe with spikes as indicator variables

groupby(mask)[source]

Create groupby instance

icc(icc_type='icc2')[source]

Calculate intraclass correlation coefficient for data within Brain_Data class.

ICC Formulas are based on: Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological bulletin, 86(2), 420.

icc1: x_ij = mu + beta_j + w_ij

icc2/3: x_ij = mu + alpha_i + beta_j + (ab)_ij + epsilon_ij

Code modified from nipype algorithms.icc https://github.com/nipy/nipype/blob/master/nipype/algorithms/icc.py

Parameters:

icc_type – type of icc to calculate (icc1: voxel random effect, icc2: voxel and column random effect, icc3: voxel and column fixed effect)

Returns:

(np.array) intraclass correlation coefficient

Return type:

ICC

iplot(threshold=0, surface=False, anatomical=None, **kwargs)[source]

Create an interactive brain viewer for the current brain data instance.

Parameters:
  • threshold – (float/str) two-sided threshold to initialize the visualization; may be a percentile string; default 0

  • surface – (bool) whether to create a surface-based plot; default False

  • anatomical – nifti image or filename to overlay

  • kwargs – optional arguments to nilearn.view_img or nilearn.view_img_on_surf

Returns:

interactive brain viewer widget

isempty()[source]

Check if Brain_Data.data is empty

mean(axis=0)[source]

Get mean of each voxel or image

Parameters:

axis – (int) across images=0 (default), within images=1

Returns:

(float/np.array/Brain_Data)

Return type:

out

median(axis=0)[source]

Get median of each voxel or image

Parameters:

axis – (int) across images=0 (default), within images=1

Returns:

(float/np.array/Brain_Data)

Return type:

out

multivariate_similarity(images, method='ols')[source]

Predict spatial distribution of Brain_Data() instance from linear combination of other Brain_Data() instances or Nibabel images

Parameters:
  • self – Brain_Data instance of data to be applied

  • images – Brain_Data instance of weight map

Returns:

dictionary of regression statistics in Brain_Data instances {‘beta’,’t’,’p’,’df’,’residual’}

Return type:

out

plot(limit=5, anatomical=None, view='axial', colorbar=False, black_bg=True, draw_cross=False, threshold_upper=None, threshold_lower=None, figsize=(15, 2), axes=None, **kwargs)[source]

Create a quick plot of self.data. Will plot each image separately

Parameters:
  • limit – (int) max number of images to return

  • anatomical – (nifti, str) nifti image or file name to overlay

  • view – (str) ‘axial’ for limit number of axial slices; ‘glass’ for ortho-view glass brain; ‘mni’ for multi-slice view mni brain; ‘full’ for both glass and mni views

  • threshold_upper – (str/float) threshold if view is ‘glass’, ‘mni’, or ‘full’

  • threshold_lower – (str/float) threshold if view is ‘glass’, ‘mni’, or ‘full’

  • save – (str/bool): optional string file name or path for saving; only applies if view is ‘mni’, ‘glass’, or ‘full’. Filenames will be appended with the orientation they belong to

predict(algorithm=None, cv_dict=None, plot=True, verbose=True, **kwargs)[source]

Run prediction

Parameters:
  • algorithm – Algorithm to use for prediction. Must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘ridge’, ‘ridgeClassifier’,’pcr’, or ‘lassopcr’

  • cv_dict – Type of cross_validation to use. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout} where ‘n’ = number of folds, and ‘holdout’ = vector of subject ids that corresponds to self.Y

  • plot – Boolean indicating whether or not to create plots.

  • verbose (bool) – print performance; Default True

  • **kwargs – Additional keyword arguments to pass to the prediction algorithm

Returns:

a dictionary of prediction parameters

Return type:

output
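
Examples

A minimal sketch, assuming dat is a Brain_Data instance whose Y attribute already holds one label per image; holdout is a hypothetical vector of subject ids:

>>> stats = dat.predict(algorithm='ridge', cv_dict={'type': 'kfolds', 'n_folds': 5}, plot=False)
>>> stats = dat.predict(algorithm='svm', cv_dict={'type': 'loso', 'subject_id': holdout}, plot=False)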

predict_multi(algorithm=None, cv_dict=None, method='searchlight', rois=None, process_mask=None, radius=2.0, scoring=None, n_jobs=1, verbose=0, **kwargs)[source]

Perform multi-region prediction. This can be a searchlight analysis or multi-roi analysis if provided a Brain_Data instance with labeled non-overlapping rois.

Parameters:
  • algorithm (string) – algorithm to use for prediction Must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘ridge’, ‘ridgeClassifier’,’pcr’, or ‘lassopcr’

  • cv_dict – Type of cross_validation to use. Default is 3-fold. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout} where ‘n’ = number of folds, and ‘holdout’ = vector of subject ids that corresponds to self.Y

  • method (string) – one of ‘searchlight’ or ‘roi’

  • rois (string/nltools.Brain_Data) – nifti file path or Brain_data instance containing non-overlapping regions-of-interest labeled by integers

  • process_mask (nib.Nifti1Image/nltools.Brain_Data) – mask to constrain where to perform analyses; only applied if method = ‘searchlight’

  • radius (float) – radius of searchlight in mm; default 2mm

  • scoring (function) – callable scoring function; see sklearn documentation; defaults to estimator’s default scoring function

  • n_jobs (int) – The number of CPUs to use to do permutation; default 1 because this can be very memory intensive

  • verbose (int) – whether parallelization progress should be printed; default 0

Returns:

image of results

Return type:

output

r_to_z()[source]

Apply Fisher’s r to z transformation to each element of the data object.

randomise(n_permute=5000, threshold_dict=None, return_mask=False, **kwargs)[source]

Run mass-univariate regression at each voxel with inference performed via permutation testing, in the style of FSL’s randomise. Operates just like .regress(), but intended to be used for second-level analyses.

Parameters:
  • n_permute (int) – number of permutations

  • threshold_dict – (dict) a dictionary of threshold parameters {‘unc’:.001} or {‘fdr’:.05}

  • return_mask – (bool) optionally return the thresholding mask

Returns:

dictionary of maps for betas, tstats, and pvalues

Return type:

out

regions(min_region_size=1350, extract_type='local_regions', smoothing_fwhm=6, is_mask=False)[source]

Extract brain connected regions into separate regions.

Parameters:
  • min_region_size (int) – Minimum volume in mm3 for a region to be kept.

  • extract_type (str) – Type of extraction method [‘connected_components’, ‘local_regions’]. If ‘connected_components’, each component/region in the image is extracted automatically by labelling each region based upon the presence of unique features in their respective regions. If ‘local_regions’, each component/region is extracted based on their maximum peak value to define a seed marker and then using random walker segmentation algorithm on these markers for region separation.

  • smoothing_fwhm (scalar) – Smoothing (FWHM in mm) applied to the image to extract sparser regions. Only works for extract_type ‘local_regions’.

  • is_mask (bool) – Whether the Brain_Data instance should be treated as a boolean mask and if so, calls connected_label_regions instead.

Returns:

Brain_Data instance with extracted ROIs as data.

Return type:

Brain_Data

regress(mode='ols', **kwargs)[source]

Run a mass-univariate regression across voxels. Three types of regressions can be run: 1) Standard OLS (default) 2) Robust OLS (heteroscedasticity and/or auto-correlation robust errors), i.e. OLS with “sandwich estimators” 3) ARMA (auto-regressive and moving-average lags = 1 by default; experimental)

For more information see the help for nltools.stats.regress

ARMA notes: This experimental mode is similar to AFNI’s 3dREMLFit but without spatial smoothing of voxel auto-correlation estimates. It can be very computationally intensive so parallelization is used by default to try to speed things up. Speed is limited because a unique ARMA model is fit to each voxel (like AFNI/FSL), unlike SPM, which assumes the same AR parameters (~0.2) at each voxel. While coefficient results are typically very similar to OLS, std-errors and so t-stats, dfs, and p-vals can differ greatly depending on how much auto-correlation is explaining the response in a voxel relative to other regressors in the design matrix.

Parameters:
  • mode (str) – kind of model to fit; must be one of ‘ols’ (default), ‘robust’, or ‘arma’

  • kwargs (dict) – keyword arguments to nltools.stats.regress

Returns:

dictionary of regression statistics in Brain_Data instances

{‘beta’,’t’,’p’,’df’,’residual’}

Return type:

out
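
Examples

A minimal sketch of a first-level univariate model, assuming dat is a Brain_Data time series and dm is a Design_Matrix with one row per image:

>>> dat.X = dm
>>> stats = dat.regress()                  # standard OLS
>>> robust_stats = dat.regress(mode='robust')
>>> beta_maps = stats['beta']              # Brain_Data instance of beta maps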

scale(scale_val=100.0)[source]

Scale all values such that they are on the range [0, scale_val], via grand-mean scaling. This is NOT global-scaling/intensity normalization. This is useful for ensuring that data is on a common scale (e.g. good for multiple runs, participants, etc) and if the default value of 100 is used, can be interpreted as something akin to (but not exactly) “percent signal change.” This is consistent with default behavior in AFNI and SPM. Change this value to 10000 to make consistent with FSL.

Parameters:

scale_val – (int/float) what value to send the grand-mean to; default 100

shape()[source]

Get images by voxels shape.

similarity(image, method='correlation')[source]

Calculate similarity of Brain_Data() instance with single Brain_Data or Nibabel image

Parameters:
  • image – (Brain_Data, nifti) image to evaluate similarity

  • method – (str) Type of similarity [‘correlation’,’dot_product’,’cosine’]

Returns:

(list) Outputs a vector of pattern expression values

Return type:

pexp

smooth(fwhm)[source]

Apply spatial smoothing using nilearn smooth_img()

Parameters:

fwhm – (float) full width half maximum of gaussian spatial filter

Returns:

Brain_Data instance

standardize(axis=0, method='center')[source]

Standardize Brain_Data() instance.

Parameters:
  • axis – 0 for observations, 1 for voxels

  • method – [‘center’,’zscore’]

Returns:

Brain_Data Instance

std(axis=0)[source]

Get standard deviation of each voxel or image.

Parameters:

axis – (int) across images=0 (default), within images=1

Returns:

(float/np.array/Brain_Data)

Return type:

out

sum()[source]

Sum over voxels.

temporal_resample(sampling_freq=None, target=None, target_type='hz')[source]

Resample Brain_Data timeseries to a new target frequency or number of samples using Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation. This function can up- or down-sample data.

Note: this function can use quite a bit of RAM.

Parameters:
  • sampling_freq – (float) sampling frequency of data in hertz

  • target – (float) upsampling target

  • target_type – (str) type of target can be [samples,seconds,hz]

Returns:

upsampled Brain_Data instance

threshold(upper=None, lower=None, binarize=False, coerce_nan=True)[source]

Threshold Brain_Data instance. Provide upper and lower values or percentages to perform two-sided thresholding. Binarize will return a mask image respecting thresholds if provided, otherwise respecting every non-zero value.

Parameters:
  • upper – (float or str) Upper cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.

  • lower – (float or str) Lower cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.

  • binarize (bool) – return binarized image respecting thresholds if provided, otherwise binarize on every non-zero value; default False

  • coerce_nan (bool) – coerce nan values to 0s; default True

Returns:

Thresholded Brain_Data object.
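
Examples

A minimal sketch, assuming t_map is a single-image Brain_Data instance of t-values; the percentile-string form is shown as an assumption based on the parameter descriptions above:

>>> thresholded = t_map.threshold(upper=2.3, lower=-2.3)
>>> binary_mask = t_map.threshold(upper='95%', binarize=True)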

to_nifti()[source]

Convert Brain_Data Instance into Nifti Object

transform_pairwise()[source]

Transform Brain_Data instance into pairwise comparisons.

Returns:

Brain_Data instance transformed into pairwise comparisons

Return type:

Brain_Data

ttest(threshold_dict=None, return_mask=False)[source]

Calculate one sample t-test across each voxel (two-sided)

Parameters:
  • threshold_dict – (dict) a dictionary of threshold parameters {‘unc’:.001} or {‘fdr’:.05}

  • return_mask – (bool) if thresholding is requested, optionally return the mask of voxels that exceed threshold, e.g. for use with another map

Returns:

(dict) dictionary of t-test statistics in Brain_Data instances {‘t’,’p’}

Return type:

out
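
Examples

A minimal sketch of a second-level one-sample t-test, assuming dat holds one contrast image per subject:

>>> results = dat.ttest()
>>> results = dat.ttest(threshold_dict={'fdr': .05})
>>> t_map, p_map = results['t'], results['p']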

upload_neurovault(access_token=None, collection_name=None, collection_id=None, img_type=None, img_modality=None, **kwargs)[source]

Upload Data to Neurovault. Will add any columns in self.X to image metadata. Index will be used as image name.

Parameters:
  • access_token – (str, Required) Neurovault api access token

  • collection_name – (str, Optional) name of new collection to create

  • collection_id – (int, Optional) neurovault collection_id if adding images to existing collection

  • img_type – (str, Required) Neurovault map_type

  • img_modality – (str, Required) Neurovault image modality

Returns:

(pd.DataFrame) neurovault collection information

Return type:

collection

write(file_name)[source]

Write out Brain_Data object to Nifti or HDF5 File.

Parameters:

file_name – (str) name of nifti file including path

z_to_r()[source]

Convert z score back into r value for each element of data object

class nltools.data.Adjacency(data=None, Y=None, matrix_type=None, labels=None, **kwargs)[source]

Adjacency is a class to represent Adjacency matrices as a vector rather than a 2-dimensional matrix. This makes it easier to perform data manipulation and analyses.

Parameters:
  • data – pandas data instance or list of files

  • matrix_type – (str) type of matrix. Possible values include: [‘distance’,’similarity’,’directed’,’distance_flat’, ‘similarity_flat’,’directed_flat’]

  • Y – Pandas DataFrame of training labels

  • **kwargs – Additional keyword arguments
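
Examples

A minimal construction sketch using a simulated symmetric matrix:

>>> import numpy as np
>>> from nltools.data import Adjacency
>>> m = np.random.rand(10, 10)
>>> m = (m + m.T) / 2          # symmetrize
>>> np.fill_diagonal(m, 1)
>>> adj = Adjacency(m, matrix_type='similarity')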

append(data)[source]

Append data to Adjacency instance

Parameters:

data – (Adjacency) Adjacency instance to append

Returns:

(Adjacency) new appended Adjacency instance

Return type:

out

bootstrap(function, n_samples=5000, save_weights=False, n_jobs=-1, random_state=None, *args, **kwargs)[source]

Bootstrap an Adjacency method.

Parameters:
  • function – (str) method to apply to data for each bootstrap

  • n_samples – (int) number of samples to bootstrap with replacement

  • save_weights – (bool) Save each bootstrap iteration (useful for aggregating many bootstraps on a cluster)

  • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

Returns:

summarized studentized bootstrap output

Examples

>>> b = dat.bootstrap('mean', n_samples=5000)
>>> b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
>>> b = dat.bootstrap('predict', n_samples=5000, save_weights=True)

cluster_summary(clusters=None, metric='mean', summary='within')[source]

This function provides summaries of clusters within Adjacency matrices.

It can compute mean/median of within and between cluster values. Requires a list of cluster ids indicating the row/column of each cluster.

Parameters:
  • clusters – (list) list of cluster labels

  • metric – (str) method to summarize, mean or median. If None then return all r values

  • summary – (str) summarize within cluster or between clusters

Returns:

(dict) within cluster means

Return type:

dict

copy()[source]

Create a copy of Adjacency object.

distance(metric='correlation', **kwargs)[source]

Calculate distance between images within an Adjacency() instance.

Parameters:

metric – (str) type of distance metric (can use any scikit-learn or scipy metric)

Returns:

(Adjacency) Outputs a 2D distance matrix.

Return type:

dist

distance_to_similarity(metric='correlation', beta=1)[source]

Convert distance matrix to similarity matrix.

Note: currently only implemented for correlation and euclidean.

Parameters:
  • metric – (str) Can only be correlation or euclidean

  • beta – (float) parameter to scale exponential function (default: 1) for euclidean

Returns:

(Adjacency) Adjacency object

Return type:

out

generate_permutations(n_perm, random_state=None)[source]

Generate n_perm permuted versions of Adjacency in a lazy fashion. Useful for iterating against.

Parameters:
  • n_perm (int) – number of permutations

  • random_state (int, np.random.seed, optional) – random seed for reproducibility. Defaults to None.

Examples

>>> for perm in adj.generate_permutations(1000):
>>>     out = neural_distance_mat.similarity(perm)
>>>     ...

Yields:

Adjacency – permuted version of self

isc(n_samples=5000, metric='median', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]

Compute intersubject correlation.

This implementation uses the subject-wise bootstrap method from Chen et al. (2016). Instead of recomputing the pairwise ISC using circle_shift or phase_randomization methods, this approach uses the computationally more efficient method of bootstrapping the subjects and computing a new pairwise similarity matrix with randomly selected subjects with replacement. If the same subject is selected multiple times, the resulting perfect correlation is set to NaN (when exclude_self_corr=True). As recommended by Chen et al. (2016), we compute the median pairwise ISC by default. However, if the mean is preferred, we compute the mean correlation after performing the Fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values. We compute the p-values using the percentile method, as in Brainiak.

Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.

Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.

Parameters:
  • n_samples – (int) number of bootstrap samples

  • metric – (str) type of isc summary metric [‘mean’,’median’]

  • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

  • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

  • return_null – (bool) Return the permutation distribution along with the p-value; default False

Returns:

(dict) dictionary of permutation results [‘correlation’,’p’]

Return type:

stats

isc_group(group, n_samples=5000, metric='median', method='permute', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]

Compute intersubject correlation differences between groups.

This function computes pairwise intersubject correlations (ISC) using the median as recommended by Chen et al. (2016). However, if the mean is preferred, we compute the mean correlation after performing the Fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values.

There are currently two different methods to compute p-values. By default, we use the subject-wise permutation method recommended by Chen et al. (2016). This method combines the two groups and computes pairwise similarity both within and between the groups. Then the group labels are permuted and the mean difference between the two groups is recomputed to generate a null distribution. The second method uses subject-wise bootstrapping, where a new pairwise similarity matrix with randomly selected subjects with replacement is created separately for each group and the ISC difference between these groups is used to generate a null distribution. If the same subject is selected multiple times, the resulting perfect correlation is set to NaN (when exclude_self_corr=True). We compute the p-values using the percentile method (Hall & Wilson, 1991).

Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.

Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.

Parameters:
  • group – (np.array) vector of group ids corresponding to subject data in Adjacency instance

  • n_samples – (int) number of samples for permutation or bootstrapping

  • metric – (str) type of isc summary metric [‘mean’,’median’]

  • method – (str) method to compute p-values [‘permute’, ‘bootstrap’] (default: permute)

  • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

  • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

  • return_null – (bool) Return the permutation distribution along with the p-value; default False

Returns:

(dict) dictionary of permutation results [‘correlation’,’p’]

Return type:

stats

isempty()[source]

Check if Adjacency object is empty

mean(axis=0)[source]

Calculate mean of Adjacency

Parameters:

axis – (int) calculate mean over features (0) or data (1). For data it will be on upper triangle.

Returns:

float if single, adjacency if axis=0, np.array if axis=1 and multiple

Return type:

mean

median(axis=0)[source]

Calculate median of Adjacency

Parameters:

axis – (int) calculate median over features (0) or data (1). For data it will be on upper triangle.

Returns:

float if single, adjacency if axis=0, np.array if axis=1 and multiple

Return type:

median

plot(limit=3, axes=None, *args, **kwargs)[source]

Create Heatmap of Adjacency Matrix

Can pass in any sns.heatmap argument

Parameters:
  • limit – (int) number of heatmaps to plot if object contains multiple adjacencies (default: 3)

  • axes – matplotlib axis handle

plot_label_distance(labels=None, ax=None)[source]

Create a violin plot indicating within and between label distance

Parameters:

labels (np.array) – numpy array of labels to plot

Returns:

violin plot handles

Return type:

f

plot_mds(n_components=2, metric=True, labels=None, labels_color=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>, n_jobs=-1, view=(30, 20), figsize=[12, 8], ax=None, *args, **kwargs)[source]

Plot Multidimensional Scaling

Parameters:
  • n_components – (int) Number of dimensions to project (can be 2 or 3)

  • metric – (bool) Perform metric or non-metric dimensional scaling; default True

  • labels – (list) Can override labels stored in Adjacency Class

  • labels_color – (str) list of colors for labels, if len(1) then make all same color

  • n_jobs – (int) Number of parallel jobs

  • view – (tuple) view for 3-Dimensional plot; default (30,20)

plot_silhouette(labels=None, ax=None, permutation_test=True, n_permute=5000, **kwargs)[source]

Create a silhouette plot

r_to_z()[source]

Apply Fisher’s r to z transformation to each element of the data object.

regress(X, mode='ols', **kwargs)[source]

Run a regression on an adjacency instance. You can decompose an adjacency instance with another adjacency instance. You can also decompose each cell by passing a Design_Matrix instance.

Parameters:
  • X – Design matrix can be an Adjacency or Design_Matrix instance

  • mode – type of regression (default: ols)

Returns:

(dict) dictionary of stats outputs.

Return type:

stats

shape()[source]

Calculate shape of data.

similarity(data, plot=False, perm_type='2d', n_permute=5000, metric='spearman', ignore_diagonal=False, **kwargs)[source]

Calculate similarity between two Adjacency matrices. Default is to use spearman correlation and permutation test.

Parameters:
  • data (Adjacency or array) – Adjacency data, or 1-d array same size as self.data

  • perm_type – (str) ‘1d’,’2d’, or None

  • metric – (str) ‘spearman’,’pearson’,’kendall’

  • ignore_diagonal – (bool) only applies to ‘directed’ Adjacency types when using perm_type=None or perm_type='1d'

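Examples

A minimal sketch, assuming adj and model are two Adjacency instances of the same size:

>>> stats = adj.similarity(model, metric='spearman', perm_type='2d', n_permute=1000)
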
social_relations_model(summarize_results=True, nan_replace=True)[source]

Estimate the social relations model from a matrix for a round-robin design.

X_ij = m + alpha_i + beta_j + g_ij + epsilon_ijl

where X_ij is the score for person i rating person j, m is the group mean, alpha_i is person i’s actor effect, beta_j is person j’s partner effect, g_ij is the relationship effect, and epsilon_ijl is the error in measure l for actor i and partner j.

This model is primarily concerned with partitioning the variance of the various effects.

Code is based on the implementation presented in Chapter 8 of Kenny, Kashy, & Cook (2006). Tests replicate examples presented in the book. Note that this method assumes that actor scores are rows (lower triangle), while partner scores are columns (upper triangle). The minimal sample size to estimate these effects is 4.

Model Assumptions:
  • Social interactions are exclusively dyadic

  • People are randomly sampled from population

  • No order effects

  • The effects combine additively and relationships are linear

In the future we might update the formulas and standard errors based on Bond and Lashley, 1996

Parameters:
  • self – (adjacency) can be a single matrix or many matrices for each group

  • summarize_results – (bool) will provide a formatted summary of model results

  • nan_replace – (bool) will replace nan values with row and column means

Returns:

(pd.Series/pd.DataFrame) All of the effects estimated using SRM

Return type:

estimated effects

square_shape()[source]

Calculate shape of squareform data.

squareform()[source]

Convert adjacency back to squareform

stats_label_distance(labels=None, n_permute=5000, n_jobs=-1)[source]

Calculate permutation tests on within and between label distance.

Parameters:
  • labels (np.array) – numpy array of labels to plot

  • n_permute (int) – number of permutations to run (default=5000)

Returns:

dictionary of within and between group differences and p-values

Return type:

dict

std(axis=0)[source]

Calculate standard deviation of Adjacency

Parameters:

axis – (int) calculate std over features (0) or data (1). For data it will be on upper triangle.

Returns:

float if single, adjacency if axis=0, np.array if axis=1 and multiple

Return type:

std

sum(axis=0)[source]

Calculate sum of Adjacency

Parameters:

axis – (int) calculate sum over features (0) or data (1). For data it will be on upper triangle.

Returns:

float if single, adjacency if axis=0, np.array if axis=1 and multiple

Return type:

sum

threshold(upper=None, lower=None, binarize=False)[source]

Threshold Adjacency instance. Provide upper and lower values or percentages to perform two-sided thresholding. Binarize will return a mask image respecting thresholds if provided, otherwise respecting every non-zero value.

Parameters:
  • upper – (float or str) Upper cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.

  • lower – (float or str) Lower cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.

  • binarize (bool) – return binarized image respecting thresholds if provided, otherwise binarize on every non-zero value; default False

Returns:

thresholded Adjacency instance

Return type:

Adjacency

to_graph()[source]

Convert Adjacency into networkx graph. Only works on single_matrix for now.

ttest(permutation=False, **kwargs)[source]

Calculate ttest across samples.

Parameters:

permutation – (bool) Run ttest as permutation. Note this can be very slow.

Returns:

(dict) contains Adjacency instances of t values (or mean if running permutation) and Adjacency instance of p values.

Return type:

out

write(file_name, method='long')[source]

Write out Adjacency object to csv file.

Parameters:
  • file_name (str) – name of file name to write

  • method (str) – method to write out data [‘long’,’square’]

z_to_r()[source]

Convert z score back into r value for each element of data object

class nltools.data.Groupby(data, mask)[source]

apply(method)[source]

Apply Brain_Data instance methods to each element of Groupby object.

combine(value_dict)[source]

Combine value dictionary back into masks

split(data, mask)[source]

Split Brain_Data instance into separate masks and store as a dictionary.

class nltools.data.Design_Matrix(*args, **kwargs)[source]

Design_Matrix is a class to represent design matrices with special methods for data processing (e.g. convolution, upsampling, downsampling) and flexible and intelligent appending (e.g. automatically keep certain columns or polynomial terms separated during concatenation). It plays nicely with Brain_Data and can be used to build an experimental design to pass to Brain_Data’s X attribute. It is essentially an enhanced pandas df, with extra attributes and methods. Methods always return a new design matrix instance (copy). Column names are always string types.

Parameters:
  • sampling_freq (float) – sampling rate of each row in hertz; to convert seconds to hertz (e.g. in the case of TRs for neuroimaging) use hertz = 1 / TR

  • convolved (list, optional) – on what columns convolution has been performed; defaults to None

  • polys (list, optional) – list of polynomial terms in design matrix, e.g. intercept, polynomial trends, basis functions, etc; default None
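
Examples

A minimal sketch of building a design for a run with a 2-second TR; the column names and onset values are hypothetical:

>>> import numpy as np
>>> from nltools.data import Design_Matrix
>>> onsets = np.zeros((100, 2))
>>> onsets[[10, 40, 70], 0] = 1       # hypothetical stimulus A onsets
>>> onsets[[20, 50, 80], 1] = 1       # hypothetical stimulus B onsets
>>> dm = Design_Matrix(onsets, columns=['stim_A', 'stim_B'], sampling_freq=1 / 2.0)
>>> dm = dm.convolve()                # convolve regressors with the canonical HRF
>>> dm = dm.add_dct_basis(duration=128)
>>> dm = dm.add_poly(order=2)         # intercept, linear, and quadratic trends
>>> dm.heatmap()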

add_dct_basis(duration=180, drop=0)[source]

Adds unit scaled cosine basis functions to Design_Matrix columns, based on spm-style discrete cosine transform for use in high-pass filtering. Does not add intercept/constant. Care is recommended if using this along with .add_poly(), as some columns will be highly-correlated.

Parameters:
  • duration (int) – length of filter in seconds

  • drop (int) – index of which early/slow bases to drop if any; will always drop constant (i.e. intercept) like SPM. Unlike SPM, retains first basis (i.e. linear/sigmoidal). Will cumulatively drop bases up to and inclusive of index provided (e.g. 2, drops bases 1 and 2); default 0

add_poly(order=0, include_lower=True)[source]

Add nth order Legendre polynomial terms as columns to design matrix. Good for adding constant/intercept to model (order = 0) and accounting for slow-frequency nuisance artifacts e.g. linear, quadratic, etc drifts. Care is recommended when using this with .add_dct_basis() as some columns will be highly correlated.

Parameters:
  • order (int) – what order terms to add; 0 = constant/intercept (default), 1 = linear, 2 = quadratic, etc

  • include_lower – (bool) whether to add lower order terms if order > 0

append(dm, axis=0, keep_separate=True, unique_cols=None, fill_na=0, verbose=False)[source]

Method for concatenating another design matrix row- or column-wise. When concatenating row-wise, has the ability to keep certain columns separated if they exist in multiple design matrices (e.g. keeping separate intercepts for multiple runs). This is on by default and will automatically separate out polynomial columns (i.e. anything added with the add_poly or add_dct_basis methods). Additional columns can be separated by run using the unique_cols parameter. Can also add new polynomial terms during vertical concatenation (when axis == 0). This will by default create new polynomial terms separately for each design matrix

Parameters:
  • dm (Design_Matrix or list) – design_matrix or list of design_matrices to append

  • axis (int) – 0 for row-wise (vert-cat), 1 for column-wise (horz-cat); default 0

  • keep_separate (bool,optional) – whether try and uniquify columns; defaults to True; only applies when axis==0

  • unique_cols (list,optional) – what additional columns to try to keep separated by uniquifying, only applies when axis = 0; defaults to None

  • fill_na (str/int/float) – if provided will fill NaNs with this value during row-wise appending (when axis = 0) if separate columns are desired; default 0

  • verbose (bool) – print messages during append about how polynomials are going to be separated

clean(fill_na=0, exclude_polys=False, thresh=0.95, verbose=True)[source]

Method to fill NaNs in Design Matrix and remove duplicate columns based on data values, NOT names. Columns are dropped if they are correlated >= the requested threshold (default = .95). In this case, only the first instance of that column will be retained and all others will be dropped.

Parameters:
  • fill_na (str/int/float) – value to fill NaNs with set to None to retain NaNs; default 0

  • exclude_polys (bool) – whether to skip checking of polynomial terms (i.e. intercept, trends, basis functions); default False

  • thresh (float) – correlation threshold to use to drop redundant columns; default .95

  • verbose (bool) – print what column names were dropped; default True

convolve(conv_func='hrf', columns=None)[source]

Perform convolution using an arbitrary function.

Parameters:
  • conv_func (ndarray or string) – either a 1d numpy array containing output of a function that you want to convolve; a samples by kernel 2d array of several kernels to convolve; or the string ‘hrf’ which defaults to a glover HRF function at the Design_matrix’s sampling_freq

  • columns (list) – what columns to perform convolution on; defaults to all non-polynomial columns

downsample(target, **kwargs)[source]

Downsample columns of design matrix. Relies on nltools.stats.downsample, but ensures that returned object is a design matrix.

Parameters:
  • target (float) – desired frequency in hz

  • kwargs – additional inputs to nltools.stats.downsample

heatmap(figsize=(8, 6), **kwargs)[source]

Visualize Design Matrix SPM-style. Use .plot() for typical pandas plotting functionality. Can pass optional keyword args to seaborn heatmap.

replace_data(data, column_names=None)[source]

Convenient method to replace all data in Design_Matrix with new data while keeping attributes and polynomial columns untouched.

Parameters:

column_names (list) – list of column names for new data

upsample(target, **kwargs)[source]

Upsample columns of design matrix. Relies on nltools.stats.upsample, but ensures that returned object is a design matrix.

Parameters:
  • target (float) – desired frequency in hz

  • kwargs – additional inputs to nltools.stats.upsample

vif(exclude_polys=True)[source]

Compute variance inflation factor amongst columns of design matrix, ignoring polynomial terms. Much faster than statsmodels and more reliable too. Uses the same method as Matlab and R (diagonal elements of the inverted correlation matrix).

Parameters:

exclude_polys (bool) – whether to skip checking of polynomial terms (i.e intercept, trends, basis functions); default True

Returns:

list with length == number of columns - intercept

Return type:

vifs (list)

zscore(columns=[])[source]

Z-score specific columns of design matrix while ensuring that the returned object is a design matrix.

Parameters:

columns (list) – columns to z-score; defaults to all columns

nltools.analysis: Analysis Tools

class nltools.analysis.Roc(input_values=None, binary_outcome=None, threshold_type='optimal_overall', forced_choice=None, **kwargs)[source]

Roc Class

The Roc class is based on Tor Wager’s Matlab roc_plot.m function and allows a user to easily run different types of receiver operating characteristic curves. For example, one might be interested in single interval or forced choice.

Parameters:
  • input_values – nibabel data instance

  • binary_outcome – vector of training labels

  • threshold_type – [‘optimal_overall’, ‘optimal_balanced’, ‘minimum_sdt_bias’]

  • **kwargs – Additional keyword arguments to pass to the prediction algorithm
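
Examples

A minimal sketch using simulated values; the variable names and simulated data are hypothetical:

>>> import numpy as np
>>> from nltools.analysis import Roc
>>> scores = np.random.randn(40)                  # hypothetical continuous predictions
>>> outcome = scores + np.random.randn(40) > 0    # hypothetical binary outcome vector
>>> roc = Roc(input_values=scores, binary_outcome=outcome)
>>> roc.calculate()
>>> roc.summary()
>>> roc.plot()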

calculate(input_values=None, binary_outcome=None, criterion_values=None, threshold_type='optimal_overall', forced_choice=None, balanced_acc=False)[source]

Calculate Receiver Operating Characteristic plot (ROC) for single-interval classification.

Parameters:
  • input_values – nibabel data instance

  • binary_outcome – vector of training labels

  • criterion_values – (optional) criterion values for calculating fpr & tpr

  • threshold_type – [‘optimal_overall’, ‘optimal_balanced’, ‘minimum_sdt_bias’]

  • forced_choice – index indicating position for each unique subject (default=None)

  • balanced_acc – balanced accuracy for single-interval classification (bool). THIS IS NOT COMPLETELY IMPLEMENTED BECAUSE IT AFFECTS ACCURACY ESTIMATES, BUT NOT P-VALUES OR THRESHOLD AT WHICH TO EVALUATE SENS/SPEC

  • **kwargs – Additional keyword arguments to pass to the prediction algorithm

plot(plot_method='gaussian', balanced_acc=False, **kwargs)[source]

Create ROC Plot

Create a specific kind of ROC curve plot, based on input values along a continuous distribution and a binary outcome variable (logical)

Parameters:
  • plot_method – type of plot [‘gaussian’,’observed’]

  • binary_outcome – vector of training labels

  • **kwargs – Additional keyword arguments to pass to the prediction algorithm

Returns:

fig

summary()[source]

Display a formatted summary of ROC analysis.

nltools.stats: Stats Tools

NeuroLearn Statistics Tools

Tools to help with statistical analyses.

nltools.stats.align(data, method='deterministic_srm', n_features=None, axis=0, *args, **kwargs)[source]

Align subject data into a common response model.

Can be used to hyperalign source data to target data using Hyperalignment from Dartmouth (i.e., procrustes transformation; see nltools.stats.procrustes) or Shared Response Model from Princeton (see nltools.external.srm). (See nltools.data.Brain_Data.align for aligning a single Brain object to another.) Common Model is shared response model or centered target data. Transformed data can be back projected to original data using the transformation matrix. Inputs must be a list of Brain_Data instances or numpy arrays (observations by features).

Parameters:
  • data – (list) A list of Brain_Data objects

  • method – (str) alignment method to use [‘probabilistic_srm’,’deterministic_srm’,’procrustes’]

  • n_features – (int) number of features to align to common space. If None then will select number of voxels

  • axis – (int) axis to align on

Returns:

(dict) a dictionary containing a list of transformed subject matrices, a list of transformation matrices, the shared response matrix, and the intersubject correlation of the shared responses

Return type:

out

Examples

  • Hyperalign using procrustes transform:
    >>> out = align(data, method='procrustes')
    
  • Align using shared response model:
    >>> out = align(data, method='probabilistic_srm', n_features=None)
    
  • Project aligned data into original data:
    >>> original_data = [np.dot(t.data,tm.T) for t,tm in zip(out['transformed'], out['transformation_matrix'])]
    
nltools.stats.align_states(reference, target, metric='correlation', return_index=False, replace_zero_variance=False)[source]

Align state weight maps using the Hungarian algorithm by minimizing pairwise distance between group states.

Parameters:
  • reference – (np.array) reference pattern x state matrix

  • target – (np.array) target pattern x state matrix to align to reference

  • metric – (str) distance metric to use

  • return_index – (bool) return index if True, return remapped data if False

  • replace_zero_variance – (bool) transform a vector with zero variance to random numbers from a uniform distribution. Useful for when using correlation as a distance metric to avoid NaNs.

Returns:

(list) a list of reordered state X pattern matrices

Return type:

ordered_weights

nltools.stats.calc_bpm(beat_interval, sampling_freq)[source]

Calculate instantaneous BPM from beat to beat interval

Parameters:
  • beat_interval – (int) number of samples in between each beat (typically R-R Interval)

  • sampling_freq – (float) sampling frequency in Hz

Returns:

(float) beats per minute for time interval

Return type:

bpm

nltools.stats.correlation(data1, data2, metric='pearson')[source]

This function calculates the correlation between data1 and data2

Parameters:
  • data1 – (np.array) x

  • data2 – (np.array) y

  • metric – (str) type of correlation [“spearman” or “pearson” or “kendall”]

Returns:

r: (np.array) correlations; p: (float) p-value

Return type:

r

nltools.stats.correlation_permutation(data1, data2, method='permute', n_permute=5000, metric='spearman', tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]

Compute correlation and calculate p-value using permutation methods.

‘permute’ method randomly shuffles one of the vectors. This method is recommended for independent data. For timeseries data we recommend using ‘circle_shift’ or ‘phase_randomize’ methods.

Parameters:
  • data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute

  • data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute

  • n_permute – (int) number of permutations

  • metric – (str) type of association metric [‘spearman’,’pearson’, ‘kendall’]

  • method – (str) type of permutation [‘permute’, ‘circle_shift’, ‘phase_randomize’]

  • random_state – (int, None, or np.random.RandomState) Initial random seed (default: None)

  • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

  • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

  • return_parms – (bool) Return the permutation distribution along with the p-value; default False

Returns:

(dict) dictionary of permutation results [‘correlation’,’p’]

Return type:

stats
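
Examples

A minimal sketch using simulated vectors:

>>> import numpy as np
>>> from nltools.stats import correlation_permutation
>>> x = np.random.randn(50)
>>> y = x + np.random.randn(50)
>>> stats = correlation_permutation(x, y, metric='spearman', method='permute', n_permute=1000)
>>> stats['correlation'], stats['p']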

nltools.stats.distance_correlation(x, y, bias_corrected=True, ttest=False)[source]

Compute the distance correlation between 2 arrays to test for multivariate dependence (linear or non-linear). Arrays must match on their first dimension. It’s almost always preferable to compute the bias_corrected version which can also optionally perform a ttest. This ttest operates on a statistic that is ~dcorr^2 and will be also returned.

Explanation: Distance correlation involves computing the normalized covariance of two centered euclidean distance matrices. Each distance matrix is the euclidean distance between rows (if x or y are 2d) or scalars (if x or y are 1d). Each matrix is centered prior to computing the covariance either using double-centering or u-centering, which corrects for bias as the number of dimensions increases. U-centering is almost always preferred in all cases. It also permits inference of the normalized covariance between each distance matrix using a one-tailed directional t-test. (Szekely & Rizzo, 2013). While distance correlation is normally bounded between 0 and 1, u-centering can produce negative estimates, which are never significant.

Validated against the dcor and dcor.ttest functions in the ‘energy’ R package and the dcor.distance_correlation, dcor.udistance_correlation_sqr, and dcor.independence.distance_correlation_t_test functions in the dcor Python package.

Parameters:
  • x (ndarray) – 1d or 2d numpy array of observations by features

  • y (ndarry) – 1d or 2d numpy array of observations by features

  • bias_corrected (bool) – if False use double-centering, which produces a biased estimate that converges to 1 as the number of dimensions increases. Otherwise use u-centering to correct this bias. Note this must be True if ttest=True; default True

  • ttest (bool) – perform a ttest using the bias_corrected distance correlation; default False

Returns:

dictionary of results (correlation, t, p, and df.) Optionally, covariance, x variance, and y variance

Return type:

results (dict)

nltools.stats.double_center(mat)[source]

Double center a 2d array.

Parameters:

mat (ndarray) – 2d numpy array

Returns:

double-centered version of input

Return type:

mat (ndarray)

nltools.stats.downsample(data, sampling_freq=None, target=None, target_type='samples', method='mean')[source]

Downsample a pandas DataFrame or Series to a new target frequency or number of samples using averaging.

Parameters:
  • data – (pd.DataFrame, pd.Series) data to downsample

  • sampling_freq – (float) Sampling frequency of data in hertz

  • target – (float) downsampling target

  • target_type – type of target can be [samples,seconds,hz]

  • method – (str) type of downsample method [‘mean’,’median’], default: mean

Returns:

(pd.DataFrame, pd.Series) downsampled data

Return type:

out

nltools.stats.fdr(p, q=0.05)[source]

Determine FDR threshold given a p value array and desired false discovery rate q. Written by Tal Yarkoni

Parameters:
  • p – (np.array) vector of p-values

  • q – (float) false discovery rate level

Returns:

(float) p-value threshold based on independence or positive dependence

Return type:

fdr_p
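
Examples

A minimal sketch using a simulated vector of p-values:

>>> import numpy as np
>>> from nltools.stats import fdr
>>> p = np.random.uniform(size=1000)   # hypothetical vector of p-values
>>> thresh = fdr(p, q=0.05)
>>> significant = p <= thresh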

nltools.stats.find_spikes(data, global_spike_cutoff=3, diff_spike_cutoff=3)[source]

Function to identify spikes from fMRI Time Series Data

Parameters:
  • data – Brain_Data or nibabel instance

  • global_spike_cutoff – (int,None) cutoff to identify spikes in global signal in standard deviations, None indicates do not calculate.

  • diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference in standard deviations, None indicates do not calculate.

Returns:

pandas dataframe with spikes as indicator variables

nltools.stats.fisher_r_to_z(r)[source]

Use Fisher transformation to convert correlation to z score

nltools.stats.fisher_z_to_r(z)[source]

Use Fisher transformation to convert z score back to correlation

nltools.stats.holm_bonf(p, alpha=0.05)[source]

Compute corrected p-values based on the Holm-Bonferroni method, i.e. a step-down procedure applying iteratively less correction to the highest p-values. A bit more conservative than fdr, but much more powerful than vanilla Bonferroni.

Parameters:
  • p – (np.array) vector of p-values

  • alpha – (float) alpha level

Returns:

(float) p-value threshold based on Bonferroni step-down procedure

Return type:

bonf_p

nltools.stats.isc(data, n_samples=5000, metric='median', method='bootstrap', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]

Compute pairwise intersubject correlation from observations by subjects array.

This function computes pairwise intersubject correlations (ISC) using the median as recommended by Chen et al. (2016). However, if the mean is preferred, we compute the mean correlation after performing the Fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values.

There are currently three different methods to compute p-values. These include the classic methods for computing permuted time-series by either circle-shifting the data or phase-randomizing the data (see Lancaster et al., 2018). These methods create random surrogate data while preserving the temporal autocorrelation inherent to the signal. By default, we use the subject-wise bootstrap method from Chen et al. (2016). Instead of recomputing the pairwise ISC using circle_shift or phase_randomization methods, this approach uses the computationally more efficient method of bootstrapping the subjects and computing a new pairwise similarity matrix with randomly selected subjects with replacement. If the same subject is selected multiple times, the resulting perfect correlation is set to NaN (when exclude_self_corr=True). We compute the p-values using the percentile method, as in Brainiak.

Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.

Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.

Lancaster, G., Iatsenko, D., Pidde, A., Ticcinelli, V., & Stefanovska, A. (2018). Surrogate data for hypothesis testing of physical systems. Physics Reports, 748, 1-60.

Parameters:
  • data – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects

  • n_samples – (int) number of random samples/bootstraps

  • metric – (str) type of isc summary metric [‘mean’,’median’]

  • method – (str) method to compute p-values [‘bootstrap’, ‘circle_shift’,’phase_randomize’] (default: bootstrap)

  • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

  • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

  • return_null – (bool) Return the permutation distribution along with the p-value; default False

Returns:

(dict) dictionary of permutation results [‘correlation’,’p’]

Return type:

stats

nltools.stats.isc_group(group1, group2, n_samples=5000, metric='median', method='permute', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]

Compute difference in intersubject correlation between groups.

This function computes pairwise intersubject correlations (ISC) using the median as recommended by Chen et al. (2016). However, if the mean is preferred, we compute the mean correlation after performing the Fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values.

There are currently two different methods to compute p-values. By default, we use the subject-wise permutation method recommended by Chen et al. (2016). This method combines the two groups and computes pairwise similarity both within and between the groups. Then the group labels are permuted and the mean difference between the two groups is recomputed to generate a null distribution. The second method uses subject-wise bootstrapping, where a new pairwise similarity matrix with randomly selected subjects with replacement is created separately for each group and the ISC difference between these groups is used to generate a null distribution. If the same subject is selected multiple times, the resulting perfect correlation is set to NaN (when exclude_self_corr=True). We compute the p-values using the percentile method (Hall & Wilson, 1991).

Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.

Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.

Parameters:
  • group1 – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects

  • group2 – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects

  • n_samples – (int) number of samples for permutation or bootstrapping

  • metric – (str) type of isc summary metric [‘mean’,’median’]

  • method – (str) method to compute p-values [‘permute’, ‘bootstrap’] (default: permute)

  • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

  • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

  • return_null – (bool) Return the permutation distribution along with the p-value; default False

Returns:

(dict) dictionary of permutation results [‘correlation’,’p’]

Return type:

stats
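
A minimal usage sketch (illustrative only; group1 and group2 are assumed to be time points x subjects arrays with the same number of time points):

>>> import numpy as np
>>> from nltools.stats import isc_group
>>> group1 = np.random.randn(200, 15)  # synthetic data for 15 subjects
>>> group2 = np.random.randn(200, 18)  # synthetic data for 18 subjects
>>> stats = isc_group(group1, group2, method='permute', metric='median', n_samples=5000)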

nltools.stats.isfc(data, method='average')[source]

Compute intersubject functional connectivity (ISFC) from a list of observation x feature matrices

This function uses the leave-one-out approach to compute ISFC (Simony et al., 2016). For each subject, we compute the cross-correlation between each voxel/ROI and the average of the remaining subjects' data. In other words, we compute the mean voxel/ROI response for all participants except the target subject and then compute the correlation between each voxel/ROI in the target subject and the mean response in the group average.

Simony, E., Honey, C. J., Chen, J., Lositsky, O., Yeshurun, Y., Wiesel, A., & Hasson, U. (2016). Dynamic reconfiguration of the default mode network during narrative comprehension. Nature communications, 7, 12141.

Parameters:
  • data – list of subject matrices (observations x voxels/rois)

  • method – approach to computing ISFC. ‘average’ uses the leave-one-out approach described above

Returns:

list of subject ISFC matrices
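
A minimal usage sketch (illustrative only; each list element is one subject's observations x voxels/ROIs matrix):

>>> import numpy as np
>>> from nltools.stats import isfc
>>> data = [np.random.randn(200, 50) for _ in range(10)]  # 10 synthetic subjects
>>> isfc_matrices = isfc(data, method='average')  # one ISFC matrix per subject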

nltools.stats.isps(data, sampling_freq=0.5, low_cut=0.04, high_cut=0.07, order=5, pairwise=False)[source]

Compute Dynamic Intersubject Phase Synchrony (ISPS) from an observation by subject array

This function computes the instantaneous intersubject phase synchrony for a single voxel/ROI timeseries and requires multiple subjects. The method is largely based on that described by Glerean et al., 2012: a Hilbert transform is applied to narrow bandpass-filtered (Butterworth) timeseries data to obtain the instantaneous phase angle. The function returns a dictionary containing the average phase angle, the average vector length, and parametric p-values computed with the Rayleigh test using circular statistics (Fisher, 1993). If pairwise=True, these are computed on the pairwise phase angle differences; if pairwise=False, they are computed on the actual phase angles. These are referred to as inter-site phase coupling and inter-trial phase coupling, respectively, in the EEG literature.

This function requires narrow band filtering your data. As a default, we use the recommendation of Glerean et al., 2012 of 0.04–0.07 Hz. This is similar to the “slow-4” band (0.025–0.067 Hz) described by Zuo et al., 2010 and Penttonen & Buzsáki, 2003, but excludes the 0.03 Hz band, which has been shown to contain aliased respiration signals (Birn, 2006).

Birn RM, Smith MA, Bandettini PA, Diamond JB. 2006. Separating respiratory-variation-related fluctuations from neuronal-activity- related fluctuations in fMRI. Neuroimage 31:1536–1548.

Buzsáki, G., & Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science, 304(5679), 1926-1929.

Fisher, N. I. (1995). Statistical analysis of circular data. cambridge university press.

Glerean, E., Salmi, J., Lahnakoski, J. M., Jääskeläinen, I. P., & Sams, M. (2012). Functional magnetic resonance imaging phase synchronization as a measure of dynamic functional connectivity. Brain connectivity, 2(2), 91-101.

Parameters:
  • data – (pd.DataFrame, np.ndarray) observations x subjects data

  • sampling_freq – (float) sampling frequency of data in Hz

  • low_cut – (float) lower bound cutoff for high pass filter

  • high_cut – (float) upper bound cutoff for low pass filter

  • order – (int) filter order for butterworth bandpass

  • pairwise – (bool) compute phase angle coherence on pairwise phase angle differences or on raw phase angle.

Returns:

dictionary with mean phase angle, vector length, and rayleigh statistic
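
A minimal usage sketch (illustrative only; data is assumed to be a single voxel/ROI timeseries stacked as observations x subjects, sampled here at TR = 2s, i.e. 0.5 Hz):

>>> import numpy as np
>>> from nltools.stats import isps
>>> data = np.random.randn(200, 20)  # synthetic timeseries for 20 subjects
>>> out = isps(data, sampling_freq=0.5, low_cut=0.04, high_cut=0.07, order=5)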

nltools.stats.make_cosine_basis(nsamples, sampling_freq, filter_length, unit_scale=True, drop=0)[source]
Create a series of cosine basis functions for a discrete cosine transform. Based off of the implementation in spm_filter and spm_dctmtx because scipy's dct can only apply transforms but not return the basis functions. Like SPM, does not add a constant (i.e. intercept), but does retain the first basis (i.e. sigmoidal/linear drift)

Parameters:
  • nsamples (int) – number of observations (e.g. TRs)

  • sampling_freq (float) – sampling frequency in hertz (i.e. 1 / TR)

  • filter_length (int) – length of filter in seconds

  • unit_scale (bool) – assure that the basis functions are on the normalized range [-1, 1]; default True

  • drop (int) – index of which early/slow bases to drop if any; default is to drop constant (i.e. intercept) like SPM. Unlike SPM, retains first basis (i.e. linear/sigmoidal). Will cumulatively drop bases up to and inclusive of index provided (e.g. 2, drops bases 1 and 2)

Returns:

nsamples x number of basis sets numpy array

Return type:

out (ndarray)
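
A minimal usage sketch (illustrative values; 200 TRs with a 2s TR and a 128s high-pass filter length):

>>> from nltools.stats import make_cosine_basis
>>> basis = make_cosine_basis(nsamples=200, sampling_freq=1/2.0, filter_length=128)
>>> # rows equal nsamples; columns are the retained cosine basis functions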

nltools.stats.matrix_permutation(data1, data2, n_permute=5000, metric='spearman', how='upper', include_diag=False, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]

Permute 2-dimensional matrix correlation (Mantel test).

Chen, G. et al. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. Neuroimage, 142, 248-259.

Parameters:
  • data1 – (pd.DataFrame, np.array) square matrix

  • data2 – (pd.DataFrame, np.array) square matrix

  • n_permute – (int) number of permutations

  • metric – (str) type of association metric [‘spearman’,’pearson’, ‘kendall’]

  • how – (str) whether to use the ‘upper’ (default), ‘lower’, or ‘full’ matrix. The default of ‘upper’ assumes both matrices are symmetric

  • include_diag (bool) – only applies if how=’full’. Whether to include the diagonal elements in the comparison

  • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

  • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

  • return_perms – (bool) Return the permutation distribution along with the p-value; default False

Returns:

(dict) dictionary of permutation results [‘correlation’,’p’]

Return type:

stats
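
A minimal usage sketch (illustrative only, using two synthetic symmetric similarity matrices):

>>> import numpy as np
>>> from nltools.stats import matrix_permutation
>>> a = np.random.randn(10, 10); a = (a + a.T) / 2  # synthetic symmetric matrix
>>> b = np.random.randn(10, 10); b = (b + b.T) / 2
>>> stats = matrix_permutation(a, b, n_permute=5000, metric='spearman', how='upper')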

nltools.stats.multi_threshold(t_map, p_map, thresh)[source]

Threshold a test image by multiple p-values from a p image

Parameters:
  • t_map – (Brain_Data) Brain_Data instance of arbitrary statistic metric (e.g., beta, t, etc)

  • p_map – (Brain_Data) Brain_Data instance of p-values

  • thresh – (list) list of p-values to threshold stat image

Returns:

Thresholded Brain_Data instance

Return type:

out

nltools.stats.one_sample_permutation(data, n_permute=5000, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]

One sample permutation test using randomization.

Parameters:
  • data – (pd.DataFrame, pd.Series, np.array) data to permute

  • n_permute – (int) number of permutations

  • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

  • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

  • return_perms – (bool) Return the permutation distribution along with the p-value; default False

  • random_state – (int, None, or np.random.RandomState) Initial random seed (default: None)

Returns:

(dict) dictionary of permutation results [‘mean’,’p’]

Return type:

stats
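
A minimal usage sketch (illustrative only, testing whether a synthetic sample differs from zero):

>>> import numpy as np
>>> from nltools.stats import one_sample_permutation
>>> data = np.random.randn(30) + 0.25  # synthetic sample with a small positive shift
>>> stats = one_sample_permutation(data, n_permute=5000, tail=2)
>>> sample_mean, p_value = stats['mean'], stats['p']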

nltools.stats.pearson(x, y)[source]

Correlates row vector x with each row vector in 2D array y. From neurosynth.stats.py - author: Tal Yarkoni

nltools.stats.procrustes(data1, data2)[source]

Procrustes analysis, a similarity test for two data sets.

Each input matrix is a set of points or vectors (the rows of the matrix). The dimension of the space is the number of columns of each matrix. Given two identically sized matrices, procrustes standardizes both such that:

  • \(tr(AA^{T}) = 1\)

  • Both sets of points are centered around the origin.

Procrustes then applies the optimal transform to the second matrix (including scaling/dilation, rotations, and reflections) to minimize \(M^{2}=\sum(data1-data2)^{2}\), or the sum of the squares of the pointwise differences between the two input datasets. This function was not designed to handle datasets with different numbers of datapoints (rows). If two data sets have different dimensionality (different number of columns), this function will add columns of zeros to the smaller of the two.

Parameters:
  • data1 – array_like Matrix, n rows represent points in k (columns) space data1 is the reference data, after it is standardised, the data from data2 will be transformed to fit the pattern in data1 (must have >1 unique points).

  • data2 – array_like n rows of data in k space to be fit to data1. Must be the same shape (numrows, numcols) as data1 (must have >1 unique points).

Returns:

mtx1 (array_like): A standardized version of data1.

mtx2 (array_like): The orientation of data2 that best fits data1. Centered, but not necessarily \(tr(AA^{T}) = 1\).

disparity (float): \(M^{2}\) as defined above.

R ((N, N) ndarray): The matrix solution of the orthogonal Procrustes problem. Minimizes the Frobenius norm of dot(data1, R) - data2, subject to dot(R.T, R) == I.

scale (float): Sum of the singular values of dot(data1.T, data2).
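
A minimal usage sketch (illustrative only; unpacking follows the five return values listed above):

>>> import numpy as np
>>> from nltools.stats import procrustes
>>> a = np.random.randn(20, 5)  # synthetic reference configuration
>>> b = np.random.randn(20, 5)  # configuration to be transformed
>>> mtx1, mtx2, disparity, R, scale = procrustes(a, b)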

nltools.stats.procrustes_distance(mat1, mat2, n_permute=5000, tail=2, n_jobs=-1, random_state=None)[source]

Use procrustes super-position to perform a similarity test between 2 matrices. Matrices need to match in size on their first dimension only, as the smaller matrix on the second dimension will be padded with zeros. After aligning two matrices using the procrustes transformation, use the computed disparity between them (sum of squared error of elements) as a similarity metric. Shuffle the rows of one of the matrices and recompute the disparity to perform inference (Peres-Neto & Jackson, 2001).

Parameters:
  • mat1 (ndarray) – 2d numpy array; must have same number of rows as mat2

  • mat2 (ndarray) – 1d or 2d numpy array; must have same number of rows as mat1

  • n_permute (int) – number of permutation iterations to perform

  • tail (int) – either 1 for one-tailed or 2 for two-tailed test; default 2

  • n_jobs (int) – The number of CPUs to use to do permutation; default -1 (all)

Returns:

similarity (float): similarity between matrices bounded between 0 and 1

pval (float): permuted p-value

nltools.stats.regress(X, Y, mode='ols', stats='full', **kwargs)[source]

This is a flexible function to run several types of regression models given X and Y numpy arrays. Y can be a 1d or 2d numpy array. In the latter case, results will be output with shape 1 x Y.shape[1], in other words fitting a separate regression model to each column of Y.

Does NOT add an intercept automatically to the X matrix before fitting like some other software packages. This is left up to the user.

This function can compute regression in 3 ways:

  1. Standard OLS

  2. OLS with robust sandwich estimators for standard errors. 3 robust types of estimators exist:

  • ‘hc0’ - classic huber-white estimator robust to heteroscedasticity (default)

  • ‘hc3’ - a variant on huber-white estimator slightly more conservative when sample sizes are small

  • ‘hac’ - an estimator robust to both heteroscedasticity and auto-correlation; auto-correlation lag can be controlled with the nlags keyword argument; default is 1

  3. ARMA (auto-regressive moving-average) model (experimental). This model is fit through statsmodels.tsa.arima_model.ARMA, so more information about options can be found there. Any settings can be passed in as kwargs. By default fits a (1,1) model with starting lags of 2. This mode is computationally intensive and can take quite a while if Y has many columns. If Y is a 2d array, joblib.Parallel is used for faster fitting by parallelizing fits across columns of Y. Parallelization can be controlled by passing in kwargs. Defaults to multi-threading using 10 separate threads, as threads don’t require large arrays to be duplicated in memory. Defaults are also set to enable memory-mapping for very large arrays if backend=’multiprocessing’ to prevent crashes and hangs. Various levels of progress can be monitored using the ‘disp’ (statsmodels) and ‘verbose’ (joblib) keyword arguments with integer values > 0.

Parameters:
  • X (ndarray) – design matrix; assumes intercept is included

  • Y (ndarray) – dependent variable array; if 2d, a model is fit to each column of Y separately

  • mode (str) – kind of model to fit; must be one of ‘ols’ (default), ‘robust’, or ‘arma’

  • stats (str) – one of ‘full’, ‘betas’, ‘tstats’. Useful to speed up calculation if you know you only need some statistics and not others. Defaults to ‘full’.

  • robust_estimator (str,optional) – kind of robust estimator to use if mode = ‘robust’; default ‘hc0’

  • nlags (int,optional) – auto-correlation lag correction if mode = ‘robust’ and robust_estimator = ‘hac’; default 1

  • order (tuple,optional) – auto-regressive and moving-average orders for mode = ‘arma’; default (1,1)

  • kwargs (dict) – additional keyword arguments to statsmodels.tsa.arima_model.ARMA and joblib.Parallel

Returns:

b: coefficients

se: standard error of coefficients

t: t-statistics (coef/sterr)

p: p-values

df: degrees of freedom

res: residuals

Examples

Standard OLS

>>> results = regress(X,Y,mode='ols')

Robust OLS with heteroscedasticity (hc0) robust standard errors

>>> results = regress(X,Y,mode='robust')

Robust OLS with heteroscedasticity and auto-correlation (with lag 2) robust standard errors

>>> results = regress(X,Y,mode='robust',robust_estimator='hac',nlags=2)

Auto-regressive mode with auto-regressive and moving-average lags = 1

>>> results = regress(X,Y,mode='arma',order=(1,1))

Auto-regressive model with auto-regressive lag = 2, moving-average lag = 3, and multi-processing instead of multi-threading using 8 cores (this can use a lot of memory if input arrays are very large!).

>>> results = regress(X,Y,mode='arma',order=(2,3),backend='multiprocessing',n_jobs=8)
nltools.stats.summarize_bootstrap(data, save_weights=False)[source]

Calculate summary of bootstrap samples

Parameters:
  • data – (Brain_Data) Brain_Data instance of bootstrap samples

  • save_weights – (bool) save bootstrap weights

Returns:

(dict) dictionary of Brain_Data summary images

Return type:

output

nltools.stats.threshold(stat, p, thr=0.05, return_mask=False)[source]

Threshold a test image by p-value from a p image

Parameters:
  • stat – (Brain_Data) Brain_Data instance of arbitrary statistic metric (e.g., beta, t, etc)

  • p – (Brain_Data) Brain_data instance of p-values

  • thr – (float) p-value to threshold stat image

  • return_mask – (bool) optionally return the thresholding mask; default False

Returns:

Thresholded Brain_Data instance

Return type:

out
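
A minimal usage sketch (illustrative only; t_stats and p_vals are assumed to be existing Brain_Data instances of t-statistics and p-values, e.g. from a prior group analysis):

>>> from nltools.stats import threshold
>>> thresholded = threshold(t_stats, p_vals, thr=0.001)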

nltools.stats.transform_pairwise(X, y)[source]

Transforms data into pairs with balanced labels for ranking. Transforms an n-class ranking problem into a two-class classification problem. Subclasses implementing particular strategies for choosing pairs should override this method. In this method, all pairs are chosen, except for those that have the same target value. The output is an array of balanced classes, i.e. there are the same number of -1 as +1 labels.

Reference: “Large Margin Rank Boundaries for Ordinal Regression”, R. Herbrich, T. Graepel, K. Obermayer. Authors: Fabian Pedregosa <fabian@fseoane.net> Alexandre Gramfort <alexandre.gramfort@inria.fr>

Parameters:
  • X – (np.array), shape (n_samples, n_features) The data

  • y – (np.array), shape (n_samples,) or (n_samples, 2) Target labels. If it’s a 2D array, the second column represents the grouping of samples, i.e., samples with different groups will not be considered.

Returns:

(np.array), shape (k, n_features)

Data as pairs, where k = n_samples * (n_samples-1) / 2 if grouping values were not passed. If grouping variables exist, then returns values computed for each group.

y_trans: (np.array), shape (k,)

Output class labels, where classes have values {-1, +1} If y was shape (n_samples, 2), then returns (k, 2) with groups on the second dimension.

Return type:

X_trans

nltools.stats.trim(data, cutoff=None)[source]

Trim a Pandas DataFrame or Series by replacing outlier values with NaNs

Parameters:
  • data – (pd.DataFrame, pd.Series) data to trim

  • cutoff – (dict) a dictionary with keys {‘std’:[low,high]} or {‘quantile’:[low,high]}

Returns:

(pd.DataFrame, pd.Series) trimmed data

Return type:

out
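
A minimal usage sketch (illustrative only, trimming values beyond ±3 standard deviations in a synthetic DataFrame):

>>> import numpy as np
>>> import pandas as pd
>>> from nltools.stats import trim
>>> df = pd.DataFrame(np.random.randn(100, 3), columns=['a', 'b', 'c'])
>>> trimmed = trim(df, cutoff={'std': [-3, 3]})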

nltools.stats.two_sample_permutation(data1, data2, n_permute=5000, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]

Independent sample permutation test.

Parameters:
  • data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute

  • data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute

  • n_permute – (int) number of permutations

  • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

  • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

  • return_perms – (bool) Return the permutation distribution along with the p-value; default False

Returns:

(dict) dictionary of permutation results [‘mean’,’p’]

Return type:

stats

nltools.stats.u_center(mat)[source]

U-center a 2d array. U-centering is a bias-corrected form of double-centering

Parameters:

mat (ndarray) – 2d numpy array

Returns:

u-centered version of input

Return type:

mat (ndarray)

nltools.stats.upsample(data, sampling_freq=None, target=None, target_type='samples', method='linear')[source]

Upsample a pandas DataFrame or Series to a new target frequency or number of samples using interpolation.

Parameters:
  • data – (pd.DataFrame, pd.Series) data to upsample (Note: will drop non-numeric columns from DataFrame)

  • sampling_freq – Sampling frequency of data in hertz

  • target – (float) upsampling target

  • target_type – (str) type of target can be [samples,seconds,hz]

  • method – (str) [‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’] where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order (default: linear)

Returns:

upsampled pandas object
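
A minimal usage sketch (illustrative only; a synthetic series sampled at 0.5 Hz, i.e. TR = 2s, upsampled to 1 Hz):

>>> import numpy as np
>>> import pandas as pd
>>> from nltools.stats import upsample
>>> ts = pd.DataFrame({'signal': np.random.randn(100)})
>>> ts_up = upsample(ts, sampling_freq=0.5, target=1.0, target_type='hz', method='cubic')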

nltools.stats.winsorize(data, cutoff=None, replace_with_cutoff=True)[source]

Winsorize a Pandas DataFrame or Series by replacing outlier values with the largest/lowest values not considered outliers

Parameters:
  • data – (pd.DataFrame, pd.Series) data to winsorize

  • cutoff – (dict) a dictionary with keys {‘std’:[low,high]} or {‘quantile’:[low,high]}

  • replace_with_cutoff – (bool) If True, replace outliers with the cutoff value. If False, replace outliers with the closest existing values (default: True)

Returns:

(pd.DataFrame, pd.Series) winsorized data

Return type:

out

nltools.stats.zscore(df)[source]

zscore every column in a pandas dataframe or series.

Parameters:

df – (pd.DataFrame) Pandas DataFrame instance

Returns:

(pd.DataFrame) z-scored pandas DataFrame or series instance

Return type:

z_data

nltools.datasets: Dataset Tools

NeuroLearn datasets

functions to help download datasets

nltools.datasets.download_collection(collection=None, data_dir=None, overwrite=False, resume=True, verbose=1)[source]

Download images and metadata from Neurovault collection

Parameters:
  • collection (int, optional) – collection id. Defaults to None.

  • data_dir (str, optional) – data directory. Defaults to None.

  • overwrite (bool, optional) – overwrite data directory. Defaults to False.

  • resume (bool, optional) – resume download. Defaults to True.

  • verbose (int, optional) – print diagnostic messages. Defaults to 1.

Returns:

(DataFrame of image metadata, list of files from downloaded collection)

Return type:

(pd.DataFrame, list)

nltools.datasets.download_nifti(url, data_dir=None)[source]

Download an image to a nifti file.

nltools.datasets.fetch_emotion_ratings(data_dir=None, resume=True, verbose=1)[source]

Download and load the emotion rating dataset from Neurovault

Parameters:

data_dir – (string, optional). Path of the data directory. Used to force data storage in a specified location. Default: None

Returns:

(Brain_Data) Brain_Data object with downloaded data. X=metadata

Return type:

out

nltools.datasets.fetch_pain(data_dir=None, resume=True, verbose=1)[source]

Download and load the pain dataset from Neurovault

Parameters:

data_dir – (string, optional) Path of the data directory. Used to force data storage in a specified location. Default: None

Returns:

(Brain_Data) Brain_Data object with downloaded data. X=metadata

Return type:

out

nltools.datasets.get_collection_image_metadata(collection=None, data_dir=None, limit=10)[source]

Get image metadata associated with collection

Parameters:
  • collection (int, optional) – collection id. Defaults to None.

  • data_dir (str, optional) – data directory. Defaults to None.

  • limit (int, optional) – number of images to fetch per request when paging through the collection. Defaults to 10.

Returns:

Dataframe with full image metadata from collection

Return type:

pd.DataFrame

nltools.cross_validation: Cross-Validation Tools

Cross-Validation Data Classes

Scikit-learn compatible classes for performing various types of cross-validation

class nltools.cross_validation.KFoldStratified(n_splits=3, shuffle=False, random_state=None)[source]

K-Folds cross validation iterator which stratifies continuous data (unlike scikit-learn equivalent).

Provides train/test indices to split data into train/test sets. Splits the dataset into k consecutive folds while ensuring that the same subject is held out within each fold. Each fold is then used as a validation set once while the k - 1 remaining folds form the training set. Extension of KFold from scikit-learn's cross-validation module.

Parameters:
  • n_splits – int, default=3 Number of folds. Must be at least 2.

  • shuffle – boolean, optional Whether to shuffle the data before splitting into batches.

  • random_state – None, int or RandomState Pseudo-random number generator state used for random sampling. If None, use default numpy RNG for shuffling

split(X, y, groups=None)[source]

Generate indices to split data into training and test set.

Parameters:
  • X – array-like, shape (n_samples, n_features) Training data, where n_samples is the number of samples and n_features is the number of features. Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.

  • y – array-like, shape (n_samples,) The target variable for supervised learning problems. Stratification is done based on the y labels.

  • groups – (object) Always ignored, exists for compatibility.

Returns:

(ndarray) The training set indices for that split. test : (ndarray) The testing set indices for that split.

Return type:

train

nltools.cross_validation.set_cv(Y=None, cv_dict=None, return_generator=True)[source]

Helper function to create a sci-kit learn compatible cv object using common parameters for prediction analyses.

Parameters:
  • Y – (pd.DataFrame) Pandas Dataframe of Y labels

  • cv_dict – (dict) Type of cross_validation to use. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout}

  • return_generator (bool) – return a cv generator instead of an instance; default True

Returns:

a scikit-learn model-selection generator

Return type:

cv
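
A minimal usage sketch (illustrative only; Y is assumed to be a pandas DataFrame of labels and holdout a vector of subject ids):

>>> from nltools.cross_validation import set_cv
>>> cv = set_cv(Y=Y, cv_dict={'type': 'kfolds', 'n_folds': 5, 'stratified': Y})
>>> # or leave-one-subject-out:
>>> cv = set_cv(Y=Y, cv_dict={'type': 'loso', 'subject_id': holdout})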

nltools.mask: Mask Tools

NeuroLearn Mask Classes

Classes to represent masks

nltools.mask.collapse_mask(mask, auto_label=True, custom_mask=None)[source]
collapse separate masks into one mask with multiple integers

overlapping areas are ignored

Parameters:
  • mask – nibabel or Brain_Data instance

  • custom_mask – nibabel instance or string to file path; optional

Returns:

Brain_Data instance of a mask with different integers indicating

different masks

Return type:

out

nltools.mask.create_sphere(coordinates, radius=5, mask=None)[source]

Generate a set of spheres in the brain mask space

Parameters:
  • coordinates – a vector of sphere centers of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]

  • radius – vector of radius. Will create multiple spheres if len(radius) > 1
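
A minimal usage sketch (the coordinates and radii below are purely illustrative):

>>> from nltools.mask import create_sphere
>>> sphere_mask = create_sphere([30, 20, 40], radius=8)  # single sphere
>>> multi_mask = create_sphere([[30, 20, 40], [-30, 20, 40]], radius=[8, 8])  # two spheres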

nltools.mask.expand_mask(mask, custom_mask=None)[source]

expand a mask with multiple integers into separate binary masks

Parameters:
  • mask – nibabel or Brain_Data instance

  • custom_mask – nibabel instance or string to file path; optional

Returns:

Brain_Data instance of multiple binary masks

Return type:

out

nltools.mask.roi_to_brain(data, mask_x)[source]

This function will convert a vector of values into a Brain_Data instance using an expanded binary mask of ROIs (see expand_mask). The values must correspond to ROI numbers.

This is useful for populating a parcellation scheme with a vector of values.

Parameters:
  • data – Pandas series, dataframe, list, np.array of ROI by observation

  • mask_x – an expanded binary mask

Returns:

(Brain_Data) Brain_Data instance where each ROI is now populated

with a value

Return type:

out
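
A minimal usage sketch (illustrative only; parcellation is assumed to be an existing integer-labeled mask, e.g. loaded as a Brain_Data instance):

>>> import pandas as pd
>>> from nltools.mask import expand_mask, roi_to_brain
>>> roi_masks = expand_mask(parcellation)           # one binary mask per ROI
>>> roi_values = pd.Series(range(len(roi_masks)))   # one (illustrative) value per ROI
>>> brain = roi_to_brain(roi_values, roi_masks)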

nltools.file_reader: File Reading

NeuroLearn File Reading Tools

nltools.file_reader.onsets_to_dm(F, sampling_freq, run_length, header='infer', sort=False, keep_separate=True, add_poly=None, unique_cols=None, fill_na=None, **kwargs)[source]

This function can assist in reading in one or several 2-3 column onsets files, specified in seconds, and converting them to a Design Matrix organized as samples X Stimulus Classes. sampling_freq should be specified in hertz; for TRs use hertz = 1/TR. Onsets files must be organized with columns in one of the following 4 formats:

  1. ‘Stim, Onset’

  2. ‘Onset, Stim’

  3. ‘Stim, Onset, Duration’

  4. ‘Onset, Duration, Stim’

No other file organizations are currently supported. Note: Stimulus offsets (onset + duration) that fall into an adjacent TR include that full TR. E.g. offset of 10.16s with TR = 2 has an offset of TR 5, which spans 10-12s, rather than an offset of TR 4, which spans 8-10s.

Parameters:
  • F (str/Path/pd.DataFrame) – filepath or pandas dataframe

  • sampling_freq (float) – samping frequency in hertz, i.e 1 / TR

  • run_length (int) – run length in number of TRs

  • header (str/None, optional) – whether there’s an additional header row in the supplied file/dataframe. See pd.read_csv for more details. Defaults to “infer”.

  • sort (bool, optional) – whether to sort dataframe columns alphabetically. Defaults to False.

  • keep_separate (bool, optional) – if a list of files or dataframes is supplied, whether to create separate polynomial columns per file. Defaults to True.

  • add_poly (bool/int, optional) – whether to add Nth order polynomials to the design matrix. Defaults to None.

  • unique_cols (list/None, optional) – if a list of files or dataframes is supplied, what additional columns to keep separate per file.

  • fill_na (Any, optional) – what to replace NaNs with. Defaults to None (no filling).

Returns:

design matrix organized as TRs x Stims

Return type:

nltools.data.Design_Matrix
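
A minimal usage sketch (the file name and timing values are illustrative; 'onsets.csv' is assumed to contain Stim, Onset, Duration columns in seconds for a 200-TR run with TR = 2s):

>>> from nltools.file_reader import onsets_to_dm
>>> dm = onsets_to_dm('onsets.csv', sampling_freq=1/2.0, run_length=200, sort=True)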

nltools.utils: Utilities

NeuroLearn Utilities

handy utilities.

nltools.utils.concatenate(data)[source]

Concatenate a list of Brain_Data() or Adjacency() objects

nltools.utils.get_anatomical()[source]

Get nltools default anatomical image. DEPRECATED. See MNI_Template and resolve_mni_path from nltools.prefs

nltools.utils.get_resource_path()[source]

Get path to nltools resource directory.

nltools.utils.set_algorithm(algorithm, *args, **kwargs)[source]

Setup the algorithm to use in subsequent prediction analyses.

Parameters:
  • algorithm – The prediction algorithm to use. Either a string or an (uninitialized) scikit-learn prediction object. If string, must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘lassopcr’, ‘lassoCV’, ‘ridge’, ‘ridgeCV’, ‘ridgeClassifier’, ‘randomforest’, or ‘randomforestClassifier’

  • kwargs – Additional keyword arguments to pass onto the scikit-learn prediction object.

Returns:

dictionary of settings for prediction

Return type:

predictor_settings

nltools.utils.set_decomposition_algorithm(algorithm, n_components=None, *args, **kwargs)[source]

Setup the algorithm to use in subsequent decomposition analyses.

Parameters:
  • algorithm – The decomposition algorithm to use. Either a string or an (uninitialized) scikit-learn decomposition object. If string, must be one of ‘pca’, ‘nnmf’, ‘ica’, ‘fa’, ‘dictionary’, ‘kernelpca’.

  • kwargs – Additional keyword arguments to pass onto the scikit-learn decomposition object.

Returns:

dictionary of settings for prediction

Return type:

predictor_settings

nltools.prefs: Preferences

This module can be used to adjust the default MNI template settings that are used internally by all Brain_Data operations. By default, all operations are performed in MNI152 2mm space; thus any files loaded will be resampled to this space. You can control this on a per-file basis using the mask argument of Brain_Data, e.g.

from nltools.data import Brain_Data

# my_brain will be resampled to 2mm
brain = Brain_Data('my_brain.nii.gz')

# my_brain will now be resampled to the same space as my_mask
brain = Brain_Data('my_brain.nii.gz', mask='my_mask.nii.gz') # will be resampled

Alternatively this module can be used to switch between 2mm or 3mm MNI spaces with and without ventricles:

from nltools.prefs import MNI_Template, resolve_mni_path
from nltools.data import Brain_Data

# Update the resolution globally
MNI_Template['resolution'] = '3mm'

# This works too:
MNI_Template.resolution = 3

# my_brain will be resampled to 3mm and future operation will be in 3mm space
brain = Brain_Data('my_brain.nii.gz')

# get the template nifti files
resolve_mni_path(MNI_Template)

# will print like:
{
    'resolution': '3mm',
    'mask_type': 'with_ventricles',
    'mask': '/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm_brain_mask.nii.gz',
    'plot': '/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm.nii.gz',
    'brain':
    '/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm_brain.nii.gz'
}
nltools.prefs.resolve_mni_path(MNI_Template)[source]

Helper function to resolve MNI path based on MNI_Template prefs setting.

nltools.plotting: Plotting Tools

NeuroLearn Plotting Tools

Numerous functions to plot data

nltools.plotting.dist_from_hyperplane_plot(stats_output)[source]

Plot SVM Classification Distance from Hyperplane

Parameters:

stats_output – a pandas DataFrame with prediction output

Returns:

Will return a seaborn plot of distance from hyperplane

Return type:

fig

nltools.plotting.plot_between_label_distance(distance, labels, ax=None, permutation_test=True, n_permute=5000, fontsize=18, **kwargs)[source]

Create a heatmap indicating average between label distance

Parameters:
  • distance – (pandas dataframe) brain_distance matrix

  • labels – (pandas dataframe) group labels

  • ax – axis to plot (default=None)

  • permutation_test – (boolean)

  • n_permute – (int) number of samples for permutation test

  • fontsize – (int) size of font for plot

Returns:

f: heatmap

out: pandas dataframe of pairwise distance between conditions

within_dist_out: average pairwise distance matrix

mn_dist_out: (optional if permutation_test=True) average difference in distance between conditions

p_dist_out: (optional if permutation_test=True) p-value for difference in distance between conditions

nltools.plotting.plot_brain(objIn, how='full', thr_upper=None, thr_lower=None, save=False, **kwargs)[source]

More complete brain plotting of a Brain_Data instance

Parameters:
  • objIn (Brain_Data) – object to plot

  • how (str) – whether to plot a glass brain ‘glass’, 3 view-multi-slice mni ‘mni’, or both ‘full’

  • thr_upper (str/float) – thresholding of image. Can be a string for percentage or a float for data units (see Brain_Data.threshold())

  • thr_lower (str/float) – thresholding of image. Can be a string for percentage or a float for data units (see Brain_Data.threshold())

  • save (str) – if a string file name or path is provided plots will be saved into this directory appended with the orientation they belong to

  • kwargs – optionals args to nilearn plot functions (e.g. vmax)

nltools.plotting.plot_interactive_brain(brain, threshold=1e-06, surface=False, percentile_threshold=False, anatomical=None, **kwargs)[source]

This function leverages nilearn’s new javascript based brain viewer functions to create interactive plotting functionality.

Parameters:
  • brain (nltools.Brain_Data) – a Brain_Data instance of 1d or 2d shape (i.e. 3d or 4d volume)

  • threshold (float/str) – threshold to initialize the visualization; may be a percentile string; default 1e-06

  • surface (bool) – whether to create a surface-based plot; default False

  • percentile_threshold (bool) – whether to interpret threshold values as percentiles

  • kwargs – optional arguments to nilearn.view_img or nilearn.view_img_on_surf

Returns:

interactive brain viewer widget

nltools.plotting.plot_mean_label_distance(distance, labels, ax=None, permutation_test=False, n_permute=5000, fontsize=18, **kwargs)[source]

Create a violin plot indicating within and between label distance.

Parameters:
  • distance – pandas dataframe of distance

  • labels – labels indicating columns and rows to group

  • ax – matplotlib axis to plot on

  • permutation_test – (bool) indicates whether to run permutation test or not

  • n_permute – (int) number of permutations to run

  • fontsize – (int) fontsize for plot labels

Returns:

f: heatmap

stats: (optional if permutation_test=True) permutation results

nltools.plotting.plot_silhouette(distance, labels, ax=None, permutation_test=True, n_permute=5000, **kwargs)[source]

Create a silhouette plot indicating between relative to within label distance

Parameters:
  • distance – (pandas dataframe) brain_distance matrix

  • labels – (pandas dataframe) group labels

  • ax – axis to plot (default=None)

  • permutation_test – (boolean)

  • n_permute – (int) number of samples for permutation test

Optional keyword args:

figsize: (list) dimensions of silhouette plot

colors: (list) color triplets for silhouettes. Length must equal number of unique labels

Returns:

f: heatmap

out: pandas dataframe of pairwise distance between conditions

within_dist_out: average pairwise distance matrix

mn_dist_out: (optional if permutation_test=True) average difference in distance between conditions

p_dist_out: (optional if permutation_test=True) p-value for difference in distance between conditions

nltools.plotting.plot_stacked_adjacency(adjacency1, adjacency2, normalize=True, **kwargs)[source]

Create stacked adjacency to illustrate similarity.

Parameters:
  • adjacency1 – Adjacency instance 1

  • adjacency2 – Adjacency instance 2

  • normalize – (boolean) Normalize matrices.

Returns:

matplotlib figure

nltools.plotting.plot_t_brain(objIn, how='full', thr='unc', alpha=None, nperm=None, cut_coords=[], **kwargs)[source]

Takes a brain data object and computes a 1-sample t-test across its first axis. If a list is provided, will compute the difference between the brain data objects in the list (i.e. paired-samples t-test).

Parameters:
  • objIn – (list/Brain_Data) object to plot; if a list, will compute a difference map first

  • how – (str) whether to plot a glass brain ‘glass’, 3 view-multi-slice mni ‘mni’, or both ‘full’

  • thr – (str) method to use for multiple comparisons correction: unc, fdr, or tfce

  • alpha – (float) p-value threshold

  • nperm – (int) number of permutations for tfce; default 1000

  • cut_coords – (list) x,y,z coords to plot brain slice

  • kwargs – optional args to nilearn plot functions (e.g. vmax)

nltools.plotting.probability_plot(stats_output)[source]

Plot Classification Probability

Parameters:

stats_output – a pandas DataFrame with prediction output

Returns:

Will return a seaborn scatterplot

Return type:

fig

nltools.plotting.roc_plot(fpr, tpr)[source]

Plot 1-Specificity by Sensitivity

Parameters:
  • fpr – false positive rate from Roc.calculate

  • tpr – true positive rate from Roc.calculate

Returns:

Will return a matplotlib ROC plot

Return type:

fig

nltools.plotting.scatterplot(stats_output)[source]

Plot Prediction Scatterplot

Parameters:

stats_output – a pandas DataFrame with prediction output

Returns:

Will return a seaborn scatterplot

Return type:

fig

nltools.simulator: Simulator Tools

NeuroLearn Simulator Tools

Tools to simulate multivariate data.

class nltools.simulator.Simulator(brain_mask=None, output_dir=None, random_state=None)[source]
create_cov_data(cor, cov, sigma, mask=None, reps=1, n_sub=1, output_dir=None)[source]

create continuous simulated data with covariance

Parameters:
  • cor – amount of covariance between each voxel and Y variable

  • cov – amount of covariance between voxels

  • sigma – amount of noise to add

  • mask – region where we will have activations

  • reps – number of data repetitions

  • n_sub – number of subjects to simulate

  • output_dir – string path of directory to output data. If None, no data will be written

  • **kwargs – Additional keyword arguments to pass to the prediction algorithm

create_data(levels, sigma, radius=5, center=None, reps=1, output_dir=None)[source]

create simulated data with integers

Parameters:
  • levels – vector of intensities or class labels

  • sigma – amount of noise to add

  • radius – vector of radius. Will create multiple spheres if len(radius) > 1

  • center – center(s) of sphere(s) of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]

  • reps – number of data repetitions useful for trials or subjects

  • output_dir – string path of directory to output data. If None, no data will be written

  • **kwargs – Additional keyword arguments to pass to the prediction algorithm

create_ncov_data(cor, cov, sigma, masks=None, reps=1, n_sub=1, output_dir=None)[source]

create continuous simulated data with covariance

Parameters:
  • cor – amount of covariance between each voxel and Y variable (an int or a vector)

  • cov – amount of covariance between voxels (an int or a matrix)

  • sigma – amount of noise to add

  • mask – region(s) where we will have activations (list if more than one)

  • reps – number of data repetitions

  • n_sub – number of subjects to simulate

  • output_dir – string path of directory to output data. If None, no data will be written

  • **kwargs – Additional keyword arguments to pass to the prediction algorithm

gaussian(mu, sigma, i_tot)[source]

create a 3D gaussian signal normalized to a given intensity

Parameters:
  • mu – average value of the gaussian signal (usually set to 0)

  • sigma – standard deviation

  • i_tot – sum total of activation (numerical integral over the gaussian returns this value)

n_spheres(radius, center)[source]

generate a set of spheres in the brain mask space

Parameters:
  • radius – vector of radius. Will create multiple spheres if len(radius) > 1

  • centers – a vector of sphere centers of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]

normal_noise(mu, sigma)[source]

produce a normal noise distribution for all points in the brain mask

Parameters:
  • mu – average value of the gaussian signal (usually set to 0)

  • sigma – standard deviation

sphere(r, p)[source]

create a sphere of given radius at some point p in the brain mask

Parameters:
  • r – radius of the sphere

  • p – point (in coordinates of the brain mask) of the center of the sphere

to_nifti(m)[source]

convert a numpy matrix to the nifti format and assign to it the brain_mask’s affine matrix

Parameters:

m – the 3D numpy matrix we wish to convert to .nii
