mmc.classifier_setup()
- mmochi.classifier.classifier_setup(adata, x_modalities, data_key=utils.DATA_KEY, reduce_features_min_cells=0, features_limit=None)
Setup that can optionally be completed before running mmc.classify. It can be run ahead of the classifier (for example, to reduce the runtime of the classifier function in a parameter optimization loop); otherwise it is run automatically when training a classifier. It concatenates the .X and any data_key in the .obsm, then performs feature reduction (if reduce_features_min_cells > 0). Next, if features_limit is defined, features are limited to an external feature set, so that you can match the expected features of the hierarchy. Finally, it sorts the resulting feature_names (the columns from the .X and .obsm[data_key]) and the columns of the csr_matrix alphabetically, to make the feature order reproducible across runs.
- Parameters:
adata (AnnData) – Object containing gene expression data, and expression data for modalities for every data key in .obsm
x_modalities (Union[str, List[str]]) – Name of the modality of the data in the .X of adata, or name to use for all of .X; if None, assumes 'gex'
data_key (Union[str, list, None] (default: utils.DATA_KEY)) – Key in adata.obsm to concatenate into .X and to reduce features across
reduce_features_min_cells (int (default: 0)) – Remove features that vary in fewer than this number of cells; passed to _reduce_features
features_limit (default: None) – List-like of str, or dictionary in the format {'modality_1': ['gene_1', 'gene_2', …], 'modality_2': 'All'}. Specifies the allowed features to classify on for a given modality
- Return type:
- Returns:
scipy.sparse.csr_matrix – Reduced adata data for classification
list – List of features that were checked/used in the reduction process
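The steps described above (concatenate, reduce, limit, sort) can be illustrated with a small standalone sketch. This is not mmochi's implementation: the function setup_sketch and its arguments are hypothetical, it uses only NumPy and SciPy, and it assumes that a feature "varies" in a cell when its value there is nonzero, which may differ from what _reduce_features actually checks.

```python
import numpy as np
from scipy import sparse


def setup_sketch(X, x_names, extra, extra_names, min_cells=0, allowed=None):
    """Hypothetical stand-in for the described steps, not mmochi's code."""
    # 1. Concatenate the two feature blocks column-wise (cells are rows).
    mat = sparse.hstack(
        [sparse.csr_matrix(X), sparse.csr_matrix(extra)], format="csr"
    )
    names = np.array(list(x_names) + list(extra_names))

    keep = np.ones(len(names), dtype=bool)
    # 2. Feature reduction. Assumption: count the cells with a nonzero value.
    if min_cells > 0:
        nonzero_cells = np.asarray((mat != 0).sum(axis=0)).ravel()
        keep &= nonzero_cells >= min_cells
    # 3. Optional limit to an externally supplied set of feature names.
    if allowed is not None:
        keep &= np.isin(names, list(allowed))
    mat, names = mat[:, keep], names[keep]

    # 4. Sort columns by feature name so the order is reproducible.
    order = np.argsort(names, kind="stable")
    return mat[:, order], names[order].tolist()


# Toy data: 3 cells, 2 features in X and 2 in a second modality.
X = [[1, 0], [2, 0], [3, 5]]
extra = [[0, 1], [0, 1], [7, 1]]
mat, names = setup_sketch(
    X, ["b_gene", "a_gene"], extra, ["d_prot", "c_prot"], min_cells=2
)
print(names)  # ['b_gene', 'c_prot']
```

Sorting after filtering keeps the returned matrix columns and the returned name list aligned, which is the property the reproducible feature order relies on.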