mmc.classifier_setup()

mmochi.classifier.classifier_setup(adata, x_modalities, data_key=utils.DATA_KEY, reduce_features_min_cells=0, features_limit=None)

Optional setup that can be completed before running mmc.classify. It can be run ahead of the classifier (to reduce the runtime of the classifier function in a parameter optimization loop); otherwise, it is run automatically when training a classifier. It concatenates the .X and any data_key in the .obsm, then performs feature reduction (if reduce_features_min_cells > 0). Next, if features_limit is defined, features are limited to an external feature set so that they match the expected features of the hierarchy. Finally, it sorts the resulting feature_names (the columns from the .X and .obsm[data_key]) and the corresponding csr_matrix columns alphabetically, to make the feature order reproducible across runs.
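The concatenate, reduce, and sort steps described above can be sketched with NumPy and SciPy alone. This is a minimal illustration, not the library's implementation: the function name sketch_setup, the feature names, and the reading of "varies" as "is nonzero" are all assumptions made for the example.

```python
import numpy as np
from scipy import sparse


def sketch_setup(X, x_names, extra, extra_names, min_cells=0):
    """Illustrative only: concatenate, filter, and sort features."""
    # 1. Concatenate the two blocks column-wise, as with .X and .obsm[data_key].
    combined = sparse.hstack(
        [sparse.csr_matrix(X), sparse.csr_matrix(extra)], format="csr"
    )
    names = np.array(list(x_names) + list(extra_names))

    # 2. Optional reduction. Assumption: "varies" is approximated as "nonzero".
    if min_cells > 0:
        nonzero_per_feature = combined.getnnz(axis=0)
        keep = np.flatnonzero(nonzero_per_feature >= min_cells)
        combined, names = combined[:, keep], names[keep]

    # 3. Sort columns alphabetically by name so the order is reproducible.
    order = np.argsort(names)
    return combined[:, order], names[order].tolist()


# Three cells, two features in each block (hypothetical names and values).
X = [[1, 0], [2, 3], [0, 4]]
extra = [[0, 5], [1, 0], [1, 0]]
matrix, features = sketch_setup(
    X, ["TCF7", "CD4"], extra, ["CD8", "CD19"], min_cells=2
)
print(features)  # "CD19" is nonzero in only one cell, so it is dropped
```

Sorting the names and reordering the matrix columns with the same index keeps each column aligned with its feature name, which is what makes the output comparable across runs.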

Parameters:
  • adata (AnnData) – Object containing gene expression data, and expression data for modalities for every data key in .obsm

  • x_modalities (Union[str, List[str]]) – Name of the modality of the data in the .X of adata, or the name to use for all of .X. If None, ‘gex’ is assumed.

  • data_key (Union[str, list, None] (default: utils.DATA_KEY)) – Key in adata.obsm to concatenate into .X and to reduce features across

  • reduce_features_min_cells (int (default: 0)) – Remove features that vary in fewer than this number of cells; passed to _reduce_features

  • features_limit (default: None) – List-like of str, or a dictionary in the format {‘modality_1’: [‘gene_1’, ‘gene_2’, …], ‘modality_2’: ‘All’}. Specifies the allowed features to classify on for a given modality
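As a rough illustration of the dictionary form of features_limit, the sketch below filters a per-modality feature listing. The helper apply_features_limit, the available mapping, the marker names, and the choice to drop modalities missing from the dictionary are all hypothetical; how the library itself applies the limit may differ.

```python
def apply_features_limit(available, features_limit):
    """Hypothetical helper: keep only the allowed features per modality."""
    limited = {}
    for modality, features in available.items():
        # Assumption: a modality absent from features_limit keeps nothing.
        allowed = features_limit.get(modality, [])
        if allowed == "All":
            # 'All' keeps every feature of that modality.
            limited[modality] = list(features)
        else:
            allowed_set = set(allowed)
            limited[modality] = [f for f in features if f in allowed_set]
    return limited


# Hypothetical features present in the data, grouped by modality.
available = {
    "modality_1": ["gene_1", "gene_2", "gene_3"],
    "modality_2": ["marker_1", "marker_2"],
}
features_limit = {"modality_1": ["gene_1", "gene_2"], "modality_2": "All"}
limited = apply_features_limit(available, features_limit)
print(limited)
```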

Return type:

Tuple[csr_matrix, list]

Returns:

  • scipy.sparse.csr_matrix – Reduced adata data for classification

  • list – List of features that were checked/used in the reduction process