mmc.identify_group_markers()

mmochi.classifier.identify_group_markers(adata, group1, group2=[], batch_val=None, batch=utils.BATCH_KEY, reference='leiden', key_added='groups', filtered=False, plot=True, min_fold_change=2, min_in_group_fraction=.5, max_out_group_fraction=0.25, n_ups=29, n_downs=30, use_raw=False, return_df=True)

Calculates differentially expressed genes between two groups of events defined by a column of .obs (see reference). Built on scanpy.tl.rank_genes_groups; see https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html for further details and documentation.
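As a rough illustration of the comparison described above, the sketch below selects group 1 and group 2 events from .obs (optionally restricted to batch_val) and ranks genes by a Welch t-statistic using only numpy and pandas. rank_two_groups is a hypothetical helper, not part of mmochi. It swaps in a plain Welch t-test for scanpy.tl.rank_genes_groups, whose method and defaults may differ. Treating an empty group2 as "all other events" is also an assumption, not documented mmochi behavior.

```python
import numpy as np
import pandas as pd


def rank_two_groups(X, obs, group1, group2=(), reference="leiden",
                    batch="batch", batch_val=None, gene_names=None):
    """Hypothetical helper: rank genes by Welch t-statistic, group1 vs. group2.

    Illustrative stand-in only; mmochi delegates this step to
    scanpy.tl.rank_genes_groups, whose method and defaults may differ.
    X is a dense (events x genes) array of log-normalized expression.
    """
    X = np.asarray(X, dtype=float)
    labels = obs[reference].astype(str)
    keep = np.ones(len(obs), dtype=bool)
    if batch_val is not None:
        # Restrict the comparison to the requested batch(es) in obs[batch]
        vals = [batch_val] if isinstance(batch_val, str) else list(batch_val)
        keep &= obs[batch].isin(vals).to_numpy()
    g1 = [group1] if isinstance(group1, str) else list(group1)
    g2 = [group2] if isinstance(group2, str) else list(group2)
    in1 = keep & labels.isin(g1).to_numpy()
    # Assumption: an empty group2 means "all other kept events"
    in2 = keep & (labels.isin(g2).to_numpy() if g2 else ~labels.isin(g1).to_numpy())
    a, b = X[in1], X[in2]
    # Welch t-statistic per gene (needs at least 2 events per group)
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = np.divide(mean_diff, se, out=np.zeros_like(mean_diff), where=se > 0)
    names = gene_names if gene_names is not None else [f"g{i}" for i in range(X.shape[1])]
    df = pd.DataFrame({"names": names, "scores": t})
    # Highest scores first: genes most characteristic of group 1
    return df.sort_values("scores", ascending=False, ignore_index=True)
```

Genes at the top of the returned table are those most elevated in group 1 relative to group 2, mirroring the ranked DataFrame this function returns when return_df is True.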

Parameters:
  • adata (AnnData object) – Log-normalized data to run differential expression on. Must contain groups 1 and 2 in .obs, and X_umap (in .obsm) if plot is True.

  • group1 (Union[str, Iterable[str]]) – Group or groups (labels in adata.obs[reference]) that make up group 1. See sc.tl.rank_genes_groups for more detailed documentation.

  • group2 (Union[str, Iterable[str]]) – Group or groups (labels in adata.obs[reference]) that make up group 2. See sc.tl.rank_genes_groups for more detailed documentation.

  • batch_val (str or list of str) – Batch or batches (values in adata.obs[batch]) to which the differential expression analysis is restricted

  • batch (str) – Name of the column in adata.obs that contains batch labels for use in the classifier

  • reference (str) – Name of the column in adata.obs whose labels define groups 1 and 2

  • key_added (str) – Key in the adata object that contains the classification results

  • filtered (bool) – Whether to use sc.tl.filter_rank_genes_groups to filter out genes based on fold change and on the fraction of events expressing them inside and outside the groups

  • plot (bool) – Whether to plot a UMAP of the events colored by their differential expression, using X_umap.

  • min_fold_change (int) – If filtered, passed to sc.tl.filter_rank_genes_groups

  • min_in_group_fraction (float [0,1]) – If filtered, passed to sc.tl.filter_rank_genes_groups

  • max_out_group_fraction (float [0,1]) – If filtered, passed to sc.tl.filter_rank_genes_groups

  • n_ups (int) – If plot, the number of top-ranked genes to include

  • n_downs (int) – If plot, the number of bottom-ranked genes to include

  • use_raw (bool) – Whether to pull expression for the differential expression analysis from adata.raw

  • return_df (bool) – Whether to return a pandas DataFrame of genes ranked by their ability to characterize group 1
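When filtered is True, the three cutoffs min_fold_change, min_in_group_fraction and max_out_group_fraction decide which genes survive. The sketch below shows one way these criteria can be applied. filter_markers is a hypothetical helper, not part of mmochi or scanpy. It assumes log1p-normalized input and computes fold change as a ratio of mean expm1-transformed expression, which approximates but may not exactly match sc.tl.filter_rank_genes_groups in every scanpy version.

```python
import numpy as np
import pandas as pd


def filter_markers(X_log, in_group, gene_names, min_fold_change=2,
                   min_in_group_fraction=0.5, max_out_group_fraction=0.25):
    """Hypothetical helper: keep genes passing fold-change and fraction cutoffs.

    Assumes X_log is log1p-normalized (events x genes) and in_group is a
    boolean mask marking group-1 events; all other events count as "out".
    """
    X_log = np.asarray(X_log, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    expr = np.expm1(X_log)  # undo log1p to compare means on a linear scale
    mean_in = expr[in_group].mean(axis=0)
    mean_out = expr[~in_group].mean(axis=0)
    # Small constant avoids division by zero when a gene is absent outside the group
    fold_change = (mean_in + 1e-9) / (mean_out + 1e-9)
    # Fraction of events with nonzero expression, inside and outside the group
    frac_in = (X_log[in_group] > 0).mean(axis=0)
    frac_out = (X_log[~in_group] > 0).mean(axis=0)
    keep = ((fold_change >= min_fold_change)
            & (frac_in >= min_in_group_fraction)
            & (frac_out <= max_out_group_fraction))
    table = pd.DataFrame({"names": gene_names, "fold_change": fold_change,
                          "frac_in": frac_in, "frac_out": frac_out})
    return table[keep].reset_index(drop=True)
```

A gene can pass the fold-change cutoff and still be dropped if too many events outside the group express it, which is why the fraction cutoffs help favor specific markers.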

Returns:

df – If return_df, a pandas DataFrame of genes ranked by their ability to characterize group 1

Return type:

pandas.DataFrame