API
Import MMoCHi as:
`import mmochi as mmc`
For usage examples, see the Tutorials.
Hierarchy Generation
Designing your hierarchy is the first step to classification. The Hierarchy class contains many methods used for building the hierarchy and defining high-confidence thresholds.
Class to organize a MMoCHi hierarchy.
A Hierarchy building block, describing subsetting rules, whose parent is a subset (or "all").
A Hierarchy building block, describing a population of cells beneath a classification layer.
Helper function for defining simple or complex gating strategies for high-confidence thresholding.
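To make the relationship between these building blocks concrete, here is a minimal, library-independent sketch of a hierarchy as a tree that alternates classification layers and subsets. The class, field, and function names (`Node`, `kind`, `add`, `subset_paths`) and the cell-type labels are hypothetical and are not MMoCHi's API; see the Tutorials for the real `mmc.Hierarchy` methods.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy hierarchy (hypothetical; not an MMoCHi class)."""
    name: str
    kind: str  # either "subset" or "classification"
    children: list = field(default_factory=list)

    def add(self, name, kind):
        # Classification layers sit beneath subsets, and subsets sit beneath
        # classification layers, so the two kinds must alternate.
        if kind == self.kind:
            raise ValueError(f"a {kind} cannot be the direct child of a {self.kind}")
        child = Node(name, kind)
        self.children.append(child)
        return child


def subset_paths(node, prefix=()):
    """Yield the chain of subset names leading to every terminal subset."""
    path = prefix + (node.name,) if node.kind == "subset" else prefix
    if node.kind == "subset" and not node.children:
        yield path
    for child in node.children:
        yield from subset_paths(child, path)


# Build a small example tree rooted at the "All" subset.
root = Node("All", "subset")
broad = root.add("Broad Lineages", "classification")
lymphocyte = broad.add("Lymphocyte", "subset")
broad.add("Myeloid", "subset")
lymphoid = lymphocyte.add("Lymphoid", "classification")
lymphoid.add("T cell", "subset")
lymphoid.add("B cell", "subset")

paths = list(subset_paths(root))
```

Each path lists the subsets an event passes through on its way to a terminal population, which is the shape a hierarchical classifier walks level by level.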
Thresholding
High-confidence thresholding is performed primarily through methods in the mmc.Hierarchy object, but these are the functions that perform thresholding under the hood.
Performs thresholding for a marker, displays its expression distribution (colored by "pos", "?", and "neg") for visualization and interactive adjustment, and optionally returns thresholds and thresholded events.
Lightweight wrapper that finds the marker (utils.get_data()), then performs pos/neg/? thresholding on all events in the AnnData, given a list of positive and negative thresholds.
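The pos/neg/? labeling described above can be illustrated without the library. The following is a sketch only: `threshold_events`, its parameter names, and the choice to treat values exactly on a threshold as confident calls are assumptions for illustration, not MMoCHi's implementation.

```python
def threshold_events(values, neg_max, pos_min):
    """Label each value "neg", "?", or "pos" using two thresholds.

    Hypothetical helper: values at or below neg_max are "neg", values at or
    above pos_min are "pos", and anything in between is left as "?".
    """
    if neg_max > pos_min:
        raise ValueError("neg_max must not exceed pos_min")
    labels = []
    for value in values:
        if value >= pos_min:
            labels.append("pos")
        elif value <= neg_max:
            labels.append("neg")
        else:
            labels.append("?")
    return labels


# Made-up expression values for one marker across five events.
expression = [0.1, 0.9, 2.5, 1.4, 3.2]
labels = threshold_events(expression, neg_max=1.0, pos_min=2.0)
```

Only the events labeled "pos" or "neg" would count as high-confidence; the "?" events fall between the two thresholds and are left undecided.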
Classification
Once the Hierarchy is created and high-confidence thresholds are drawn, you are ready to classify. The mmc.classify function runs mmc.classifier_setup and mmc.hc_threshold internally, but you have the option to run these separately for testing.
Setup that can optionally be completed before running mmc.classify.
Performs high-confidence thresholding using the subset definitions defined in one level of a MMoCHi hierarchy.
Classifies subsets using the provided hierarchy.
Creates a column in the .obs featuring the most specific classification/subset for each event.
Creates a column in .obsm[key_added] with a randomly selected holdout for benchmarking performance against other tools.
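As an illustration of what "the most specific classification/subset for each event" means, the sketch below picks the deepest non-missing label from a broad-to-specific list. The function name `most_specific`, the data layout, and the cell-type labels are hypothetical; MMoCHi stores its results in the AnnData object rather than in plain dictionaries.

```python
def most_specific(labels_by_level):
    """Return the deepest available label.

    labels_by_level is ordered from broad to specific; None marks a level at
    which the event was not classified.
    """
    for label in reversed(labels_by_level):
        if label is not None:
            return label
    return None


# Made-up per-event classifications at three hierarchy levels.
events = {
    "cell_1": ["Lymphocyte", "T cell", "CD4 T cell"],
    "cell_2": ["Lymphocyte", "B cell", None],
    "cell_3": ["Myeloid", None, None],
}
terminal = {name: most_specific(levels) for name, levels in events.items()}
```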
Plotting
Once you have run your classification, you may be interested in plotting some metrics of its performance or evaluating feature importances.
Determine the performance at a single level by creating a confusion plot using high-confidence thresholds as truth.
Determine how confident classification is for each subset by displaying calibration curves, which compare the events classified at a given class to its confidence.
Returns a DataFrame of features used in classification and their importances in the random forest of a given level.
Creates violin plots for the 25 most important genes or proteins for each specified level in levels.
Plots a tree from the random forest at a specified level of the classifier.
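The idea behind a confusion plot that uses high-confidence thresholds as truth can be sketched with plain counting. This is a conceptual example, not MMoCHi's plotting code: `confusion_counts` and the labels are hypothetical, and events without a high-confidence call (None) are assumed to be excluded from the comparison.

```python
from collections import Counter


def confusion_counts(truth, predicted):
    """Count (truth, predicted) pairs, skipping events with no truth label."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    return Counter(
        (t, p) for t, p in zip(truth, predicted) if t is not None
    )


# Made-up high-confidence calls (None = not thresholded) and classifier output.
hc_labels = ["T cell", "B cell", None, "T cell", "B cell", "T cell"]
predicted = ["T cell", "B cell", "T cell", "B cell", "B cell", "T cell"]

counts = confusion_counts(hc_labels, predicted)
# Fraction of high-confidence events whose prediction matches the truth label.
agreement = sum(n for (t, p), n in counts.items() if t == p) / sum(counts.values())
```

The off-diagonal entries, such as `("T cell", "B cell")`, are the events a confusion plot would highlight as disagreements.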
There are also a few plotting functions we have created for interrogating high-confidence thresholds and classifier performance using UMAPs:
Plots UMAPs for the listed markers with thresholded expression data for the markers overlaid on top.
Plots UMAPs showing events selected by high-confidence thresholding and used for training and breaks down annotation confusion onto the UMAP.
Landmark Registration
Prior to classification, you may be interested in performing batch correction on ADT expression. This module contains the tools necessary to perform and evaluate batch correction by landmark registration.
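As a rough intuition for landmark registration, the sketch below rescales each batch so that its negative and positive peaks land on shared target positions. It is a deliberately simplified linear version under stated assumptions: the function `align_to_landmarks`, its parameters, and the peak values are hypothetical, peak detection is not shown, and the warping MMoCHi actually applies may differ.

```python
def align_to_landmarks(values, neg_peak, pos_peak, target_neg=0.0, target_pos=1.0):
    """Linearly map values so neg_peak -> target_neg and pos_peak -> target_pos."""
    if pos_peak == neg_peak:
        raise ValueError("the two peaks must differ")
    scale = (target_pos - target_neg) / (pos_peak - neg_peak)
    return [target_neg + (value - neg_peak) * scale for value in values]


# Two made-up batches measuring the same marker on different scales.
batch_a = align_to_landmarks([1.0, 2.0, 3.0], neg_peak=1.0, pos_peak=3.0)
batch_b = align_to_landmarks([2.0, 4.0, 6.0], neg_peak=2.0, pos_peak=6.0)
```

After alignment, both batches place their negative peak at 0 and their positive peak at 1, so a single threshold can be applied across batches.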
Batch correction for expression of all ADTs.
Landmark registration batch correction for ADT expression for a single marker on a single batch.
Method to plot multiple density plots of positive and negative peaks for batches and data_keys with properly placed labels.
Creates density plot of a single batch for a single marker for a given data_key.
Plots density of a single marker on a single batch in front of the density of this marker for the whole dataset.
Update peak overrides object for a single batch and marker.
Saves peak overrides to a JSON file for easy loading.
Loads peak overrides from a JSON file.
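Saving and reloading manual peak overrides is an ordinary JSON round trip. The sketch below uses only the standard library; the nested batch-to-marker-to-peaks layout, the helper names `save_overrides` and `load_overrides`, and the batch and marker names are assumptions for illustration and may not match the structure MMoCHi uses.

```python
import json
import os
import tempfile


def save_overrides(path, overrides):
    """Write the overrides dictionary to a JSON file."""
    with open(path, "w") as handle:
        json.dump(overrides, handle, indent=2, sort_keys=True)


def load_overrides(path):
    """Read an overrides dictionary back from a JSON file."""
    with open(path) as handle:
        return json.load(handle)


# Assumed layout: batch -> marker -> [negative_peak, positive_peak].
# None stands for "no positive peak in this batch".
overrides = {
    "batch_1": {"CD4": [0.4, 2.1]},
    "batch_2": {"CD4": [0.5, None]},
}

# Update the entry for a single batch and marker.
overrides.setdefault("batch_2", {})["CD8"] = [0.3, 1.8]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "peak_overrides.json")
    save_overrides(path, overrides)
    restored = load_overrides(path)
```

Keeping overrides in a file like this makes manual peak adjustments reproducible across sessions.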
Helper Functions
We have also developed a suite of helper functions which may be useful for running MMoCHi or preparing your data.
Lookup a marker name within the .X, the .obsm[data_key], or .var[MODALITY_COLUMN] == data_key.
Searches an AnnData object along its .var, .var[MODALITY_COLUMN], .obs, .layers, and .obsm[preferred_data_key] for a specified parameter.
Convert default 10X data to an AnnData with protein in the .obsm and gene expression in the .X.
Makes the .obsm[data_key] of an AnnData object into its .X. If multiple data_key values are provided, merges the first key found.
Function to load and preprocess adatas from either filename(s) or backup_url(s).
Subsets each AnnData object to only the genes and values of data_key found in every AnnData object.
Reads in adata objects from a list of paths, or takes in a list of adata objects, and finds the features in all objects.
Generates a series of masks, one for each batch in batch_key, along with its corresponding batch name.
Calculates differentially expressed genes between two provided groups (specified in the .obs).
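The per-batch masks mentioned above can be pictured as one boolean list per distinct batch label. This is a library-independent sketch: `batch_masks`, its return order, and the donor names are hypothetical, and the real helper operates on an AnnData object rather than a plain list.

```python
def batch_masks(batch_labels):
    """Return (masks, names): one boolean mask per batch, in first-seen order."""
    names = list(dict.fromkeys(batch_labels))  # unique labels, order preserved
    masks = [[label == name for label in batch_labels] for name in names]
    return masks, names


# Made-up batch label for each of four events.
labels = ["donor1", "donor2", "donor1", "donor3"]
masks, names = batch_masks(labels)
```

Each mask selects the events belonging to one batch, which is the usual starting point for running a per-batch step such as landmark registration.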
Logging
MMoCHi has built-in logs which can be helpful for debugging or reproducibility.
Enable logging for all functions in the MMoCHi package.