mmc.Hierarchy

class mmochi.hierarchy.Hierarchy(default_min_events=0.001, default_class_weight='balanced_subsample', default_clf_kwargs=dict(max_depth=20, n_estimators=100, n_jobs=-1, bootstrap=True, verbose=True), default_in_danger_noise_checker=True, default_is_cutoff=False, default_features_limit=None, default_max_training=20000, default_force_spike_ins=[], default_calibrate=True, load=None)

Class to organize a MMoCHi hierarchy. The Hierarchy is a tree with alternating subset and classification nodes for progressively annotating cell types. Subset nodes define cell populations and the Hierarchy is initialized with a root Subset “All”, representing all events in the dataset. All other Subsets originate from a Classification node. Classification nodes are defined with a list of markers (for high-confidence labeling), and normally trigger selection of high-confidence events, training of a random forest classifier, and prediction. If a classification node is a cutoff, it will only trigger a selection of high-confidence events, and only those events will be cast into subsets. Subset nodes also contain the cell type definitions used for high-confidence thresholding.

When initializing the Hierarchy, you can also define many classification defaults, each of which can be further customized for individual Classification nodes.
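To make the alternating structure concrete, the following is a minimal pure-Python sketch of a tree in which Subsets hold Classifications and Classifications hold Subsets. It is not MMoCHi's implementation: the `ToySubset` and `ToyClassification` classes, the `find_subset` helper, the node and marker names, and the 'pos'/'neg' values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToySubset:
    """A cell population; `values` holds one call per marker of its parent Classification."""
    name: str
    values: List[str] = field(default_factory=list)
    classifications: Dict[str, "ToyClassification"] = field(default_factory=dict)


@dataclass
class ToyClassification:
    """A decision point defined by a list of markers; its children are Subsets."""
    name: str
    markers: List[str]
    is_cutoff: bool = False
    subsets: Dict[str, ToySubset] = field(default_factory=dict)


def find_subset(node: ToySubset, name: str) -> Optional[ToySubset]:
    """Depth-first search for a Subset by name, starting at `node`."""
    if node.name == name:
        return node
    for clf in node.classifications.values():
        for child in clf.subsets.values():
            found = find_subset(child, name)
            if found is not None:
                return found
    return None


# The root Subset "All" stands for every event in the dataset.
root = ToySubset("All")

# A Classification sits beneath a Subset, and Subsets sit beneath that Classification.
gross = ToyClassification("Gross", markers=["CD3", "CD19"])  # hypothetical markers
root.classifications[gross.name] = gross
gross.subsets["T cell"] = ToySubset("T cell", values=["pos", "neg"])
gross.subsets["B cell"] = ToySubset("B cell", values=["neg", "pos"])

# A second Classification layer hangs off the "T cell" Subset.
t_cell = find_subset(root, "T cell")
cd4_cd8 = ToyClassification("CD4_CD8", markers=["CD4", "CD8"])
t_cell.classifications[cd4_cd8.name] = cd4_cd8
cd4_cd8.subsets["CD4 T cell"] = ToySubset("CD4 T cell", values=["pos", "neg"])
```

In this toy layout every path from the root alternates Subset, Classification, Subset, which mirrors the rule that all Subsets other than “All” originate from a Classification node.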

Parameters:
  • default_min_events (Union[int, float] (default: 0.001)) – The default minimum number of (or proportion of total) high-confidence events that must be identified in order to train a random forest classifier with each Subset. If not enough events are identified, that Subset will be skipped.
  • default_class_weight (Union[str, dict, List[dict]] (default: 'balanced_subsample')) – The default class_weight strategy for handling scoring. This is passed to sklearn.ensemble.RandomForestClassifier.
  • default_clf_kwargs (dict (default: dict(max_depth=20, n_estimators=100, n_jobs=-1, bootstrap=True, verbose=True))) – The default keyword arguments for classification. For more information about other kwargs that can be set, please see: sklearn.ensemble.RandomForestClassifier. In the case of batch-integrated classification, n_estimators refers to the (approximate) total trees in each forest.
  • default_in_danger_noise_checker (Union[str, bool] (default: True)) – The default for whether to check for (and amplify or remove, respectively) in danger and noise events. In danger events are high-confidence events at classification boundaries. Events labeled noise are high-confidence events whose nearest neighbors do not share the same label, and are thus likely mislabeled. Can be a boolean, or “in danger only” / “noise only”.
  • default_is_cutoff (Union[bool, str] (default: False)) – Whether Classification nodes should be treated as a cutoff by default (triggering only high-confidence thresholding) or non-cutoff (where a random forest is trained and all events are classified).
  • default_features_limit (Union[List[str], Dict[str, List[str]], None] (default: None)) – List-like of str, or a dictionary in the format {'modality_1': ['gene_1', 'gene_2', …], 'modality_2': 'All'}. Specifies the default features allowed for training the classifier.
  • default_max_training (int (default: 20000)) – Specifies the default maximum number of events used for training. This directly affects training speed.
  • default_force_spike_ins (List[str] (default: [])) – The default list of Subsets for which training events should be sampled with spike-ins from across batches, even if individual batches have enough events for training. This can be useful for cell types that are very heterogeneous across batches.
  • default_calibrate (bool (default: True)) – Default for whether to perform calibration on the prediction probabilities of the random forest classifier. Uncalibrated values reflect the percent of trees in agreement. Calibrated values more closely reflect the percent of calls correctly made at any given confidence level.
  • load (Optional[str] (default: None)) – Either None (to initiate a new hierarchy) or a path to a hierarchy to load (exclude .hierarchy in the path). Note that loading a hierarchy overrides all other defaults.
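Because default_min_events accepts either a count or a proportion, it can help to see how such a setting might be resolved into an absolute number. The sketch below is an assumption, not the library's code: it treats values below 1 as proportions of a total event count and values of 1 or more as absolute counts, and both helper names are hypothetical. MMoCHi may draw that line differently or define the total over a different population.

```python
from typing import Union


def resolve_min_events(min_events: Union[int, float], n_total: int) -> int:
    """Turn a count-or-proportion setting into an absolute event count.

    Assumption: values below 1 are proportions of `n_total`, and values of
    1 or more are absolute counts.
    """
    if min_events < 1:
        return int(round(min_events * n_total))
    return int(min_events)


def enough_events(n_high_confidence: int,
                  min_events: Union[int, float],
                  n_total: int) -> bool:
    """Return True if a Subset has enough high-confidence events to train on."""
    return n_high_confidence >= resolve_min_events(min_events, n_total)


# With the default of 0.001 and 50,000 total events, the cutoff is 50 events.
required = resolve_min_events(0.001, 50_000)
```

Under these assumptions, a Subset with 30 high-confidence events out of 50,000 would be skipped, while one with 75 would be used for training.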

Methods

add_classification(name, parent_subset, markers) Add a Classification beneath a Subset.
add_subset(name, parent_classification, values) Add a Subset beneath a Classification node.
batchless_thresholds([name, batch]) Sets thresholds, removing any that are batch-specific and setting each threshold to the average across batches.
check_all_markers(adata[, data_key]) Asserts all markers in hierarchy identified by .get_all_markers() are in adata.X or .obsm[data_key].
classification_markers(name) Provides markers used in one Classification node paired with the high-confidence definitions for each of its Subset nodes.
classification_parents(name) Provides the names of a node's parent and grandparent.
color_dict([new_color_palette, mode, ...]) Provides a dictionary of colors associated with each subset in the hierarchy.
copy() Performs a hard copy of the hierarchy (completely unlinked to the original).
display([plot, return_graph, ...]) Display the hierarchy in a user-friendly format.
drop_threshold(marker[, name, batch]) Remove thresholds from the database.
flatten_children(parent_subset_to_dissolve) Flattens child nodes of the hierarchy.
get_all_markers() Provides a list of all the markers used for high-confidence thresholding.
get_classifications() Provides a list of all classification (or cutoff) nodes in the hierarchy.
get_clf(name[, base]) Gets the classifier and feature names of a given node.
get_info(name, info_type) Gets specified information for a node in the hierarchy.
get_threshold_info(marker, name[, batch, ...]) Identifies and returns threshold information, with support for searching all levels or batches if specified location lacks information.
has_clf(name) Checks whether a given node has a trained classifier defined.
load_thresholds(df[, verbose]) Loads in thresholds from a .csv file.
reset_thresholds() Removes all thresholds from thresholds DataFrame.
run_all_thresholds(adata[, data_key, ...]) Runs thresholding using the thresholding.threshold() function.
save(name) Saves the Hierarchy as a .hierarchy file.
save_thresholds([save_path, non_destructive]) Saves thresholds as a .csv file. Non-destructive saving loads the old file and appends new definitions to it.
set_clf(name, clf, feature_names) Stores a trained classifier and a list of features used for training of a specified classification level.
set_threshold(marker, thresholds, interactive) Sets a threshold in the Hierarchy for one marker.
subsets_info(name) Provides information of the subsets beneath a classification layer and their high-confidence threshold definitions.
to_graphviz([supress_labels, node_width, ...]) Exports the tree in the dot format of the graphviz software, which can be useful for plotting.
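The averaging described for batchless_thresholds can be illustrated with plain Python. This sketch assumes each threshold is a pair of bounds stored per (marker, batch) key. That storage layout, the `average_across_batches` helper, and the marker and batch names are hypothetical and do not reflect how the Hierarchy stores thresholds internally.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Threshold = Tuple[float, float]  # assumed layout: a pair of bounds


def average_across_batches(
    per_batch: Dict[Tuple[str, str], Threshold]
) -> Dict[str, Threshold]:
    """Collapse batch-specific thresholds into one averaged threshold per marker."""
    grouped: Dict[str, List[Threshold]] = defaultdict(list)
    for (marker, _batch), bounds in per_batch.items():
        grouped[marker].append(bounds)
    return {
        marker: (mean(b[0] for b in bounds), mean(b[1] for b in bounds))
        for marker, bounds in grouped.items()
    }


# Hypothetical thresholds for one marker in two batches, and another in one batch.
per_batch = {
    ("CD3", "batch_1"): (1.0, 2.0),
    ("CD3", "batch_2"): (3.0, 4.0),
    ("CD19", "batch_1"): (0.5, 1.5),
}
batchless = average_across_batches(per_batch)
```

Here the two CD3 entries collapse into a single pair of averaged bounds, and the result no longer carries any batch key.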