mmc.Hierarchy

class mmochi.hierarchy.Hierarchy(default_min_events=0.001, default_class_weight='balanced_subsample', default_clf_kwargs=dict(max_depth=20, n_estimators=100, n_jobs=-1, bootstrap=True, verbose=True), default_in_danger_noise_checker=True, default_is_cutoff=False, default_features_limit=None, default_max_training=20000, default_force_spike_ins=[], default_calibrate=True, load=None)

Class to organize a MMoCHi hierarchy. The Hierarchy is a tree with alternating subset and classification nodes for progressively annotating cell types. Subset nodes define cell populations and the Hierarchy is initialized with a root Subset “All”, representing all events in the dataset. All other Subsets originate from a Classification node. Classification nodes are defined with a list of markers (for high-confidence labeling), and normally trigger selection of high-confidence events, training of a random forest classifier, and prediction. If a classification node is a cutoff, it will only trigger a selection of high-confidence events, and only those events will be cast into subsets. Subset nodes also contain the cell type definitions used for high-confidence thresholding.

When initializing the Hierarchy, you can also define classification defaults, each of which can be overridden for individual Classification nodes.
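The alternating Subset/Classification structure described above can be pictured with a small toy model. This is only a sketch: `ToyNode` and `ToyHierarchy` are hypothetical names invented for illustration and are not MMoCHi's internals. The toy omits the `markers` and `values` arguments that the real `add_classification` and `add_subset` take.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    """Hypothetical node type; not part of MMoCHi."""
    name: str
    kind: str  # either "subset" or "classification"
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)


class ToyHierarchy:
    """Toy tree that enforces alternating subset/classification levels."""

    def __init__(self) -> None:
        # Mirror the documented behavior: the tree starts with a root Subset "All".
        self.nodes: Dict[str, ToyNode] = {"All": ToyNode("All", "subset")}

    def _add(self, name: str, kind: str, parent: str, parent_kind: str) -> None:
        if name in self.nodes:
            raise ValueError(f"node {name!r} already exists")
        if parent not in self.nodes or self.nodes[parent].kind != parent_kind:
            raise ValueError(f"parent {parent!r} must be an existing {parent_kind}")
        self.nodes[name] = ToyNode(name, kind, parent)
        self.nodes[parent].children.append(name)

    def add_classification(self, name: str, parent_subset: str) -> None:
        # A Classification node may only sit beneath a Subset node.
        self._add(name, "classification", parent_subset, "subset")

    def add_subset(self, name: str, parent_classification: str) -> None:
        # A Subset node may only sit beneath a Classification node.
        self._add(name, "subset", parent_classification, "classification")

    def path_to_root(self, name: str) -> List[str]:
        # Walk parent links from the given node up to the root.
        path = []
        current: Optional[str] = name
        while current is not None:
            path.append(current)
            current = self.nodes[current].parent
        return path


toy = ToyHierarchy()
toy.add_classification("Lineage", "All")
toy.add_subset("T cell", "Lineage")
toy.add_subset("B cell", "Lineage")
toy.add_classification("T subtype", "T cell")
toy.add_subset("CD4 T", "T subtype")

print(toy.path_to_root("CD4 T"))
# ['CD4 T', 'T subtype', 'T cell', 'Lineage', 'All']
```

Walking from any Subset to the root passes through Subset and Classification nodes in strict alternation, which is the property the description relies on.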

Parameters:

default_min_events (Union[int, float] (default: 0.001)) – The default minimum number of (or proportion of total) high-confidence events that must be identified in order to train a random forest classifier with each Subset. If not enough events are identified, that Subset will be skipped.
default_class_weight (Union[str, dict, List[dict]] (default: 'balanced_subsample')) – The default class_weight strategy for handling scoring. This is passed to sklearn.ensemble.RandomForestClassifier.
default_clf_kwargs (dict (default: dict(max_depth=20, n_estimators=100, n_jobs=-1, bootstrap=True, verbose=True))) – The default keyword arguments for classification. For more information about other kwargs that can be set, please see: sklearn.ensemble.RandomForestClassifier. In the case of batch-integrated classification, n_estimators refers to the (approximate) total trees in each forest.
default_in_danger_noise_checker (Union[str, bool] (default: True)) – The default for whether to check for (and amplify or remove, respectively) in danger and noise events. In danger events are high-confidence events at classification boundaries. Events labeled noise are high-confidence events whose nearest neighbors do not share the same label, and are thus likely mislabeled. Can be a boolean, or “in danger only” or “noise only”.
default_is_cutoff (Union[bool, str] (default: False)) – Whether Classification nodes should be treated as a cutoff by default (triggering only high-confidence thresholding) or non-cutoff (where a random forest is trained and all events are classified).
default_features_limit (Union[List[str], Dict[str, List[str]], None] (default: None)) – Listlike of str or dictionary in the format {'modality_1': ['gene_1', 'gene_2', …], 'modality_2': 'All'}. Specifies the default features allowed for training the classifier.
default_max_training (int (default: 20000)) – Specifies the default maximum number of events used for training. This directly affects training speed.
default_force_spike_ins (List[str] (default: [])) – The default list of Subsets for which training events should be sampled with spike-ins from across batches, even if individual batches have enough events for training. This can be useful for cell types that are very heterogeneous across batches.
default_calibrate (bool (default: True)) – Default for whether to perform calibration on the prediction probabilities of the random forest classifier. Uncalibrated values reflect the percent of trees in agreement. Calibrated values more closely reflect the percent of calls correctly made at any given confidence level.
load (Optional[str] (default: None)) – Either None (to initiate a new hierarchy) or a path to a hierarchy to load (exclude .hierarchy in the path). Note that loading a hierarchy overrides all other defaults.
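As a rough illustration of how a setting like default_min_events can act as either a count or a proportion, the sketch below resolves the value against a dataset size and lists the Subsets that would fall short. The helper names `resolve_min_events` and `subsets_to_skip` are hypothetical, and the rule used here (a float strictly between 0 and 1 is a proportion, anything else is an absolute count) is an assumption made for the example; MMoCHi's own rule may differ.

```python
import math
from typing import Dict, List, Union


def resolve_min_events(min_events: Union[int, float], n_total: int) -> int:
    """Hypothetical helper: convert a count-or-proportion setting to a count.

    Assumption for this sketch: floats strictly between 0 and 1 are read as a
    proportion of all events; every other value is read as an absolute count.
    """
    if isinstance(min_events, float) and 0 < min_events < 1:
        return math.ceil(min_events * n_total)
    return int(min_events)


def subsets_to_skip(
    high_confidence_counts: Dict[str, int],
    min_events: Union[int, float],
    n_total: int,
) -> List[str]:
    """Return the Subsets with too few high-confidence events to train on."""
    required = resolve_min_events(min_events, n_total)
    return [name for name, n in high_confidence_counts.items() if n < required]


# Made-up counts of high-confidence events per Subset, for illustration only.
counts = {"T cell": 1200, "B cell": 300, "pDC": 12}

print(resolve_min_events(0.001, 50_000))      # 50
print(subsets_to_skip(counts, 0.001, 50_000))  # ['pDC']
print(subsets_to_skip(counts, 500, 50_000))    # ['B cell', 'pDC']
```

With the default proportion of 0.001 and 50,000 total events, a Subset needs at least 50 high-confidence events in this toy model; passing an integer instead fixes the requirement regardless of dataset size.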

Methods

`add_classification`(name, parent_subset, markers)	Add a Classification beneath a Subset.
`add_subset`(name, parent_classification, values)	Add a Subset beneath a Classification node.
`batchless_thresholds`([name, batch])	Removes batch-specific thresholds and sets each threshold to its average across batches.
`check_all_markers`(adata[, data_key])	Asserts all markers in hierarchy identified by .get_all_markers() are in adata.X or .obsm[data_key].
`classification_markers`(name)	Provides markers used in one Classification node paired with the high-confidence definitions for each of its Subset nodes.
`classification_parents`(name)	Provides the names of a node's parent and grandparent.
`color_dict`([new_color_palette, mode, ...])	Provides a dictionary of colors associated with each subset in the hierarchy
`copy`()	Performs a hard copy of the hierarchy (completely unlinked to the original).
`display`([plot, return_graph, ...])	Display the hierarchy in a user-friendly format.
`drop_threshold`(marker[, name, batch])	Remove thresholds from the database.
`flatten_children`(parent_subset_to_dissolve)	Flattens child nodes of the hierarchy.
`get_all_markers`()	Provides a list of all the markers used for high-confidence thresholding.
`get_classifications`()	Provides a list of all classification (or cutoff) nodes in the hierarchy.
`get_clf`(name[, base])	Gets the classifier and feature names of a given node.
`get_info`(name, info_type)	Gets specified information for a node in the hierarchy.
`get_threshold_info`(marker, name[, batch, ...])	Identifies and returns threshold information, with support for searching all levels or batches if specified location lacks information.
`has_clf`(name)	Checks whether a given node has a trained classifier defined.
`load_thresholds`(df[, verbose])	Loads in thresholds from a .csv file.
`reset_thresholds`()	Removes all thresholds from thresholds DataFrame.
`run_all_thresholds`(adata[, data_key, ...])	Runs thresholding using the thresholding.threshold() function.
`save`(name)	Saves the Hierarchy as a .hierarchy file.
`save_thresholds`([save_path, non_destructive])	Saves thresholds as a .csv file. With non_destructive saving, the old file is loaded and new definitions are appended to it.
`set_clf`(name, clf, feature_names)	Stores a trained classifier and a list of features used for training of a specified classification level.
`set_threshold`(marker, thresholds, interactive)	Sets a threshold in the Hierarchy for one marker
`subsets_info`(name)	Provides information of the subsets beneath a classification layer and their high-confidence threshold definitions.
`to_graphviz`([supress_labels, node_width, ...])	Exports the tree in the dot format of the graphviz software, which can be useful for plotting.
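The idea behind `batchless_thresholds` (collapsing batch-specific thresholds into one averaged threshold per marker) can be sketched with plain dictionaries. This is not MMoCHi's implementation: the storage layout, the `(lower, upper)` threshold shape, and the function name `average_across_batches` are all assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Assumed layout for this sketch: (marker, batch) -> (lower, upper) bounds.
BatchThresholds = Dict[Tuple[str, str], Tuple[float, float]]


def average_across_batches(
    thresholds: BatchThresholds,
) -> Dict[str, Tuple[float, float]]:
    """Hypothetical helper: one averaged (lower, upper) pair per marker."""
    by_marker: Dict[str, List[Tuple[float, float]]] = {}
    for (marker, _batch), bounds in thresholds.items():
        # Group every batch-specific pair under its marker.
        by_marker.setdefault(marker, []).append(bounds)
    return {
        marker: (mean(b[0] for b in pairs), mean(b[1] for b in pairs))
        for marker, pairs in by_marker.items()
    }


# Made-up thresholds for two markers, for illustration only.
per_batch = {
    ("CD3", "batch1"): (1.0, 2.0),
    ("CD3", "batch2"): (2.0, 4.0),
    ("CD19", "batch1"): (0.5, 1.5),
}

print(average_across_batches(per_batch))
# {'CD3': (1.5, 3.0), 'CD19': (0.5, 1.5)}
```

A marker thresholded in only one batch keeps its values unchanged, since the average of a single pair is the pair itself.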