mmc.Hierarchy
class mmochi.hierarchy.Hierarchy(default_min_events=0.001, default_class_weight='balanced_subsample', default_clf_kwargs=dict(max_depth=20, n_estimators=100, n_jobs=-1, bootstrap=True, verbose=True), default_in_danger_noise_checker=True, default_is_cutoff=False, default_features_limit=None, default_max_training=20000, default_force_spike_ins=[], default_calibrate=True, load=None)

Class to organize a MMoCHi hierarchy. The Hierarchy is a tree with alternating Subset and Classification nodes for progressively annotating cell types.

Subset nodes define cell populations, and the Hierarchy is initialized with a root Subset “All”, representing all events in the dataset. All other Subsets originate from a Classification node. Subset nodes also contain the cell type definitions used for high-confidence thresholding.

Classification nodes are defined with a list of markers (for high-confidence labeling), and normally trigger selection of high-confidence events, training of a random forest classifier, and prediction. If a Classification node is a cutoff, it will only trigger a selection of high-confidence events, and only those events will be cast into Subsets.
When initializing the Hierarchy, you can also set classification defaults, each of which can be overridden for individual Classification nodes.
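The alternating Subset/Classification structure described above can be pictured with a small stand-alone sketch. This is only an illustration: `ToyHierarchy`, `Node`, and `path` are hypothetical names invented here, the node names are made up, and none of this is MMoCHi's implementation. It only enforces that Classifications hang beneath Subsets and Subsets beneath Classifications, starting from the root Subset "All".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    kind: str  # either "subset" or "classification"
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)


class ToyHierarchy:
    """Illustrative stand-in only; not mmochi.hierarchy.Hierarchy."""

    def __init__(self) -> None:
        # The tree always starts from a root Subset named "All".
        self.nodes: Dict[str, Node] = {"All": Node("All", "subset")}

    def _add(self, name: str, kind: str, parent: str, parent_kind: str) -> None:
        if name in self.nodes:
            raise ValueError(f"{name!r} already exists")
        if parent not in self.nodes:
            raise KeyError(f"unknown parent {parent!r}")
        if self.nodes[parent].kind != parent_kind:
            raise ValueError(f"{parent!r} must be a {parent_kind} node")
        self.nodes[name] = Node(name, kind, parent)
        self.nodes[parent].children.append(name)

    def add_classification(self, name: str, parent_subset: str) -> None:
        # Classification nodes may only sit beneath Subset nodes.
        self._add(name, "classification", parent_subset, "subset")

    def add_subset(self, name: str, parent_classification: str) -> None:
        # Subset nodes (other than the root) may only sit beneath Classifications.
        self._add(name, "subset", parent_classification, "classification")

    def path(self, name: str) -> List[str]:
        """Return the node names from the root down to `name`."""
        out = []
        current: Optional[str] = name
        while current is not None:
            out.append(current)
            current = self.nodes[current].parent
        return out[::-1]


toy = ToyHierarchy()
toy.add_classification("Gross", "All")
toy.add_subset("T cell", "Gross")
toy.add_subset("B cell", "Gross")
toy.add_classification("T subtypes", "T cell")
toy.add_subset("CD4 T cell", "T subtypes")
print(" > ".join(toy.path("CD4 T cell")))
# prints: All > Gross > T cell > T subtypes > CD4 T cell
```

Walking the path from the root shows the alternation directly: every second node is a Classification, and annotation becomes more specific at each Subset.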
Parameters:
- default_min_events (Union[int, float], default: 0.001) – The default minimum number of (or proportion of total) high-confidence events that must be identified in order to train a random forest classifier with each Subset. If not enough events are identified, that Subset will be skipped.
- default_class_weight (Union[str, dict, List[dict]], default: 'balanced_subsample') – The default class_weight strategy for handling scoring. This is passed to sklearn.ensemble.RandomForestClassifier.
- default_clf_kwargs (dict, default: dict(max_depth=20, n_estimators=100, n_jobs=-1, bootstrap=True, verbose=True)) – The default keyword arguments for classification. For more information about other kwargs that can be set, please see sklearn.ensemble.RandomForestClassifier. In the case of batch-integrated classification, n_estimators refers to the (approximate) total trees in each forest.
- default_in_danger_noise_checker (Union[str, bool], default: True) – The default for whether to check for in danger and noise events (which are amplified or removed, respectively). In danger events are high-confidence events at classification boundaries. Noise events are high-confidence events whose nearest neighbors do not share the same label, and are thus likely mislabeled. Can be a boolean, or "in danger only" or "noise only".
- default_is_cutoff (Union[bool, str], default: False) – Whether Classification nodes should be treated as a cutoff by default (triggering only high-confidence thresholding) or non-cutoff (where a random forest is trained and all events are classified).
- default_features_limit (Union[List[str], Dict[str, List[str]], None], default: None) – Listlike of str, or a dictionary in the format {'modality_1': ['gene_1', 'gene_2', …], 'modality_2': 'All'}. Specifies the default features allowed for training the classifier.
- default_max_training (int, default: 20000) – Specifies the default maximum number of events used for training. This directly affects training speed.
- default_force_spike_ins (List[str], default: []) – The default list of Subsets for which training events should be sampled with spike-ins from across batches, even if individual batches have enough events for training. This can be useful for cell types that are very heterogeneous across batches.
- default_calibrate (bool, default: True) – Default for whether to perform calibration on the prediction probabilities of the random forest classifier. Uncalibrated values reflect the percent of trees in agreement. Calibrated values more closely reflect the percent of calls correctly made at any given confidence level.
- load (Optional[str], default: None) – Either None (to initiate a new hierarchy) or a path to a hierarchy to load (exclude .hierarchy in the path). Note that loading a hierarchy overrides all other defaults.
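Because default_min_events accepts either a count or a proportion, it can help to see one way such a value might be resolved. The sketch below is an assumption, not MMoCHi's code: `resolve_min_events` and `subsets_to_train` are hypothetical helpers, and the rule "a float below 1 is a proportion of all events, anything else is an absolute count" is a guess at the intended semantics. The event counts are made up for illustration.

```python
import math
from typing import Dict, Union


def resolve_min_events(min_events: Union[int, float], n_total: int) -> int:
    """Turn a count-or-proportion setting into an absolute event count.

    Assumption (not confirmed by the documentation): floats below 1 are
    proportions of all events; any other value is an absolute count.
    """
    if min_events < 0:
        raise ValueError("min_events must be non-negative")
    if isinstance(min_events, float) and min_events < 1:
        return math.ceil(min_events * n_total)
    return int(min_events)


def subsets_to_train(
    counts: Dict[str, int], min_events: Union[int, float], n_total: int
) -> Dict[str, bool]:
    """Flag which Subsets have enough high-confidence events to be trained on."""
    required = resolve_min_events(min_events, n_total)
    return {name: n >= required for name, n in counts.items()}


# Hypothetical high-confidence event counts in a dataset of 12,345 events.
counts = {"T cell": 900, "B cell": 40, "NK cell": 9}
print(resolve_min_events(0.001, 12345))       # 13 (0.1% of 12,345, rounded up)
print(subsets_to_train(counts, 0.001, 12345))
# {'T cell': True, 'B cell': True, 'NK cell': False}
```

Under these assumptions, the "NK cell" Subset would be skipped because it falls below the resolved minimum of 13 events.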
Methods
- add_classification(name, parent_subset, markers) – Add a Classification beneath a Subset.
- add_subset(name, parent_classification, values) – Add a Subset beneath a Classification node.
- batchless_thresholds([name, batch]) – Sets thresholds, removing any that are batch-specific, and setting the threshold to the average threshold across batches.
- check_all_markers(adata[, data_key]) – Asserts all markers in hierarchy identified by .get_all_markers() are in adata.X or .obsm[data_key].
- classification_markers(name) – Provides markers used in one Classification node paired with the high-confidence definitions for each of its Subset nodes.
- classification_parents(name) – Provides the names of a node's parent and grandparent.
- color_dict([new_color_palette, mode, ...]) – Provides a dictionary of colors associated with each subset in the hierarchy.
- copy() – Performs a hard copy of the hierarchy (completely unlinked to the original).
- display([plot, return_graph, ...]) – Display the hierarchy in a user-friendly format.
- drop_threshold(marker[, name, batch]) – Remove thresholds from the database.
- flatten_children(parent_subset_to_dissolve) – Flattens child nodes of the hierarchy.
- get_all_markers() – Provides a list of all the markers used for high-confidence thresholding.
- get_classifications() – Provides a list of all classification (or cutoff) nodes in the hierarchy.
- get_clf(name[, base]) – Gets the classifier and feature names of a given node.
- get_info(name, info_type) – Gets specified information for a node in the hierarchy.
- get_threshold_info(marker, name[, batch, ...]) – Identifies and returns threshold information, with support for searching all levels or batches if specified location lacks information.
- has_clf(name) – Checks whether a given node has a trained classifier defined.
- load_thresholds(df[, verbose]) – Loads in thresholds from a .csv file.
- reset_thresholds() – Removes all thresholds from thresholds DataFrame.
- run_all_thresholds(adata[, data_key, ...]) – Runs thresholding using the thresholding.threshold() function.
- save(name) – Save Hierarchy as a .hierarchy file.
- save_thresholds([save_path, non_destructive]) – Saves thresholds as a .csv file. With non_destructive saving, the old file is loaded and new definitions are appended onto it.
- set_clf(name, clf, feature_names) – Stores a trained classifier and a list of features used for training of a specified classification level.
- set_threshold(marker, thresholds, interactive) – Sets a threshold in the Hierarchy for one marker.
- subsets_info(name) – Provides information of the subsets beneath a classification layer and their high-confidence threshold definitions.
- to_graphviz([supress_labels, node_width, ...]) – Exports the tree in the dot format of the graphviz software, which can be useful for plotting.
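The averaging behaviour described for batchless_thresholds can be illustrated with a small stand-alone sketch. It is not the library's implementation: `average_across_batches` is a hypothetical helper, and storing each threshold as a (lower, upper) pair keyed by (marker, batch) is an assumption made here for illustration. The marker names and values are invented.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Pair = Tuple[float, float]


def average_across_batches(
    thresholds: Dict[Tuple[str, str], Pair]
) -> Dict[str, Pair]:
    """Collapse batch-specific thresholds into one averaged pair per marker."""
    grouped: Dict[str, List[Pair]] = defaultdict(list)
    for (marker, _batch), pair in thresholds.items():
        grouped[marker].append(pair)
    # Average the lower and upper bounds separately for each marker.
    return {
        marker: (mean(p[0] for p in pairs), mean(p[1] for p in pairs))
        for marker, pairs in grouped.items()
    }


# Hypothetical batch-specific thresholds: (marker, batch) -> (lower, upper).
batch_thresholds = {
    ("CD3", "batch1"): (1.0, 2.0),
    ("CD3", "batch2"): (2.0, 3.0),
    ("CD19", "batch1"): (0.5, 1.5),
}
print(average_across_batches(batch_thresholds))
# {'CD3': (1.5, 2.5), 'CD19': (0.5, 1.5)}
```

After this step each marker has a single threshold pair with no batch key, which matches the described intent of removing batch-specific entries in favour of their average.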