mmc.Classification
-
class mmochi.hierarchy.Classification(markers, min_events=None, class_weight=None, in_danger_noise_checker=None, classifier=None, features_limit=None, feature_names=None, is_cutoff=False, max_training=None, force_spike_ins=[], calibrate=None, clf_kwargs={})

A Hierarchy building block describing subsetting rules, whose parent is a subset (or “all”). Classifications can be added to a Hierarchy using the .add_classification() method.

Parameters:
- markers (List[str]) – The features used for high-confidence thresholding to define subsets beneath this classification. During thresholding, matching or similar feature names are looked up first in the provided data_key, then in the .var. See mmc.utils.marker for details on feature lookup.
- min_events (Union[int, float, None] (default: None)) – The minimum number (or proportion of the total) of high-confidence events that must be identified in order to train a random forest classifier with each Subset. If not enough events are identified, that Subset will be skipped.
- class_weight (Union[dict, List[dict], None] (default: None)) – The class_weight strategy for handling scoring (“balanced” or “balanced_subsample”). This is passed to sklearn.ensemble.RandomForestClassifier.
- in_danger_noise_checker (Union[str, bool, None] (default: None)) – Whether to check for in danger and noise events, and to amplify or remove them, respectively. In danger events are high-confidence events at classification boundaries. Noise events are high-confidence events whose nearest neighbors do not share the same label, and are thus likely mislabeled. Can be a boolean, or “in danger only” or “noise only” to only amplify in danger events or only remove noise events, respectively.
- classifier (default: None) – The classifier to be used for classification. If defined, feature_names must also be defined.
- features_limit (Optional[List[str]] (default: None)) – List-like of str, or a dictionary in the format {'modality_1': ['gene_1', 'gene_2', …], 'modality_2': 'All'}. Specifies the default features allowed for training the classifier.
- feature_names (Optional[List[str]] (default: None)) – Names of the features used to train this classifier. Not set if classifier is None.
- is_cutoff (Optional[bool] (default: False)) – The default for whether Classification nodes should be treated as a cutoff, triggering only high-confidence thresholding (True), or whether a random forest should be created and trained to make the classification (False). Cutoff layers can also be used with categorical or boolean data to subset down to a single tissue site or other relevant metadata.
- max_training (Optional[int] (default: None)) – Specifies the default maximum number of events used for training. This directly affects training speed.
- force_spike_ins (default: []) – The default list of Subsets for which training events should be sampled with spike-ins from across batches, even if individual batches have enough events for training. This can be useful for cell types that are very heterogeneous across batches.
- calibrate (Optional[bool] (default: None)) – Default for whether to perform calibration on the prediction probabilities of the random forest classifier. Uncalibrated values reflect the % of trees in agreement. Calibrated values more closely reflect the % of calls correctly made at any given confidence level.
- clf_kwargs (dict (default: {})) – The keyword arguments for classification. For more information about other kwargs that can be set, see sklearn.ensemble.RandomForestClassifier. In the case of batch-integrated classification, n_estimators refers to the (approximate) total number of trees in the forest.
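
The in_danger_noise_checker description can be illustrated with a small, library-independent sketch. It is not MMoCHi's implementation: the k-nearest-neighbor rule, the "at most half of the neighbors share the label" threshold for in danger events, and the helper names flag_events and resample_indices are all assumptions made for illustration. Only the accepted option values (a boolean, "in danger only", "noise only") come from the parameter description above.

```python
import math


def knn_labels(points, labels, idx, k):
    """Return the labels of the k nearest neighbors of points[idx], excluding itself."""
    dists = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return [labels[j] for _, j in dists[:k]]


def flag_events(points, labels, k=3):
    """Tag each event as 'noise', 'in danger', or 'safe'.

    Assumed rule (hypothetical, for illustration only):
      noise     -> none of the k neighbors share the event's label
      in danger -> some, but at most half, of the neighbors share it
      safe      -> more than half of the neighbors share it
    """
    flags = []
    for i, label in enumerate(labels):
        same = sum(1 for n in knn_labels(points, labels, i, k) if n == label)
        if same == 0:
            flags.append("noise")
        elif same <= k / 2:
            flags.append("in danger")
        else:
            flags.append("safe")
    return flags


def resample_indices(flags, checker=True):
    """Build training indices: drop noise events, duplicate in danger events.

    `checker` mirrors the documented option values. Duplicating each in danger
    event exactly once is an assumption; the real amplification may differ.
    """
    amplify = checker is True or checker == "in danger only"
    remove = checker is True or checker == "noise only"
    indices = []
    for i, flag in enumerate(flags):
        if flag == "noise" and remove:
            continue
        indices.append(i)
        if flag == "in danger" and amplify:
            indices.append(i)
    return indices


# Toy 1-D data: two groups, one boundary pair (44 / 46), and one event at 15
# labeled "B" inside the "A" group.
points = [(0,), (10,), (20,), (30,), (44,), (46,), (60,), (70,), (80,), (90,), (15,)]
labels = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]

flags = flag_events(points, labels, k=3)
training = resample_indices(flags, checker=True)
```

In this toy data the two events at the boundary (44 and 46) are flagged as in danger and the event at 15 as noise. With checker=True the noise event is dropped and the boundary events appear twice in the training indices; "noise only" and "in danger only" apply just one of the two steps.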
Methods
- markers (