mmc.preprocess_adatas()

mmochi.utils.preprocess_adatas(adatas=None, convert_from_10X=True, make_unique=True, drop_totalseq=True, intersect=True, backup_urls=None, data_key='protein', log_CP_GEX=1e4, log_CP_ADT=1e3)

Load and preprocess AnnData objects, given either as objects or as filepath(s), with backup_url(s) used if a filepath does not work.

Parameters:

adatas (Union[AnnData, List[AnnData], str, List[str], None] (default: None)) – Individual or list of filepaths or AnnData objects
convert_from_10X (bool (default: True)) – Determines whether to convert from 10x data style
make_unique (bool (default: True)) – Determines whether to make labels in AnnData objects unique by adding ‘-1’ etc. to duplicates
drop_totalseq (bool (default: True)) – Determines whether to drop ‘_TotalSeq’ from the data_key labels
intersect (bool (default: True)) – Determines whether to subset each object to only the ‘data_key’ features and gene expression features found in all objects
backup_urls (Union[str, List[str], None] (default: None)) – URL or URLs to the AnnData objects, used if the filepath does not work
data_key (str (default: 'protein')) – Key in .obsm; name used for data intersection and ADT normalization
log_CP_GEX (int (default: 1e4)) – Normalizes Gene expression counts to log(counts/total_counts*log_CP_GEX + 1), default 10000
log_CP_ADT (int (default: 1e3)) – Normalizes ADT counts to log(counts/total_counts*log_CP_ADT + 1), default 1000
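
The normalization formula given for log_CP_GEX and log_CP_ADT can be sketched for a single cell as below. This is only an illustration of the arithmetic, not mmochi's implementation: the helper name log_normalize is hypothetical, the natural logarithm is assumed, and returning zeros for a cell with no counts is an assumption made here to avoid division by zero.

```python
import math


def log_normalize(counts, scale):
    """Return log(count / total_counts * scale + 1) for each count in one cell.

    Hypothetical helper for illustration; natural log is an assumption.
    """
    total = sum(counts)
    if total == 0:
        # Assumption: a cell with no counts normalizes to all zeros.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale) for c in counts]


# One cell with three genes, scaled as with log_CP_GEX=1e4
cell_gex = [0, 3, 7]
normalized = log_normalize(cell_gex, 1e4)
```

Passing 1e3 as the scale instead would mirror the log_CP_ADT default.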

Returns:

if make_unique: containing unique obs and var labels, with duplicates named ‘-1’, ‘-2’, etc.
if convert_from_10X: with protein data moved to .obsm
if intersect: containing only the genes and ‘data_key’ features that exist in all objects
if log_CP_GEX: with gene counts log-normalized per log_CP_GEX total counts
if log_CP_ADT: with ‘data_key’ counts log-normalized per log_CP_ADT total counts

Return type:

List of AnnData object(s)
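
The effect of intersect on feature names can be pictured with a minimal sketch. The helper intersect_features and the marker names are hypothetical, and keeping the order of the first list is an assumption made for this example; the sketch shows the idea of keeping only features shared by every object, not how mmochi performs the subsetting.

```python
def intersect_features(feature_lists):
    """Keep only the features present in every list.

    Hypothetical helper; the order of the first list is preserved by assumption.
    """
    if not feature_lists:
        return []
    shared = set(feature_lists[0]).intersection(*map(set, feature_lists[1:]))
    return [name for name in feature_lists[0] if name in shared]


# Two batches with partially overlapping protein panels
batch_a = ["CD3", "CD4", "CD8", "CD19"]
batch_b = ["CD4", "CD3", "CD56"]
common = intersect_features([batch_a, batch_b])
```

Here common holds only the markers measured in both batches.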