mmc.preprocess_adatas()
mmochi.utils.preprocess_adatas(adatas=None, convert_from_10X=True, make_unique=True, drop_totalseq=True, intersect=True, backup_urls=None, data_key='protein', log_CP_GEX=1e4, log_CP_ADT=1e3)
Loads and preprocesses AnnData objects from filename(s) or backup URL(s); AnnData objects can also be passed in directly.
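Because the inputs can be a single path, a single AnnData object, or a list of either, each with an optional backup URL, it can help to see how such flexible arguments might be normalized and loaded. The sketch below is a hypothetical illustration, not mmochi's implementation: `_as_list`, `pair_inputs`, and `load_with_fallback` are invented names, pairing paths with URLs by position is an assumption, and the `loader` callable stands in for whatever reader the real function uses.

```python
from itertools import zip_longest
from typing import Any, Callable, List, Optional, Sequence, Tuple, Union


def _as_list(x: Union[Any, Sequence[Any], None]) -> List[Any]:
    """Wrap a single item in a list; None becomes [], lists/tuples pass through."""
    if x is None:
        return []
    if isinstance(x, (list, tuple)):
        return list(x)
    return [x]


def pair_inputs(adatas: Any = None, backup_urls: Any = None) -> List[Tuple[Any, Optional[str]]]:
    """Pair each input with its backup URL by position; the shorter side is padded with None."""
    return list(zip_longest(_as_list(adatas), _as_list(backup_urls)))


def load_with_fallback(item: Any, url: Optional[str], loader: Callable[[str], Any]) -> Any:
    """Return in-memory objects as-is; otherwise try the path, then the backup URL."""
    if item is not None and not isinstance(item, str):
        return item  # e.g. an AnnData object that is already loaded
    if item is not None:
        try:
            return loader(item)
        except OSError:  # assumption: a missing/unreadable file raises OSError
            if url is None:
                raise
    if url is None:
        raise ValueError("no filepath or backup URL to load from")
    return loader(url)
```

Taking `loader` as a parameter keeps the fallback logic testable without real files; in practice it would be an AnnData reader.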
Parameters:
- adatas (Union[AnnData, List[AnnData], str, List[str], None], default: None) – A single AnnData object or filepath, or a list of either.
- convert_from_10X (bool, default: True) – Whether to convert from the 10x data layout, moving the protein data into .obsm.
- make_unique (bool, default: True) – Whether to make the obs and var labels of each AnnData object unique by appending '-1', '-2', etc. to duplicates.
- drop_totalseq (bool, default: True) – Whether to drop '_TotalSeq' from the data_key labels.
- intersect (bool, default: True) – Whether to subset each AnnData object to the data_key features and genes found in all objects.
- backup_urls (Union[str, List[str], None], default: None) – URL or list of URLs for the AnnData objects, used if the corresponding filepath cannot be loaded.
- data_key (str, default: 'protein') – Key that must be present in .obsm; its data is used for feature intersection and ADT normalization.
- log_CP_GEX (int, default: 1e4) – Normalizes gene expression counts to log(counts / total_counts * log_CP_GEX + 1).
- log_CP_ADT (int, default: 1e3) – Normalizes ADT counts to log(counts / total_counts * log_CP_ADT + 1).
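The log_CP_GEX and log_CP_ADT formulas can be spelled out in a few lines. This is a minimal sketch, assuming that total_counts is the per-cell total (one row per cell), that "log" is the natural logarithm, and that all-zero cells stay at zero; none of these details is stated above. The same arithmetic is what scanpy's `pp.normalize_total(target_sum=...)` followed by `pp.log1p` computes, though whether mmochi calls those functions internally is not documented here. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_cp_normalize(counts: Sequence[float], scale: float) -> List[float]:
    """Compute log(counts / total_counts * scale + 1) for one cell (natural log)."""
    total = sum(counts)
    if total == 0:
        # Assumption: an empty cell has nothing to scale, so every value stays at log(1) = 0.
        return [0.0] * len(counts)
    # log1p(x) == log(x + 1), but is more accurate for small x.
    return [math.log1p(c / total * scale) for c in counts]


def log_cp_normalize_matrix(rows: Sequence[Sequence[float]], scale: float) -> List[List[float]]:
    """Apply the per-cell normalization to each row (one row per cell)."""
    return [log_cp_normalize(row, scale) for row in rows]


# Gene expression and ADT data use different scale factors (log_CP_GEX vs. log_CP_ADT).
gex = log_cp_normalize_matrix([[1, 3, 0], [0, 0, 0]], scale=1e4)
adt = log_cp_normalize_matrix([[10, 30]], scale=1e3)
```

For the first cell, a gene holding 1 of 4 total counts is scaled to 1/4 × 10,000 = 2,500 before the log, giving log(2,501).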
Returns:
- if make_unique: AnnData objects with unique obs and var labels; duplicates are suffixed '-1', '-2', etc.
- if convert_from_10X: the protein data moved to .obsm
- if intersect: only the genes and data_key features present in all objects
- if log_CP_GEX: gene counts log-normalized to log_CP_GEX counts per cell
- if log_CP_ADT: data_key counts log-normalized to log_CP_ADT counts per cell
Return type: List of AnnData object(s)
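The label handling described under drop_totalseq, make_unique, and intersect can be sketched on plain lists of names. This is a hypothetical illustration rather than mmochi's code: the function names are invented, stripping everything from '_TotalSeq' onward is an assumed reading of "drop '_TotalSeq'", and applying it before intersection (so tagged names match across objects) is also an assumption. anndata itself provides `obs_names_make_unique()` and `var_names_make_unique()`, which use the same '-1', '-2' suffix convention.

```python
from typing import Dict, List, Sequence


def drop_totalseq(labels: Sequence[str]) -> List[str]:
    """Strip the TotalSeq tag from ADT labels.

    Assumption: everything from "_TotalSeq" onward is dropped, so
    "CD4_TotalSeqB" becomes "CD4"; labels without the tag are unchanged.
    """
    return [label.split("_TotalSeq")[0] for label in labels]


def make_unique(labels: Sequence[str]) -> List[str]:
    """Keep the first occurrence of each label; suffix later ones with -1, -2, ...

    Simplification: does not check whether a generated name such as "A-1"
    already exists elsewhere in the list.
    """
    counts: Dict[str, int] = {}
    out: List[str] = []
    for label in labels:
        if label in counts:
            counts[label] += 1
            out.append(f"{label}-{counts[label]}")
        else:
            counts[label] = 0
            out.append(label)
    return out


def shared_features(feature_lists: Sequence[Sequence[str]]) -> List[str]:
    """Features present in every list, kept in the order of the first list."""
    if not feature_lists:
        return []
    common = set(feature_lists[0]).intersection(*feature_lists[1:])
    return [f for f in feature_lists[0] if f in common]
```

Keeping the first list's order in `shared_features` makes the subset deterministic, so every object can be indexed by the same ordered feature list.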