mmc.preprocess_adatas()
- mmochi.utils.preprocess_adatas(adatas=None, convert_from_10X=True, make_unique=True, drop_totalseq=True, intersect=True, backup_urls=None, data_key='protein', log_CP_GEX=1e4, log_CP_ADT=1e3)
Loads and preprocesses AnnData objects, given either as objects or as file path(s), with backup URL(s) used when a file path does not work.
- Parameters:
adatas (Union[AnnData, List[AnnData], str, List[str]] (default: None)) – A single file path or AnnData object, or a list of them
convert_from_10X (bool (default: True)) – Whether to convert from the 10x data style
make_unique (bool (default: True)) – Whether to make labels in the AnnData objects unique by appending ‘-1’, ‘-2’, etc. to duplicates
drop_totalseq (bool (default: True)) – Whether to drop ‘_TotalSeq’ from the data_key labels
intersect (bool (default: True)) – Whether to subset each object to the gene expression and ‘data_key’ features found in all objects
backup_urls (Union[str, List[str]] (default: None)) – URL or URLs of the AnnData objects, used if the file path does not work
data_key (str (default: 'protein')) – Key that must be present in .obsm; names the data used for intersection and ADT normalization
log_CP_GEX (int (default: 1e4)) – Normalizes gene expression counts to log(counts/total_counts*log_CP_GEX + 1)
log_CP_ADT (int (default: 1e3)) – Normalizes ADT counts to log(counts/total_counts*log_CP_ADT + 1)
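The normalization formula given for log_CP_GEX and log_CP_ADT can be illustrated with a minimal standard-library sketch. This is not the library's implementation: the helper name `log_normalize` and the toy counts are hypothetical, and the natural logarithm is an assumption, since the formula above does not state a base.

```python
import math

def log_normalize(counts, scale):
    """Hypothetical helper: log(count / total_counts * scale + 1) for one cell.

    Assumes the natural logarithm; the documented formula does not name a base.
    """
    total = sum(counts)
    if total == 0:
        # A cell with no counts stays at zero instead of dividing by zero.
        return [0.0 for _ in counts]
    # log1p(x) computes log(x + 1) accurately for small x.
    return [math.log1p(c / total * scale) for c in counts]

# Toy counts for a single cell (illustrative values only).
gex_counts = [0, 3, 7, 10]
adt_counts = [50, 150, 800]

gex_norm = log_normalize(gex_counts, 1e4)  # scale matches the log_CP_GEX default
adt_norm = log_normalize(adt_counts, 1e3)  # scale matches the log_CP_ADT default
```

A zero count maps to zero under this formula, and the two scale factors only change the "counts per" unit before the logarithm is taken.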
- Returns:
- if make_unique:
containing unique obs and var labels, with duplicates suffixed ‘-1’, ‘-2’, etc.
- if convert_from_10X:
with protein data moved to .obsm
- if intersect:
containing only the genes and ‘data_key’ features that exist in all objects
- if log_CP_GEX:
with log-normalized gene expression counts, scaled to counts per log_CP_GEX
- if log_CP_ADT:
with log-normalized ‘data_key’ counts, scaled to counts per log_CP_ADT
- Return type:
List of AnnData object(s)
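
The effect of intersect on feature labels can be sketched without the library. The example below is only an illustration of keeping the labels shared by every object: the helper name `shared_labels` and the gene lists are hypothetical, and preserving the order of the first list is an assumption, not documented behavior.

```python
def shared_labels(label_lists):
    """Hypothetical helper: labels present in every list.

    Keeps the order of the first list, which is an assumption made here
    for readability and not something the documentation specifies.
    """
    if not label_lists:
        return []
    common = set(label_lists[0]).intersection(*label_lists[1:])
    return [label for label in label_lists[0] if label in common]

# Toy gene labels for two objects (illustrative values only).
genes_a = ["CD3E", "CD4", "CD8A", "MS4A1"]
genes_b = ["CD4", "CD3E", "NKG7"]

shared = shared_labels([genes_a, genes_b])
```

Here only the labels found in both lists survive, so each object would end up with the same set of features.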