mmc.preprocess_adatas()

mmochi.utils.preprocess_adatas(adatas=None, convert_from_10X=True, make_unique=True, drop_totalseq=True, intersect=True, backup_urls=None, data_key='protein', log_CP_GEX=1e4, log_CP_ADT=1e3)

Loads and preprocesses AnnData objects supplied as in-memory objects, file path(s), or backup URL(s).

Parameters:
  • adatas (Union[AnnData, List[AnnData], str, List[str]] (default: None)) – A single file path or AnnData object, or a list of them

  • convert_from_10X (bool (default: True)) – Determines whether to convert from the 10x data layout, moving protein data to .obsm

  • make_unique (bool (default: True)) – Determines whether to make labels in AnnData objects unique by adding ‘-1’ etc. to duplicates

  • drop_totalseq (bool (default: True)) – Determines whether to drop ‘_TotalSeq’ from the data_key labels

  • intersect (bool (default: True)) – Determines whether to subset each AnnData object to the genes and ‘data_key’ features found in all objects

  • backup_urls (Union[str, List[str]] (default: None)) – URL or URLs of the AnnData objects, used if the corresponding file path cannot be read

  • data_key (str (default: 'protein')) – Key in .obsm (must exist) holding the protein data; used for data intersection and ADT normalization

  • log_CP_GEX (int (default: 1e4)) – Scale factor for gene expression normalization: counts are transformed to log(counts / total_counts * log_CP_GEX + 1)

  • log_CP_ADT (int (default: 1e3)) – Scale factor for ADT normalization: ‘data_key’ counts are transformed to log(counts / total_counts * log_CP_ADT + 1)
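A minimal NumPy sketch of the normalization formula shared by log_CP_GEX and log_CP_ADT. The helper log_cp_normalize is hypothetical and not part of mmochi; it assumes a dense cells × features matrix, totals computed per cell (row), and the natural logarithm.

```python
import numpy as np


def log_cp_normalize(counts, scale):
    """Hypothetical helper: log(counts / total_counts * scale + 1), totals per cell (row).

    Assumes a dense cells x features matrix and the natural logarithm.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # total counts per cell
    # Replace zero totals so empty cells stay all zero instead of becoming NaN.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return np.log1p(counts / safe_totals * scale)


gex = np.array([[1, 3, 0], [5, 5, 0]])  # toy gene expression counts
adt = np.array([[10, 90], [50, 50]])    # toy ADT counts
gex_norm = log_cp_normalize(gex, 1e4)   # log_CP_GEX default
adt_norm = log_cp_normalize(adt, 1e3)   # log_CP_ADT default
print(np.round(gex_norm, 3))
```

Empty cells are left at zero in this sketch; how mmochi itself handles them is not stated here.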

Returns:

A list of AnnData objects with the following applied:

  • if make_unique: unique obs and var labels, with duplicates renamed by appending ‘-1’, ‘-2’, etc.

  • if convert_from_10X: protein data moved to .obsm

  • if intersect: only the genes and ‘data_key’ features present in all objects

  • if log_CP_GEX: gene counts log-normalized to counts per log_CP_GEX

  • if log_CP_ADT: ‘data_key’ counts log-normalized to counts per log_CP_ADT

Return type:

List of AnnData object(s)
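The label handling described for make_unique and intersect can be sketched on plain Python lists. Both helpers below are hypothetical illustrations, not mmochi functions; the sketch assumes the first occurrence of a label keeps its name and that the intersection preserves the feature order of the first object.

```python
from typing import List


def make_unique(labels: List[str]) -> List[str]:
    """Hypothetical helper: rename repeats as label-1, label-2, ... (first keeps its name)."""
    used = set()
    next_suffix = {}
    result = []
    for label in labels:
        new = label
        # Keep bumping the suffix until the name has not been emitted yet.
        while new in used:
            next_suffix[label] = next_suffix.get(label, 0) + 1
            new = f"{label}-{next_suffix[label]}"
        used.add(new)
        result.append(new)
    return result


def intersect_features(feature_lists: List[List[str]]) -> List[str]:
    """Hypothetical helper: features present in every list, in the first list's order."""
    common = set(feature_lists[0]).intersection(*feature_lists[1:])
    return [f for f in feature_lists[0] if f in common]


print(make_unique(["CD3", "CD3", "CD4", "CD3"]))
print(intersect_features([["CD3", "CD4", "CD8"], ["CD8", "CD3", "CD19"]]))
```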