infercnvpy.tl.copykat

infercnvpy.tl.copykat(adata, gene_ids='S', segmentation_cut=0.1, distance='euclidean', s_name='copykat_result', min_genes_chr=5, key_added='cnv', inplace=True, layer=None, n_jobs=None, norm_cell_names='')

Inference of genomic copy number and subclonal structure.

Runs CopyKAT (Copynumber Karyotyping of Tumors) [RSY+21], which uses an integrative Bayesian approach to identify genome-wide aneuploidy at 5 Mb resolution in single cells. The result separates tumor cells from normal cells and distinguishes tumor subclones in high-throughput scRNA-seq data.

Note on input data from the original authors:

The matrix values are often the count of unique molecular identifier (UMI) from nowadays high throughput single cell RNAseq data. The early generation of scRNAseq data may be summarized as TPM values or total read counts, which should also work.

This means that, unlike for infercnvpy.tl.infercnv(), the input data should not be log-transformed.

CopyKAT also does NOT require running infercnvpy.io.genomic_position_from_gtf(); it infers the genomic positions from the gene symbols in adata.var_names.
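A minimal sketch of how the input note above might be enforced before calling the function. The helper looks_like_raw_counts and the wrapper run_copykat are hypothetical names, not part of infercnvpy. The sketch also assumes that layer names an entry in adata.layers and that None means adata.X, which this page does not spell out. The check is only a heuristic: TPM input, which the authors say should also work, would not pass it.

```python
def looks_like_raw_counts(rows, tol=1e-8):
    """Return True if all values are non-negative and integer-valued.

    Heuristic only: log-transformed data usually contains fractional
    values, while UMI or read counts do not.
    """
    for row in rows:
        for value in row:
            if value < 0 or abs(value - round(value)) > tol:
                return False
    return True


def run_copykat(adata, counts_layer=None):
    """Run CopyKAT on un-logged values and return the result.

    `counts_layer` is a hypothetical layer name such as "counts";
    None is assumed to mean adata.X.
    """
    # Lazy import so the check above can be used without infercnvpy installed.
    import infercnvpy as cnv

    matrix = adata.X if counts_layer is None else adata.layers[counts_layer]
    # Densify sparse input for the check (reasonable for small data only).
    dense = matrix.toarray() if hasattr(matrix, "toarray") else matrix
    if not looks_like_raw_counts(dense):
        raise ValueError("input does not look like raw counts")
    return cnv.tl.copykat(adata, layer=counts_layer, inplace=False)
```

If the raw counts were saved in a layer before normalization, passing that layer name avoids feeding log-transformed values from adata.X.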

You can find more info on GitHub: https://github.com/navinlabcode/copykat

Parameters
adata : AnnData

annotated data matrix

key_added : str (default: 'cnv')

Key under which the CopyKAT scores will be stored in adata.obsm and adata.uns.

inplace : bool (default: True)

If True, store the result in adata, otherwise return it.

gene_ids : str (default: 'S')

Gene ID type: symbol (“S”) or Ensembl (“E”).

segmentation_cut : float (default: 0.1)

Segmentation parameter, between 0 and 1; larger values give looser criteria.

distance : str (default: 'euclidean')

Distance method: “euclidean”, or a correlation-converted distance, “pearson” or “spearman”.

s_name : str (default: 'copykat_result')

sample (output file) name.

min_genes_chr : int (default: 5)

Minimum number of genes per chromosome for cell filtering.

norm_cell_names : str (default: '')

Cell barcodes (from adata.obs.index) that indicate known normal cells.

n_jobs : int | None (default: None)

Number of cores to use for the CopyKAT analysis. By default, all cores available on the system are used. Multithreading does not work on Windows, where this value is ignored.
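To illustrate the distance options, the sketch below contrasts a Euclidean distance with a correlation-converted distance. It assumes the common convention of defining the distance as 1 minus the correlation coefficient; this page does not state the exact formula CopyKAT uses, so treat it as an illustration rather than a description of the implementation. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def correlation_distance(x, y, method="pearson"):
    """Assumed convention: distance = 1 - correlation."""
    if method == "spearman":
        # Spearman correlation is Pearson correlation on the ranks.
        x, y = ranks(x), ranks(y)
    elif method != "pearson":
        raise ValueError("method must be 'pearson' or 'spearman'")
    return 1.0 - pearson(x, y)


def euclidean_distance(x, y):
    """Straight-line distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

Under this convention, two profiles with the same shape but different magnitude have a correlation distance near 0 while their Euclidean distance can be large, which is the practical difference between the options.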

Return type

(DataFrame, Series)

Returns

Depending on the value of inplace, either returns None or a tuple (CNV matrix, CopyKAT prediction).
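A short sketch of working with the returned tuple when inplace=False. The helper names are hypothetical, and the label strings used in the comment ("aneuploid", "diploid") are assumed for illustration; this page does not list the values of the prediction Series.

```python
from collections import Counter


def summarize_predictions(predictions):
    """Count cells per predicted label.

    Accepts any iterable of labels, e.g. the values of the returned
    Series (assumed to hold strings such as "aneuploid" or "diploid").
    """
    return Counter(predictions)


def run_and_summarize(adata):
    """Run CopyKAT without modifying adata and summarize the calls."""
    # Lazy import so the summary helper works without infercnvpy installed.
    import infercnvpy as cnv

    # With inplace=False the function returns (DataFrame, Series).
    cnv_matrix, predictions = cnv.tl.copykat(adata, inplace=False)
    return cnv_matrix, summarize_predictions(predictions)
```

With the default inplace=True the call returns None, so the tuple unpacking above only applies when inplace=False is passed explicitly.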