Reproduce the heatmap from inferCNV

This document demonstrates to reproduce how the example heatmap from the original R inverCNV implementation. It is based on a small, 183-cell example dataset of malignant and non-malignant cells from Oligodendroglioma derived from Tirosh et al. (2016).

[1]:
import infercnvpy as cnv
import scanpy as sc
/home/docs/checkouts/readthedocs.org/user_builds/holonet-tutorial/envs/latest/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

Prepare and inspect dataset

The example dataset is available in the datasets module. It is already TPM-normalized, but not log-transformed.

[2]:
adata = cnv.datasets.oligodendroglioma()
sc.pp.log1p(adata)

It also already has the genomic positions annotated inadata.var:

[3]:
adata.var.head()
[3]:
chromosome start end n_counts n_cells
gene_symbol
WASH7P chr1 14363 29806 534.271301 116
LINC00115 chr1 761586 762902 203.649094 18
NOC2L chr1 879584 894689 880.449707 63
SDF4 chr1 1152288 1167411 1013.251404 70
UBE2J2 chr1 1189289 1209265 411.488953 33

It contains four types of malignant cells, and two clusters of non-malignant cells.

[4]:
sc.tl.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color="cell_type")
/home/docs/checkouts/readthedocs.org/user_builds/holonet-tutorial/envs/latest/lib/python3.9/site-packages/traitlets/traitlets.py:3258: FutureWarning: --rc={'figure.dpi': 96} for dict-traits is deprecated in traitlets 5.0. You can pass --rc <key=value> ... multiple times to add items to a dict.
  warn(
../_images/tutorials_reproduce_infercnv_7_1.svg

Run infercnvpy

In this case we know which cells are non-malignant. For best results, it is recommended to use the non-malignant cells as a background. We can provide this information using reference_key and reference_cat.

In order to reproduce the results as exactely as possible, we use a window_size of 100 and a step of 1.

[5]:
%%time
cnv.tl.infercnv(
    adata,
    reference_key="cell_type",
    reference_cat=["Oligodendrocytes (non-malignant)", "Microglia/Macrophage"],
    window_size=100,
    step=1,
)
100%|██████████| 1/1 [00:00<00:00,  3.54it/s]
CPU times: user 39 ms, sys: 36.2 ms, total: 75.2 ms
Wall time: 380 ms

[6]:
%%time
cnv.pl.chromosome_heatmap(adata, groupby="cell_type", dendrogram=True)
WARNING: dendrogram data not found (using key=dendrogram_cell_type). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
WARNING: You’re trying to run this on 9111 dimensions of `.X`, if you really want this, set `use_rep='X'`.
         Falling back to preprocessing with `sc.pp.pca` and default params.
/home/docs/checkouts/readthedocs.org/user_builds/holonet-tutorial/envs/latest/lib/python3.9/site-packages/infercnvpy/pl/_chromosome_heatmap.py:59: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  tmp_adata = AnnData(X=adata.obsm[f"X_{use_rep}"], obs=adata.obs)
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: Microglia/Macrophage, Oligodendrocytes (non-malignant), malignant_93, etc.
var_group_labels: chr1, chr2, chr3, etc.
../_images/tutorials_reproduce_infercnv_10_3.svg
CPU times: user 1.35 s, sys: 409 ms, total: 1.76 s
Wall time: 1.12 s

Note that running the same analysis in R (invercnv v1.6.0 from Bioconductor) takes about 1:30 min.