Reproduce the heatmap from inferCNV
This document demonstrates to reproduce how the example heatmap from the original R inverCNV implementation. It is based on a small, 183-cell example dataset of malignant and non-malignant cells from Oligodendroglioma derived from Tirosh et al. (2016).
[1]:
import infercnvpy as cnv
import scanpy as sc
/home/docs/checkouts/readthedocs.org/user_builds/holonet-tutorial/envs/latest/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Prepare and inspect dataset
The example dataset is available in the datasets module. It is already TPM-normalized, but not log-transformed.
[2]:
adata = cnv.datasets.oligodendroglioma()
sc.pp.log1p(adata)
It also already has the genomic positions annotated inadata.var:
[3]:
adata.var.head()
[3]:
| chromosome | start | end | n_counts | n_cells | |
|---|---|---|---|---|---|
| gene_symbol | |||||
| WASH7P | chr1 | 14363 | 29806 | 534.271301 | 116 |
| LINC00115 | chr1 | 761586 | 762902 | 203.649094 | 18 |
| NOC2L | chr1 | 879584 | 894689 | 880.449707 | 63 |
| SDF4 | chr1 | 1152288 | 1167411 | 1013.251404 | 70 |
| UBE2J2 | chr1 | 1189289 | 1209265 | 411.488953 | 33 |
It contains four types of malignant cells, and two clusters of non-malignant cells.
[4]:
sc.tl.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color="cell_type")
/home/docs/checkouts/readthedocs.org/user_builds/holonet-tutorial/envs/latest/lib/python3.9/site-packages/traitlets/traitlets.py:3258: FutureWarning: --rc={'figure.dpi': 96} for dict-traits is deprecated in traitlets 5.0. You can pass --rc <key=value> ... multiple times to add items to a dict.
warn(
Run infercnvpy
In this case we know which cells are non-malignant. For best results, it is recommended to use the non-malignant cells as a background. We can provide this information using reference_key and reference_cat.
In order to reproduce the results as exactely as possible, we use a window_size of 100 and a step of 1.
[5]:
%%time
cnv.tl.infercnv(
adata,
reference_key="cell_type",
reference_cat=["Oligodendrocytes (non-malignant)", "Microglia/Macrophage"],
window_size=100,
step=1,
)
100%|██████████| 1/1 [00:00<00:00, 3.54it/s]
CPU times: user 39 ms, sys: 36.2 ms, total: 75.2 ms
Wall time: 380 ms
[6]:
%%time
cnv.pl.chromosome_heatmap(adata, groupby="cell_type", dendrogram=True)
WARNING: dendrogram data not found (using key=dendrogram_cell_type). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
WARNING: You’re trying to run this on 9111 dimensions of `.X`, if you really want this, set `use_rep='X'`.
Falling back to preprocessing with `sc.pp.pca` and default params.
/home/docs/checkouts/readthedocs.org/user_builds/holonet-tutorial/envs/latest/lib/python3.9/site-packages/infercnvpy/pl/_chromosome_heatmap.py:59: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
tmp_adata = AnnData(X=adata.obsm[f"X_{use_rep}"], obs=adata.obs)
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: Microglia/Macrophage, Oligodendrocytes (non-malignant), malignant_93, etc.
var_group_labels: chr1, chr2, chr3, etc.
CPU times: user 1.35 s, sys: 409 ms, total: 1.76 s
Wall time: 1.12 s
Note that running the same analysis in R (invercnv v1.6.0 from Bioconductor) takes about 1:30 min.