You may click on the following links to fast forward to the codes directly producing the Figures:
Back to index page.
This vignette illustrates the steps by which the reference data (10X)
was processed. It also reproduces some of the Supplementary Figures in
the original publication involved with the pre-processing of reference
data (Lin et al., 2022).
Here we only show how plain plots were generated directly from data. To
reach the final illustrations in the publication, these plots may
require further editing in graphic design softwares (i.e. Adobe
Illustrator). Those beautification steps are not central to our analysis
and therefore will be omitted.
The input file of this vignette is a filtered h5 file
generated by CellRanger
(GSE197743_AAV_MG_filtered_feature_bc_matrix.h5). The
output is a processed .h5Seurat with metadata slot
filled.
Microglia were isolated from adult mice (~8w) using a method based on the “cold-mechanical dissociation” protocol (Bennett et al., 2016)1. The steps involving FACS sorting from the original protocol was omitted to minimize ex vivo activation of microglia.
We found that the single cell transcriptome acquired by 10X Chromium (3’ Chemistry v3) captured transcripts of the AAV-packaged transgene (mScarlet) at a very low rate (data not published). Therefore we carried out deep single cell sequencing using a modified protocol based on a customized Smart-seq2 protocol (Zhang et al., 2020)2 on microglia isolated from mice after stereotaxic AAV delivery. The Smart-seq2 based method captured mScarlet transcripts with substantially higher efficiently.
To characterize the microglia transcriptomes under quiescent and immune activation states, we isolated microglia from mice subjected to Saline, Poly(i:c) and LPS treatment (i.p.), and performed scRNA-seq using the 10X Chromium platform.
Upstream analysis pipelines (i.e. sequence alignment, generation of digital expression matrix, etc.):
cellranger count command and then
aggregated using the cellranger aggr command.Downstream analysis:
scater)Seurat)Seurat)Load reuiqred libraries.
suppressPackageStartupMessages({
library(Seurat)
library(SeuratDisk)
library(scater)
library(tidyverse)
library(RColorBrewer)
library(scales)
})
If you haven’t done so, download data from GEO.
if(!file.exists('../data/GSE197743/GSE197743_AAV_MG_filtered_feature_bc_matrix.h5')){
library(GEOquery)
options(timeout = 6000) ## In case of slow Internet connections
getGEOSuppFiles(GEO = "GSE197743", makeDirectory = T, baseDir = '../data/')
}
Load the data matrix to workspace.
data <- Read10X_h5(filename = '../data/GSE197743/GSE197743_AAV_MG_filtered_feature_bc_matrix.h5')
Create Seurat object.
cells <- CreateSeuratObject(data, project = "AAV_Microglia", names.field = 2, names.delim = "-")
Update metadata.
Idents(cells) <- "orig.ident"
cells <- RenameIdents(cells, `1` = "MG_S", `2` = "MG_P", `3` = "MG_L")
cells[["group"]] <- Idents(cells)
Idents(cells) <- "orig.ident"
cells <- RenameIdents(cells, `1` = "Saline", `2` = "Poly(i:c)", `3` = "LPS")
cells[["treat"]] <- Idents(cells)
cells[["id"]] <- "Reference"
cells[["tech"]] <- "10x"
We use the scater3 package for quality control (QC) of the
single cell datasets
cells <- DietSeurat(cells)
cells.sce <- as.SingleCellExperiment(cells)
cells.sce <- logNormCounts(cells.sce)
dim(cells.sce)
## [1] 31342 31712
per.cell <- perCellQCMetrics(cells.sce,
subsets = list(Mito = grep("^mt-", rownames(cells.sce)), # Mitochondrial genes
Ribo = grep("\\bRp[sl]\\d+[^k]*\\b", rownames(cells.sce)))) # Ribosomal genes
qc.stats <- quickPerCellQC(per.cell, percent_subsets = c("subsets_Mito_percent","subsets_Ribo_percent"))
qc.stats$high_subsets_Mito_percent <- per.cell$subsets_Mito_percent > 3
qc.stats$discard <- qc.stats$low_lib_size | qc.stats$low_n_features | qc.stats$high_subsets_Mito_percent | qc.stats$high_subsets_Ribo_percent
qc.stats$discard_size_complexity <- qc.stats$low_lib_size | qc.stats$low_n_features
colData(cells.sce) <- cbind(colData(cells.sce), cbind(per.cell, qc.stats))
colSums(as.matrix(qc.stats))
## low_lib_size low_n_features high_subsets_Mito_percent
## 2783 3346 5135
## high_subsets_Ribo_percent discard discard_size_complexity
## 610 7388 3392
cells.filtered <- cells.sce[,!qc.stats$discard]
keep_feature <- nexprs(cells.filtered, byrow=TRUE) > 0
## Remove mitochondrial and ribosomal genes which does not contribute to dimensionality reduction and clustering
keep_feature[grep("\\bRp[sl]\\d+[^k]*\\b", rownames(cells.sce))] <- FALSE
keep_feature[grep("mt-", rownames(cells.sce))] <- FALSE
cells.filtered <- cells.filtered[keep_feature,]
Back to Seurat object.
refdata <- as.Seurat(cells.filtered)
refdata[["nCount_originalexp"]] <- NULL; refdata[["nFeature_originalexp"]] <- NULL
refdata <- RenameAssays(refdata, originalexp = "RNA")
refdata$ident <- NULL
refdata <- refdata %>%
NormalizeData() %>%
FindVariableFeatures() %>%
ScaleData() %>%
RunPCA(npcs = 50)
## USE DIM=1:50 FOR UNFILTERED DATA, SKIP JACKSTRAW TO SAVE TIME
# refdata <- JackStraw(refdata, dims = 50)
# refdata <- ScoreJackStraw(refdata, dims = 1:50)
# JackStrawPlot(refdata, dims = 1:50)
# dims_use <- JS(refdata[["pca"]], slot = "overall") %>% as.data.frame %>% filter(Score < 0.05) %>% pull(PC)
dims_use <- 1:50
refdata <- refdata %>%
FindNeighbors(dims = dims_use) %>%
FindClusters(resolution = 0.8) %>%
RunUMAP(dims = dims_use)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 24324
## Number of edges: 936162
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8613
## Number of communities: 19
## Elapsed time: 4 seconds
Take a closer look at our pre-processed data:
We see that there are some major clusters in the middle and bottom-right
of the umap plot. These are microglia single cell populations. The rest
of the clusters represent non-microglia populations as well as droplets
contaminated by cell debris. These were captured in our data because we
applied Dounce homogenization and Percoll density gradient
centrifugation to isolate microglia from mouse brain, which can never
reach perfect purity.
Nevertheless, the microglia clusters look good, as they exhibit robust
Hexb expression, a well validate microglia marker4. We can
easily isolate them via in silico procedures, as is shown in
subsequent code chunks.
FeaturePlot(refdata, features = "Hexb") | FeaturePlot(refdata, features = "subsets_Mito_percent")
Some microglia single cells exhibit high mitochondrial RNA levels, purge them.
refdata_clean <- subset(refdata, subset = subsets_Mito_percent < 3)
To achieve in silico purification of microglia populations,
we use the CellSelector() utility provided by
Seurat package. This procedure requires manual selection of
cells and is difficult to replicate programmatically.
To help you reproduce the result, we have saved the identity of the
selected cells in ../miscs/mg_manual.rda in this repo. You
can load it directly into workspace to subset out the micorglia
population and proceed to downstream workflows.
## NOT RUN
# Idents(refdata_clean) <- "contaminants"
# refdata_clean <- CellSelector(FeaturePlot(refdata_clean, features = "Hexb"), object = refdata_clean, ident = "microglia") ## Run repeatedly, select desired clusters
# mg_manual <- WhichCells(refdata_clean, idents = "microglia")
# save(mg_manual, file = "../miscs/mg_manual.rda")
load(file = '../miscs/mg_manual.rda')
refdata_clean <- subset(refdata_clean, cells = mg_manual)
Re-run Seurat Standard workflow on subsetted data.
refdata_clean <- refdata_clean %>%
NormalizeData() %>%
FindVariableFeatures() %>%
ScaleData() %>%
RunPCA(npcs = 45) %>%
FindNeighbors(dims = 1:45) %>%
FindClusters() %>%
RunUMAP(dims = 1:45)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 20446
## Number of edges: 669450
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8204
## Number of communities: 11
## Elapsed time: 2 seconds
The metadata slot is updated based on the experiment design.
But note that in recent years the microglia community is gradually
quitting the use of Reactive and Homeostatic
nomenclature.
Idents(refdata_clean) <- "orig.ident"
refdata_clean <- RenameIdents(refdata_clean, `1` = "Homeostatic", `2` = "Reactive", `3` = "Reactive")
refdata_clean[["state"]] <- Idents(refdata_clean)
A general note on the UMAP visualizations:
Depending on the use of CPUs and versions of the umap
algorithm, the UMAP coordinates computed on different machines may not
entirely match what was shown in the original publication. This is a
known issue of
the umap algorithm. Nevertheless, the overall structure of
the UMAP embedding should be consistent, i.e. microglia from mouse
subjected to Saline, Poly(i:c) and LPS groups reside in different
clusters. We’ve tested this on different computing platforms and reached
the conclution that the shifts in UMAP coordinates does not affect our
biological interpretations.
as.data.frame(colData(cells.sce)[,c("nCount_RNA","nFeature_RNA","discard_size_complexity")]) %>%
ggplot(mapping = aes(x=nCount_RNA, y=nFeature_RNA)) +
geom_point(aes(color = discard_size_complexity), size = 1.2) +
# ggtitle("Reference Data (10x)") +
scale_color_manual(values = brewer.pal(n = 6,name = "Paired")[c(2,6)], labels = c("Kept","Removed"), name = "") +
scale_x_continuous(expand = c(0.15,0.1)) +
xlab("UMI counts") +
ylab("Gene counts") +
theme_classic() +
theme(
# plot.title = element_text(size = 12, face = "bold"),
axis.text = element_text(size = 16),
axis.title = element_text(size = 21),
legend.title = element_blank(),
legend.box.background = element_blank(),
legend.position = c(.95, .05),
legend.justification = c("right", "bottom"),
legend.text = element_text(size = 16),
legend.margin = margin(1, 5, 5, 5)
)
as.data.frame(colData(cells.sce)[,c("treat","subsets_Mito_percent","high_subsets_Mito_percent")]) %>%
ggplot(mapping = aes(x=treat, y=subsets_Mito_percent)) +
geom_violin() +
geom_jitter(mapping = aes(color = high_subsets_Mito_percent), size = 0.3, height = 0, width = 0.1) +
scale_color_manual(values = brewer.pal(n = 6,name = "Paired")[c(2,6)]) +
xlab(NULL) +
ylab("Mitochondrial RNA %") +
theme_linedraw() +
theme(
legend.position = "none",
axis.text.y = element_text(size = 14),
axis.text.x = element_text(size = 16),
axis.title = element_text(size = 20),
)
as.data.frame(colData(cells.sce)[,c("treat","subsets_Ribo_percent","high_subsets_Ribo_percent")]) %>%
ggplot(mapping = aes(x=treat, y=subsets_Ribo_percent)) +
geom_violin() +
geom_jitter(mapping = aes(color = high_subsets_Ribo_percent), size = 0.3, height = 0, width = 0.1) +
scale_color_manual(values = brewer.pal(n = 6,name = "Paired")[c(2,6)]) +
xlab(NULL) +
ylab("Ribosomal RNA %") +
theme_linedraw() +
theme(
legend.position = "none",
axis.text.y = element_text(size = 14),
axis.text.x = element_text(size = 16),
axis.title = element_text(size = 20),
)
refdata_clean$treat <- factor(refdata_clean$treat, levels = c("Saline","Poly(i:c)","LPS"))
umap_reff1 <- FeaturePlot(refdata, features = "Hexb") +
ggtitle("Hexb") +
theme(
plot.title = element_text(hjust = 0, size = 24),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_text(size = 18),
axis.line = element_line(arrow = grid::arrow(length = unit(0.3, "cm"),
ends = "last", type = "closed")),
legend.title = element_blank(),
legend.text = element_text(size = 14),
legend.position = c(.05, .25)
)
umap_reff2 <- DimPlot(refdata, cells.highlight = mg_manual) +
NoLegend() +
ggtitle("Retained Cells") +
theme(
plot.title = element_text(hjust = 0, size = 24),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_text(size = 18),
axis.line = element_line(arrow = grid::arrow(length = unit(0.3, "cm"),
ends = "last", type = "closed"))
)
umap_reff3 <- DimPlot(refdata_clean, group.by = "treat") +
scale_color_manual(values = brewer.pal(n = 12, name = "Paired")[c(1,3,5)],
labels = c("Saline", "Poly(i:c)", "LPS")) +
ggtitle("Final Reference") +
guides(color = guide_legend(override.aes = list(size=3), byrow = TRUE)) +
theme(
plot.title = element_text(hjust = 0, size = 24),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_text(size = 18),
axis.line = element_line(arrow = grid::arrow(length = unit(0.3, "cm"),
ends = "last", type = "closed")),
legend.title = element_blank(),
legend.text = element_text(size = 12),
legend.position = c(.08, .22),
legend.spacing.y = unit(0.25, 'cm')
)
umap_reff1 | umap_reff2 | umap_reff3
Backup the processed Seurat object using h5 format.
## Avoid using factor while saving to other formats
refdata_clean$group <- as.character(refdata_clean$group)
refdata_clean$treat <- as.character(refdata_clean$treat)
refdata_clean$state <- as.character(refdata_clean$state)
refdata_clean$orig.ident <- as.character(refdata_clean$orig.ident)
refdata_clean$RNA_snn_res.0.8 <- as.integer(as.character(refdata_clean$RNA_snn_res.0.8))
refdata_clean$seurat_clusters <- as.integer(as.character(refdata_clean$seurat_clusters))
if(!dir.exists('../data/AAV_MG_scRNAseq')){dir.create('../data/AAV_MG_scRNAseq')}
SaveH5Seurat(refdata_clean, filename = '../data/AAV_MG_scRNAseq/refdata_clean.h5Seurat', overwrite = T, verbose = F)
Convert('../data/AAV_MG_scRNAseq/refdata_clean.h5Seurat', dest = "h5ad", overwrite = T, verbose = F)
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] scales_1.2.0 RColorBrewer_1.1-3
## [3] forcats_0.5.1 stringr_1.4.0
## [5] dplyr_1.0.9 purrr_0.3.4
## [7] readr_2.1.2 tidyr_1.2.0
## [9] tibble_3.1.7 tidyverse_1.3.1
## [11] scater_1.18.6 ggplot2_3.3.6
## [13] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
## [15] Biobase_2.50.0 GenomicRanges_1.42.0
## [17] GenomeInfoDb_1.26.7 IRanges_2.24.1
## [19] S4Vectors_0.28.1 BiocGenerics_0.36.1
## [21] MatrixGenerics_1.9.0 matrixStats_0.62.0
## [23] SeuratDisk_0.0.0.9020 sp_1.5-0
## [25] SeuratObject_4.1.0 Seurat_4.1.1
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 reticulate_1.25
## [3] tidyselect_1.1.2 htmlwidgets_1.5.4
## [5] grid_4.0.5 BiocParallel_1.24.1
## [7] Rtsne_0.16 munsell_0.5.0
## [9] codetools_0.2-18 ica_1.0-2
## [11] future_1.26.1 miniUI_0.1.1.1
## [13] withr_2.5.0 spatstat.random_2.2-0
## [15] colorspace_2.0-3 progressr_0.10.1
## [17] highr_0.9 knitr_1.39
## [19] rstudioapi_0.13 ROCR_1.0-11
## [21] tensor_1.5 listenv_0.8.0
## [23] labeling_0.4.2 GenomeInfoDbData_1.2.4
## [25] polyclip_1.10-0 farver_2.1.0
## [27] bit64_4.0.5 parallelly_1.32.0
## [29] vctrs_0.4.1 generics_0.1.2
## [31] xfun_0.31 R6_2.5.1
## [33] ggbeeswarm_0.6.0 rsvd_1.0.5
## [35] hdf5r_1.3.5 bitops_1.0-7
## [37] spatstat.utils_2.3-1 DelayedArray_0.16.3
## [39] assertthat_0.2.1 promises_1.2.0.1
## [41] rgeos_0.5-9 beeswarm_0.4.0
## [43] gtable_0.3.0 beachmat_2.6.4
## [45] globals_0.15.0 goftest_1.2-3
## [47] rlang_1.0.2 splines_4.0.5
## [49] lazyeval_0.2.2 spatstat.geom_2.4-0
## [51] broom_0.8.0 yaml_2.3.5
## [53] reshape2_1.4.4 abind_1.4-5
## [55] modelr_0.1.8 backports_1.4.1
## [57] httpuv_1.6.5 tools_4.0.5
## [59] ellipsis_0.3.2 spatstat.core_2.4-4
## [61] jquerylib_0.1.4 ggridges_0.5.3
## [63] Rcpp_1.0.8.3 plyr_1.8.7
## [65] sparseMatrixStats_1.7.0 zlibbioc_1.36.0
## [67] RCurl_1.98-1.7 rpart_4.1-15
## [69] deldir_1.0-6 pbapply_1.5-0
## [71] viridis_0.6.2 cowplot_1.1.1
## [73] zoo_1.8-10 haven_2.5.0
## [75] ggrepel_0.9.1 cluster_2.1.3
## [77] fs_1.5.2 magrittr_2.0.3
## [79] RSpectra_0.16-1 data.table_1.14.2
## [81] scattermore_0.8 lmtest_0.9-40
## [83] reprex_2.0.1 RANN_2.6.1
## [85] fitdistrplus_1.1-8 hms_1.1.1
## [87] patchwork_1.1.1 mime_0.12
## [89] evaluate_0.15 xtable_1.8-4
## [91] readxl_1.4.0 gridExtra_2.3
## [93] compiler_4.0.5 KernSmooth_2.23-20
## [95] crayon_1.5.1 htmltools_0.5.2
## [97] mgcv_1.8-40 later_1.3.0
## [99] tzdb_0.3.0 lubridate_1.8.0
## [101] DBI_1.1.2 dbplyr_2.2.0
## [103] MASS_7.3-57 Matrix_1.4-1
## [105] cli_3.3.0 igraph_1.3.2
## [107] pkgconfig_2.0.3 plotly_4.10.0
## [109] scuttle_1.0.4 spatstat.sparse_2.1-1
## [111] xml2_1.3.3 vipor_0.4.5
## [113] bslib_0.3.1 XVector_0.30.0
## [115] rvest_1.0.2 digest_0.6.29
## [117] sctransform_0.3.3 RcppAnnoy_0.0.19
## [119] spatstat.data_2.2-0 rmarkdown_2.14
## [121] cellranger_1.1.0 leiden_0.4.2
## [123] uwot_0.1.11 DelayedMatrixStats_1.12.3
## [125] shiny_1.7.1 lifecycle_1.0.1
## [127] nlme_3.1-158 jsonlite_1.8.0
## [129] BiocNeighbors_1.8.2 viridisLite_0.4.0
## [131] fansi_1.0.3 pillar_1.7.0
## [133] lattice_0.20-45 fastmap_1.1.0
## [135] httr_1.4.3 survival_3.3-1
## [137] glue_1.6.2 png_0.1-7
## [139] bit_4.0.4 stringi_1.7.6
## [141] sass_0.4.1 BiocSingular_1.6.0
## [143] irlba_2.3.5 future.apply_1.9.0
Bennett, M. L. et al. New tools for studying microglia in the mouse and human CNS. Proc Natl Acad Sci USA 113, E1738 (2016).↩︎
Zhang, S. et al. Single-cell transcriptomics identifies divergent developmental lineage trajectories during human pituitary development. Nat Commun 11, 5275 (2020).↩︎
McCarthy, D. J., Campbell, K. R., Lun, A. T. L. & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics btw777 (2017) doi:10.1093/bioinformatics/btw777.↩︎
Masuda, T. et al. Novel Hexb-based tools for studying microglia in the CNS. Nat Immunol (2020) doi:10.1038/s41590-020-0707-4.↩︎