TL;DR

You may click on the following links to fast forward to the codes directly producing the Figures:

Back to index page.

About this Vignette

This vignette illustrates the steps by which the reference data (10X) was processed. It also reproduces some of the Supplementary Figures in the original publication involved with the pre-processing of reference data (Lin et al., 2022).
Here we only show how plain plots were generated directly from data. To reach the final illustrations in the publication, these plots may require further editing in graphic design softwares (i.e. Adobe Illustrator). Those beautification steps are not central to our analysis and therefore will be omitted.

The input file of this vignette is a filtered h5 file generated by CellRanger (GSE197743_AAV_MG_filtered_feature_bc_matrix.h5). The output is a processed .h5Seurat with metadata slot filled.

Overview of the experiments

Microglia were isolated from adult mice (~8w) using a method based on the “cold-mechanical dissociation” protocol (Bennett et al., 2016)1. The steps involving FACS sorting from the original protocol was omitted to minimize ex vivo activation of microglia.

We found that the single cell transcriptome acquired by 10X Chromium (3’ Chemistry v3) captured transcripts of the AAV-packaged transgene (mScarlet) at a very low rate (data not published). Therefore we carried out deep single cell sequencing using a modified protocol based on a customized Smart-seq2 protocol (Zhang et al., 2020)2 on microglia isolated from mice after stereotaxic AAV delivery. The Smart-seq2 based method captured mScarlet transcripts with substantially higher efficiently.

To characterize the microglia transcriptomes under quiescent and immune activation states, we isolated microglia from mice subjected to Saline, Poly(i:c) and LPS treatment (i.p.), and performed scRNA-seq using the 10X Chromium platform.

Analysis Outline

Upstream analysis pipelines (i.e. sequence alignment, generation of digital expression matrix, etc.):

  • For Smart-seq2, fastq files were processed by a custom Snakemake workflow to generate the expression table (feature-barcode-count matrix).
  • For 10X, fastq files from 3 mice were quantified by CellRanger (v6.0.1) using the cellranger count command and then aggregated using the cellranger aggr command.

Downstream analysis:

  • Preprocessing and QC (scater)
  • Seurat Standard Workflow (Seurat)
  • Label Transfer (Seurat)

Pre-processing

Load libraries and data

Load reuiqred libraries.

suppressPackageStartupMessages({
  library(Seurat)
  library(SeuratDisk)
  library(scater)
  library(tidyverse)
  library(RColorBrewer)
  library(scales)
})

If you haven’t done so, download data from GEO.

if(!file.exists('../data/GSE197743/GSE197743_AAV_MG_filtered_feature_bc_matrix.h5')){
  library(GEOquery)
  options(timeout = 6000)  ## In case of slow Internet connections
  getGEOSuppFiles(GEO = "GSE197743", makeDirectory = T, baseDir = '../data/')
}

Load the data matrix to workspace.

data <- Read10X_h5(filename = '../data/GSE197743/GSE197743_AAV_MG_filtered_feature_bc_matrix.h5')

Create Seurat object.

cells <- CreateSeuratObject(data, project = "AAV_Microglia", names.field = 2, names.delim = "-")

Update metadata.

Idents(cells) <- "orig.ident"
cells <- RenameIdents(cells, `1` = "MG_S", `2` = "MG_P", `3` = "MG_L")
cells[["group"]] <- Idents(cells)

Idents(cells) <- "orig.ident"
cells <- RenameIdents(cells, `1` = "Saline", `2` = "Poly(i:c)", `3` = "LPS")
cells[["treat"]] <- Idents(cells)

cells[["id"]] <- "Reference"
cells[["tech"]] <- "10x"

Quality Control (QC)

We use the scater3 package for quality control (QC) of the single cell datasets

cells <- DietSeurat(cells)
cells.sce <- as.SingleCellExperiment(cells)
cells.sce <- logNormCounts(cells.sce)
dim(cells.sce)
## [1] 31342 31712

Cell level QC

per.cell <- perCellQCMetrics(cells.sce,
                             subsets = list(Mito = grep("^mt-", rownames(cells.sce)),  # Mitochondrial genes
                                            Ribo = grep("\\bRp[sl]\\d+[^k]*\\b", rownames(cells.sce))))  # Ribosomal genes
qc.stats <- quickPerCellQC(per.cell, percent_subsets = c("subsets_Mito_percent","subsets_Ribo_percent"))

qc.stats$high_subsets_Mito_percent <- per.cell$subsets_Mito_percent > 3
qc.stats$discard <- qc.stats$low_lib_size | qc.stats$low_n_features | qc.stats$high_subsets_Mito_percent | qc.stats$high_subsets_Ribo_percent
qc.stats$discard_size_complexity <- qc.stats$low_lib_size | qc.stats$low_n_features
colData(cells.sce) <- cbind(colData(cells.sce), cbind(per.cell, qc.stats))
colSums(as.matrix(qc.stats))
##              low_lib_size            low_n_features high_subsets_Mito_percent 
##                      2783                      3346                      5135 
## high_subsets_Ribo_percent                   discard   discard_size_complexity 
##                       610                      7388                      3392
cells.filtered <- cells.sce[,!qc.stats$discard]

Feature level QC

keep_feature <- nexprs(cells.filtered, byrow=TRUE) > 0

## Remove mitochondrial and ribosomal genes which does not contribute to dimensionality reduction and clustering
keep_feature[grep("\\bRp[sl]\\d+[^k]*\\b", rownames(cells.sce))] <- FALSE
keep_feature[grep("mt-", rownames(cells.sce))] <- FALSE
cells.filtered <- cells.filtered[keep_feature,]

Back to Seurat object.

refdata <- as.Seurat(cells.filtered)
refdata[["nCount_originalexp"]] <- NULL; refdata[["nFeature_originalexp"]] <- NULL
refdata <- RenameAssays(refdata, originalexp = "RNA")
refdata$ident <- NULL

Standard Seurat Workflow

refdata <- refdata %>%
  NormalizeData() %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA(npcs = 50)

## USE DIM=1:50 FOR UNFILTERED DATA, SKIP JACKSTRAW TO SAVE TIME
# refdata <- JackStraw(refdata, dims = 50)
# refdata <- ScoreJackStraw(refdata, dims = 1:50)
# JackStrawPlot(refdata, dims = 1:50)
# dims_use <- JS(refdata[["pca"]], slot = "overall") %>% as.data.frame %>% filter(Score < 0.05) %>% pull(PC)

dims_use <- 1:50

refdata <- refdata %>%
  FindNeighbors(dims = dims_use) %>%
  FindClusters(resolution = 0.8) %>%
  RunUMAP(dims = dims_use)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 24324
## Number of edges: 936162
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8613
## Number of communities: 19
## Elapsed time: 4 seconds

Take a closer look at our pre-processed data:
We see that there are some major clusters in the middle and bottom-right of the umap plot. These are microglia single cell populations. The rest of the clusters represent non-microglia populations as well as droplets contaminated by cell debris. These were captured in our data because we applied Dounce homogenization and Percoll density gradient centrifugation to isolate microglia from mouse brain, which can never reach perfect purity.
Nevertheless, the microglia clusters look good, as they exhibit robust Hexb expression, a well validate microglia marker4. We can easily isolate them via in silico procedures, as is shown in subsequent code chunks.

FeaturePlot(refdata, features = "Hexb") | FeaturePlot(refdata, features = "subsets_Mito_percent")

Some microglia single cells exhibit high mitochondrial RNA levels, purge them.

refdata_clean <- subset(refdata, subset = subsets_Mito_percent < 3)

To achieve in silico purification of microglia populations, we use the CellSelector() utility provided by Seurat package. This procedure requires manual selection of cells and is difficult to replicate programmatically.
To help you reproduce the result, we have saved the identity of the selected cells in ../miscs/mg_manual.rda in this repo. You can load it directly into workspace to subset out the micorglia population and proceed to downstream workflows.

## NOT RUN
# Idents(refdata_clean) <- "contaminants"
# refdata_clean <- CellSelector(FeaturePlot(refdata_clean, features = "Hexb"), object = refdata_clean, ident = "microglia")  ## Run repeatedly, select desired clusters
# mg_manual <- WhichCells(refdata_clean, idents = "microglia")
# save(mg_manual, file = "../miscs/mg_manual.rda")
load(file = '../miscs/mg_manual.rda')
refdata_clean <- subset(refdata_clean, cells = mg_manual)

Re-run Seurat Standard workflow on subsetted data.

refdata_clean <- refdata_clean %>%
  NormalizeData() %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA(npcs = 45) %>%
  FindNeighbors(dims = 1:45) %>%
  FindClusters() %>%
  RunUMAP(dims = 1:45)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 20446
## Number of edges: 669450
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8204
## Number of communities: 11
## Elapsed time: 2 seconds

Update Metadata

The metadata slot is updated based on the experiment design.
But note that in recent years the microglia community is gradually quitting the use of Reactive and Homeostatic nomenclature.

Idents(refdata_clean) <- "orig.ident"
refdata_clean <- RenameIdents(refdata_clean, `1` = "Homeostatic", `2` = "Reactive", `3` = "Reactive")
refdata_clean[["state"]] <- Idents(refdata_clean)

Reproducing Plots

A general note on the UMAP visualizations:
Depending on the use of CPUs and versions of the umap algorithm, the UMAP coordinates computed on different machines may not entirely match what was shown in the original publication. This is a known issue of the umap algorithm. Nevertheless, the overall structure of the UMAP embedding should be consistent, i.e. microglia from mouse subjected to Saline, Poly(i:c) and LPS groups reside in different clusters. We’ve tested this on different computing platforms and reached the conclution that the shifts in UMAP coordinates does not affect our biological interpretations.

Extended Data Figure 7a

as.data.frame(colData(cells.sce)[,c("nCount_RNA","nFeature_RNA","discard_size_complexity")]) %>%
  ggplot(mapping = aes(x=nCount_RNA, y=nFeature_RNA)) +
  geom_point(aes(color = discard_size_complexity), size = 1.2) +
  # ggtitle("Reference Data (10x)") +
  scale_color_manual(values = brewer.pal(n = 6,name = "Paired")[c(2,6)], labels = c("Kept","Removed"), name = "") +
  scale_x_continuous(expand = c(0.15,0.1)) +
  xlab("UMI counts") +
  ylab("Gene counts") +
  theme_classic() +
  theme(
    # plot.title = element_text(size = 12, face = "bold"),
    axis.text = element_text(size = 16),
    axis.title = element_text(size = 21),
    legend.title = element_blank(),
    legend.box.background = element_blank(),
    legend.position = c(.95, .05),
    legend.justification = c("right", "bottom"),
    legend.text = element_text(size = 16),
    legend.margin = margin(1, 5, 5, 5)
    )

Extended Data Figure 7b

as.data.frame(colData(cells.sce)[,c("treat","subsets_Mito_percent","high_subsets_Mito_percent")]) %>%
  ggplot(mapping = aes(x=treat, y=subsets_Mito_percent)) +
  geom_violin() +
  geom_jitter(mapping = aes(color = high_subsets_Mito_percent), size = 0.3, height = 0, width = 0.1) +
  scale_color_manual(values = brewer.pal(n = 6,name = "Paired")[c(2,6)]) +
  xlab(NULL) +
  ylab("Mitochondrial RNA %") +
  theme_linedraw() +
  theme(
    legend.position = "none",
    axis.text.y = element_text(size = 14),
    axis.text.x = element_text(size = 16),
    axis.title = element_text(size = 20),
    )

Extended Data Figure 7c

as.data.frame(colData(cells.sce)[,c("treat","subsets_Ribo_percent","high_subsets_Ribo_percent")]) %>%
  ggplot(mapping = aes(x=treat, y=subsets_Ribo_percent)) +
  geom_violin() +
  geom_jitter(mapping = aes(color = high_subsets_Ribo_percent), size = 0.3, height = 0, width = 0.1) +
  scale_color_manual(values = brewer.pal(n = 6,name = "Paired")[c(2,6)]) +
  xlab(NULL) +
  ylab("Ribosomal RNA %") +
  theme_linedraw() +
  theme(
    legend.position = "none",
    axis.text.y = element_text(size = 14),
    axis.text.x = element_text(size = 16),
    axis.title = element_text(size = 20),
    )

Extended Data Figure 7d

refdata_clean$treat <- factor(refdata_clean$treat, levels = c("Saline","Poly(i:c)","LPS"))
umap_reff1 <- FeaturePlot(refdata, features = "Hexb") +
  ggtitle("Hexb") +
  theme(
    plot.title = element_text(hjust = 0, size = 24),
    axis.ticks = element_blank(),
    axis.text = element_blank(),
    axis.title = element_text(size = 18),
    axis.line = element_line(arrow = grid::arrow(length = unit(0.3, "cm"), 
                                                       ends = "last", type = "closed")),
    legend.title = element_blank(),
    legend.text = element_text(size = 14),
    legend.position = c(.05, .25)
  )
umap_reff2 <- DimPlot(refdata, cells.highlight = mg_manual) + 
  NoLegend() +
  ggtitle("Retained Cells") +
  theme(
    plot.title = element_text(hjust = 0, size = 24),
    axis.ticks = element_blank(),
    axis.text = element_blank(),
    axis.title = element_text(size = 18),
    axis.line = element_line(arrow = grid::arrow(length = unit(0.3, "cm"), 
                                                 ends = "last", type = "closed"))
  )
umap_reff3 <- DimPlot(refdata_clean, group.by = "treat") +
  scale_color_manual(values = brewer.pal(n = 12, name = "Paired")[c(1,3,5)], 
                     labels = c("Saline", "Poly(i:c)", "LPS")) +
  ggtitle("Final Reference") +
  guides(color = guide_legend(override.aes = list(size=3), byrow = TRUE)) +
  theme(
    plot.title = element_text(hjust = 0, size = 24),
    axis.ticks = element_blank(),
    axis.text = element_blank(),
    axis.title = element_text(size = 18),
    axis.line = element_line(arrow = grid::arrow(length = unit(0.3, "cm"), 
                                                 ends = "last", type = "closed")),
    legend.title = element_blank(),
    legend.text = element_text(size = 12),
    legend.position = c(.08, .22),
    legend.spacing.y = unit(0.25, 'cm')
  )
umap_reff1 | umap_reff2 | umap_reff3

Save Results

Backup the processed Seurat object using h5 format.

## Avoid using factor while saving to other formats
refdata_clean$group <- as.character(refdata_clean$group)
refdata_clean$treat <- as.character(refdata_clean$treat)
refdata_clean$state <- as.character(refdata_clean$state)
refdata_clean$orig.ident <- as.character(refdata_clean$orig.ident)
refdata_clean$RNA_snn_res.0.8 <- as.integer(as.character(refdata_clean$RNA_snn_res.0.8))
refdata_clean$seurat_clusters <- as.integer(as.character(refdata_clean$seurat_clusters))

if(!dir.exists('../data/AAV_MG_scRNAseq')){dir.create('../data/AAV_MG_scRNAseq')}
SaveH5Seurat(refdata_clean, filename = '../data/AAV_MG_scRNAseq/refdata_clean.h5Seurat', overwrite = T, verbose = F)
Convert('../data/AAV_MG_scRNAseq/refdata_clean.h5Seurat', dest = "h5ad", overwrite = T, verbose = F)

Session info

sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=zh_CN.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] scales_1.2.0                RColorBrewer_1.1-3         
##  [3] forcats_0.5.1               stringr_1.4.0              
##  [5] dplyr_1.0.9                 purrr_0.3.4                
##  [7] readr_2.1.2                 tidyr_1.2.0                
##  [9] tibble_3.1.7                tidyverse_1.3.1            
## [11] scater_1.18.6               ggplot2_3.3.6              
## [13] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
## [15] Biobase_2.50.0              GenomicRanges_1.42.0       
## [17] GenomeInfoDb_1.26.7         IRanges_2.24.1             
## [19] S4Vectors_0.28.1            BiocGenerics_0.36.1        
## [21] MatrixGenerics_1.9.0        matrixStats_0.62.0         
## [23] SeuratDisk_0.0.0.9020       sp_1.5-0                   
## [25] SeuratObject_4.1.0          Seurat_4.1.1               
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                reticulate_1.25          
##   [3] tidyselect_1.1.2          htmlwidgets_1.5.4        
##   [5] grid_4.0.5                BiocParallel_1.24.1      
##   [7] Rtsne_0.16                munsell_0.5.0            
##   [9] codetools_0.2-18          ica_1.0-2                
##  [11] future_1.26.1             miniUI_0.1.1.1           
##  [13] withr_2.5.0               spatstat.random_2.2-0    
##  [15] colorspace_2.0-3          progressr_0.10.1         
##  [17] highr_0.9                 knitr_1.39               
##  [19] rstudioapi_0.13           ROCR_1.0-11              
##  [21] tensor_1.5                listenv_0.8.0            
##  [23] labeling_0.4.2            GenomeInfoDbData_1.2.4   
##  [25] polyclip_1.10-0           farver_2.1.0             
##  [27] bit64_4.0.5               parallelly_1.32.0        
##  [29] vctrs_0.4.1               generics_0.1.2           
##  [31] xfun_0.31                 R6_2.5.1                 
##  [33] ggbeeswarm_0.6.0          rsvd_1.0.5               
##  [35] hdf5r_1.3.5               bitops_1.0-7             
##  [37] spatstat.utils_2.3-1      DelayedArray_0.16.3      
##  [39] assertthat_0.2.1          promises_1.2.0.1         
##  [41] rgeos_0.5-9               beeswarm_0.4.0           
##  [43] gtable_0.3.0              beachmat_2.6.4           
##  [45] globals_0.15.0            goftest_1.2-3            
##  [47] rlang_1.0.2               splines_4.0.5            
##  [49] lazyeval_0.2.2            spatstat.geom_2.4-0      
##  [51] broom_0.8.0               yaml_2.3.5               
##  [53] reshape2_1.4.4            abind_1.4-5              
##  [55] modelr_0.1.8              backports_1.4.1          
##  [57] httpuv_1.6.5              tools_4.0.5              
##  [59] ellipsis_0.3.2            spatstat.core_2.4-4      
##  [61] jquerylib_0.1.4           ggridges_0.5.3           
##  [63] Rcpp_1.0.8.3              plyr_1.8.7               
##  [65] sparseMatrixStats_1.7.0   zlibbioc_1.36.0          
##  [67] RCurl_1.98-1.7            rpart_4.1-15             
##  [69] deldir_1.0-6              pbapply_1.5-0            
##  [71] viridis_0.6.2             cowplot_1.1.1            
##  [73] zoo_1.8-10                haven_2.5.0              
##  [75] ggrepel_0.9.1             cluster_2.1.3            
##  [77] fs_1.5.2                  magrittr_2.0.3           
##  [79] RSpectra_0.16-1           data.table_1.14.2        
##  [81] scattermore_0.8           lmtest_0.9-40            
##  [83] reprex_2.0.1              RANN_2.6.1               
##  [85] fitdistrplus_1.1-8        hms_1.1.1                
##  [87] patchwork_1.1.1           mime_0.12                
##  [89] evaluate_0.15             xtable_1.8-4             
##  [91] readxl_1.4.0              gridExtra_2.3            
##  [93] compiler_4.0.5            KernSmooth_2.23-20       
##  [95] crayon_1.5.1              htmltools_0.5.2          
##  [97] mgcv_1.8-40               later_1.3.0              
##  [99] tzdb_0.3.0                lubridate_1.8.0          
## [101] DBI_1.1.2                 dbplyr_2.2.0             
## [103] MASS_7.3-57               Matrix_1.4-1             
## [105] cli_3.3.0                 igraph_1.3.2             
## [107] pkgconfig_2.0.3           plotly_4.10.0            
## [109] scuttle_1.0.4             spatstat.sparse_2.1-1    
## [111] xml2_1.3.3                vipor_0.4.5              
## [113] bslib_0.3.1               XVector_0.30.0           
## [115] rvest_1.0.2               digest_0.6.29            
## [117] sctransform_0.3.3         RcppAnnoy_0.0.19         
## [119] spatstat.data_2.2-0       rmarkdown_2.14           
## [121] cellranger_1.1.0          leiden_0.4.2             
## [123] uwot_0.1.11               DelayedMatrixStats_1.12.3
## [125] shiny_1.7.1               lifecycle_1.0.1          
## [127] nlme_3.1-158              jsonlite_1.8.0           
## [129] BiocNeighbors_1.8.2       viridisLite_0.4.0        
## [131] fansi_1.0.3               pillar_1.7.0             
## [133] lattice_0.20-45           fastmap_1.1.0            
## [135] httr_1.4.3                survival_3.3-1           
## [137] glue_1.6.2                png_0.1-7                
## [139] bit_4.0.4                 stringi_1.7.6            
## [141] sass_0.4.1                BiocSingular_1.6.0       
## [143] irlba_2.3.5               future.apply_1.9.0

  1. Bennett, M. L. et al. New tools for studying microglia in the mouse and human CNS. Proc Natl Acad Sci USA 113, E1738 (2016).↩︎

  2. Zhang, S. et al. Single-cell transcriptomics identifies divergent developmental lineage trajectories during human pituitary development. Nat Commun 11, 5275 (2020).↩︎

  3. McCarthy, D. J., Campbell, K. R., Lun, A. T. L. & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics btw777 (2017) doi:10.1093/bioinformatics/btw777.↩︎

  4. Masuda, T. et al. Novel Hexb-based tools for studying microglia in the CNS. Nat Immunol (2020) doi:10.1038/s41590-020-0707-4.↩︎