You may click on the following links to fast forward to the codes directly producing the Figures:
Back to index page.
This vignette reproduces analyses related to bulk RNAseq in the original publication (Lin et al., 2022).
Microglia were isolated from neonatal mouse and cultured (in vitro) in TIC medium.
Two batches of experiments were performed.
The first batch of experiment aims to study the differences in microglia
transcriptomes under different conditions. Microglia were treated with
LPS, IL4, or AAV-cap18 to induce activation, alternative activation, or
AAV transduction states. Three replicates were performed for each
condition.
The second batch aims to investigate whether priming microglia with
various factors introduce changes of the transcriptional states of
microglia after AAV transduction.
Load required libraries.
suppressPackageStartupMessages({
library(SummarizedExperiment)
library(org.Mm.eg.db)
library(DESeq2)
library(ggplot2)
library(tidyverse)
library(RColorBrewer)
library(pheatmap)
})
If you haven’t done so, download data from GEO.
if(!file.exists('../data/GSE197743/GSE197743_AAV_MG_RNAseq.tsv.gz')){
library(GEOquery)
options(timeout = 6000) ## In case of slow Internet connections
getGEOSuppFiles(GEO = "GSE197743", makeDirectory = T, baseDir = '../data/')
}
Load the data table and associated metadata to workspace.
counts = as.matrix(read.table(file = gzfile("../data/GSE197743/GSE197743_AAV_MG_RNAseq.tsv.gz"), sep = "\t"))
coldata = read.csv(file = "../miscs/RNAseq_sample_table.csv", header = T)
# To avoid warnings at creating dds, recode factor levels to replace "/" by "_" or ".", and convert integer type to factor
coldata <- coldata %>%
mutate(
stimuli = recode(stimuli, `LPS_200ng/mL` = "LPS_200ng_mL", `IL4_20ng/mL` = "IL4_20ng_mL"),
virus = recode(virus,
`AAV-cap18-msc, moi=50,000` = "cap18_mSc_moi_50000")
) %>%
mutate(
batch = factor(batch),
stimuli = factor(stimuli),
virus = factor(virus)
)
coldata
## SampleName TIC stimuli virus batch lib_type
## 1 yt0616.1 trt blank untrt 1 pe
## 2 yt0616.2 trt blank untrt 1 pe
## 3 yt0616.3 trt blank untrt 1 pe
## 4 yt0616.4 trt LPS_200ng_mL untrt 1 pe
## 5 yt0616.5 trt LPS_200ng_mL untrt 1 pe
## 6 yt0616.6 trt LPS_200ng_mL untrt 1 pe
## 7 yt0616.7 trt IL4_20ng_mL untrt 1 pe
## 8 yt0616.8 trt IL4_20ng_mL untrt 1 pe
## 9 yt0616.9 trt IL4_20ng_mL untrt 1 pe
## 10 yt0616.16 trt blank cap18_mSc_moi_50000 1 pe
## 11 yt0616.17 trt blank cap18_mSc_moi_50000 1 pe
## 12 yt0616.18 trt blank cap18_mSc_moi_50000 1 pe
## 13 YT.1111.01 trt IL4_20ng_mL cap18_mSc_moi_50000 2 se
## 14 YT.1111.02 trt IL4_20ng_mL cap18_mSc_moi_50000 2 se
## 15 YT.1111.03 trt IL4_20ng_mL cap18_mSc_moi_50000 2 se
## 16 YT.1111.04 trt LPS_200ng_mL cap18_mSc_moi_50000 2 se
## 17 YT.1111.05 trt LPS_200ng_mL cap18_mSc_moi_50000 2 se
## 18 YT.1111.06 trt LPS_200ng_mL cap18_mSc_moi_50000 2 se
## 19 YT.1111.07 trt IL4_20ng_mL untrt 2 se
## 20 YT.1111.08 trt IL4_20ng_mL untrt 2 se
## 21 YT.1111.09 trt IL4_20ng_mL untrt 2 se
## 22 YT.1111.12 trt LPS_200ng_mL untrt 2 se
Note that this matrix contains non-integer values, which are
summarized counts from transcript level quantification by
tximeta. Since we’re not working with
tximeta here , we need to perform an additional step of
approximation to round the values to integers.
If you’d like to work with non-approximated version of the data, you
would have to download the raw fastq files and perform the analysis from
there. Refer to vignettes/rnaseq_raw.Rmd for that
version of the pipeline.
gse <- SummarizedExperiment(assays = round(counts), colData = coldata)
We first focus on analyzing the first experiment. Subset the data and create a DESeq2 object.
gse_1 <- gse[,1:12]
dds <- DESeqDataSet(gse_1, design = ~ stimuli + virus)
Manually retrieve rowData from org.Mm.eg.db.
gene_name <- mapIds(x = org.Mm.eg.db, keys = rownames(dds), column = "SYMBOL", keytype = "ENSEMBL", multiVals = "first")
ensembl_id <- rownames(dds)
mcols(dds) <- DataFrame(mcols(dds), gene_name, ensembl_id)
Pre-filter the genes.
nrow(dds)
## [1] 35896
keep <- rowSums(counts(dds) >= 10) >= 3 # at least 3 samples with a count of 10 or higher
dds <- dds[keep,]
nrow(dds)
## [1] 13668
Reset the orders of factor levels.
dds$stimuli <- factor(dds$stimuli, levels = c("blank","IL4_20ng_mL","LPS_200ng_mL"))
dds$virus <- factor(dds$virus, levels = c("untrt","cap18_mSc_moi_50000"))
For exploratory analysis, it is common practice to work on the transformed count data. We choose to use variance stablizing transformation here for its robustness and speed.
vsd <- vst(dds, blind = FALSE)
Sample distance provides a powerful and intuitive way to explore the
states of the transcriptomes across samples.
To perform this analysis, we compute a sample distance matrix by the
dist() function using the euclidean distance as metric.
sampleDists <- dist(t(assay(vsd)))
Now plot the distance matrix with pheatmap.
sampleDistMatrix <- as.matrix( sampleDists )
rownames(sampleDistMatrix) <- paste0(sub("_.*","",vsd$stimuli), " - ", sub("_.*","",vsd$virus))
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
out <- pheatmap(sampleDistMatrix,
clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists,
show_colnames = F,
col = colors)
# Store results from hierachical clustering
clusters <- cutree(out$tree_col, k = 3)
vsd$clusters <- as.factor(clusters)
dds$clusters <- as.factor(clusters)
From this plot we see that triggers such as LPS or IL4 promoted distinct states in the transcriptional states of microglia. These were conventionally described as “Reactive” (LPS) or “Alternative activation” (IL4) states. On the other hand, while microglia underwent no treatment and microglia transduced with AAV-cap18 demonstrated a clear within-group homogeneity, they were characterized by a similar transcriptional state, as illustrated by the hierachical clustering results.
Pricipal component analysis provides another way of summarizing the microglia transcription states under different conditions.
pcaData <- plotPCA(vsd, intgroup = c("SampleName","clusters","stimuli","virus"), returnData = TRUE)
pcaData$stimulus_virus <- factor(paste(pcaData$stimuli, pcaData$virus, sep = "."),
levels = c("blank.untrt","blank.cap18_mSc_moi_50000","IL4_20ng_mL.untrt","LPS_200ng_mL.untrt"))
percentVar <- round(100* attr(pcaData, "percentVar"))
ggplot(pcaData, aes(x = PC1, y = PC2)) +
geom_point(aes(color = clusters, shape = stimuli, fill = stimulus_virus), size = 2, stroke = 1, position = position_dodge2(width = 0.8)) +
scale_shape_manual(values = c(21,22,24)) +
scale_fill_manual(values=c("white",brewer.pal(n = 12, name = "Paired")[c(5,1,3)])) +
guides(color = "none",
fill = guide_legend(override.aes = list(shape = 21))) + # Hack: github.com/tidyverse/ggplot2/issues/2322
stat_ellipse(data=pcaData, mapping=aes(color = clusters), linetype = 2) +
xlim(-27,37) + ylim(-18,16) +
xlab(paste0("PC1: ", percentVar[1], "% variance")) +
ylab(paste0("PC2: ", percentVar[2], "% variance")) +
coord_fixed() +
theme_bw() +
theme(
legend.position = "none"
)
Using clusters obtained from sample distance matrix, we use the
stat_ellipse function to draw a 95% confidence level for a
multivariate t-distribution of the PC coordinates.
To have a closer look at the microglia transcription states under different conditions, we use heatmap to plot the relative expression of canonical homeostatic and reactive marker genes.
genesubsets_to_plot <- rowData(vsd)[rowData(vsd)$gene_name %in% c("Cst3","Cxcl10","Il12b","Ccl2","Il1b","Nos2","Tnf","Il6","Il1a","Clec10a","Clec7a","Chil3","Mrc1"),]$ensembl_id
names(genesubsets_to_plot) <- rowData(vsd)[rowData(vsd)$gene_name %in% c("Cst3","Cxcl10","Il12b","Ccl2","Il1b","Nos2","Tnf","Il6","Il1a","Clec10a","Clec7a","Chil3","Mrc1"),]$gene_name ## This keeps the order right
sample_order <- c("yt0616.1","yt0616.2","yt0616.3",
"yt0616.16","yt0616.17","yt0616.18",
"yt0616.4","yt0616.5","yt0616.6",
"yt0616.7","yt0616.8","yt0616.9"
)
pheatmap(
assay(vsd)[genesubsets_to_plot,sample_order],
cluster_rows = TRUE,
show_rownames = TRUE,
cluster_cols = FALSE,
labels_row = names(genesubsets_to_plot),
labels_col = colnames(vsd),
scale = "row",
angle_col = 90
)
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] pheatmap_1.0.12 RColorBrewer_1.1-3
## [3] forcats_0.5.1 stringr_1.4.0
## [5] dplyr_1.0.9 purrr_0.3.4
## [7] readr_2.1.2 tidyr_1.2.0
## [9] tibble_3.1.7 tidyverse_1.3.1
## [11] ggplot2_3.3.6 DESeq2_1.30.1
## [13] org.Mm.eg.db_3.12.0 AnnotationDbi_1.52.0
## [15] SummarizedExperiment_1.20.0 Biobase_2.50.0
## [17] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7
## [19] IRanges_2.24.1 S4Vectors_0.28.1
## [21] BiocGenerics_0.36.1 MatrixGenerics_1.9.0
## [23] matrixStats_0.62.0
##
## loaded via a namespace (and not attached):
## [1] fs_1.5.2 bitops_1.0-7 lubridate_1.8.0
## [4] bit64_4.0.5 httr_1.4.3 tools_4.0.5
## [7] backports_1.4.1 bslib_0.3.1 utf8_1.2.2
## [10] R6_2.5.1 DBI_1.1.2 colorspace_2.0-3
## [13] withr_2.5.0 tidyselect_1.1.2 bit_4.0.4
## [16] compiler_4.0.5 rvest_1.0.2 cli_3.3.0
## [19] xml2_1.3.3 DelayedArray_0.16.3 labeling_0.4.2
## [22] sass_0.4.1 scales_1.2.0 genefilter_1.72.1
## [25] digest_0.6.29 rmarkdown_2.14 XVector_0.30.0
## [28] pkgconfig_2.0.3 htmltools_0.5.2 highr_0.9
## [31] dbplyr_2.2.0 fastmap_1.1.0 rlang_1.0.2
## [34] readxl_1.4.0 rstudioapi_0.13 RSQLite_2.2.14
## [37] farver_2.1.0 jquerylib_0.1.4 generics_0.1.2
## [40] jsonlite_1.8.0 BiocParallel_1.24.1 RCurl_1.98-1.7
## [43] magrittr_2.0.3 GenomeInfoDbData_1.2.4 Matrix_1.4-1
## [46] Rcpp_1.0.8.3 munsell_0.5.0 fansi_1.0.3
## [49] lifecycle_1.0.1 stringi_1.7.6 yaml_2.3.5
## [52] MASS_7.3-57 zlibbioc_1.36.0 grid_4.0.5
## [55] blob_1.2.3 crayon_1.5.1 lattice_0.20-45
## [58] haven_2.5.0 splines_4.0.5 annotate_1.68.0
## [61] hms_1.1.1 locfit_1.5-9.4 knitr_1.39
## [64] pillar_1.7.0 geneplotter_1.68.0 reprex_2.0.1
## [67] XML_3.99-0.10 glue_1.6.2 evaluate_0.15
## [70] modelr_0.1.8 vctrs_0.4.1 tzdb_0.3.0
## [73] cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1
## [76] cachem_1.0.6 xfun_0.31 xtable_1.8-4
## [79] broom_0.8.0 survival_3.3-1 memoise_2.0.1
## [82] ellipsis_0.3.2
The data submitted to GEO is the preprocessed table, therefore the gene-level summarization step was omitted from this vignette.↩︎