This is an analysis of the yeast data with 48 biological replicates in each of two conditions (analyzed in this publication). We chose this experiment because of the large number of biological replicates, which will allow us to (1) implement null comparisons on random subsets of samples within one condition, and (2) start with a null comparison and add in artificial differences to a subset of genes to define ‘true positives’.
We will simulate a strongly informative and a weakly informative independent covariate. We will also investigate the effect of an uninformative covariate by using a randomly generated covariate.
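The three covariate types can be sketched in a few self-contained lines of R (the variable names here are illustrative; the actual generation happens inside the simulation function defined later in this document):

```r
set.seed(1)
n <- 1000
# strongly informative covariate: logistic transform of a uniform draw;
# later used as probability weights when selecting DE genes
strong <- 1 / (1 + exp(-runif(n, 0, 10) + 5))
# weakly informative covariate: the strong covariate plus noise, clipped to [0, 1]
weak <- pmin(1, abs(strong + rnorm(n, 0, 0.25)))
# uninformative covariate: independent of DE status entirely
uninf <- runif(n)
```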
The processed count table is made available by the authors in the GitHub repository accompanying their paper.
In this Rmd we carry out simulations over multiple replications and plot results averaged over them. Here we draw effect sizes (log2 fold changes) from the fold changes observed in the full comparison of the two conditions. Since this results in a distribution of effect sizes and test statistics under the alternative with a mode at zero, the assumptions of FDRreg-empirical are violated. However, since this assumption is impossible to check outside of a simulation setting, we still include the method in the benchmark comparisons.
In this setting, we assume 2000 DE genes and a unimodal alternative.
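As a toy, self-contained sketch of the spike-in mechanism (made-up counts and fold changes here; the real simulation below draws its log2 fold changes from the observed comparison between conditions):

```r
set.seed(42)
# toy count matrix: 20 genes x 10 samples, two fake groups of 5
counts <- matrix(rpois(200, lambda = 50), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), paste0("s", 1:10)))
group <- rep(c("A", "B"), each = 5)
# pick a random subset of genes to act as 'true positives'
de <- sample(nrow(counts), 5)
log2fc <- rnorm(5, mean = 0, sd = 2)  # stand-in for sampled fold changes
# shift one group's counts by 2^log2FC and round back to integers
counts[de, group == "B"] <- round(counts[de, group == "B"] * 2^log2fc)
truth <- seq_len(nrow(counts)) %in% de
```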
# Load packages and source benchmark FDR
library(SummarizedBenchmark)
library(data.table)
library(tidyr)
library(dplyr)
library(readr)
library(ggplot2)
library(magrittr)
library(cowplot)
library(purrr)
library(DESeq2)
library(tibble)
library(ggthemes)
library(R.utils)
## load helper functions
for (f in list.files("../R", "\\.(r|R)$", full.names = TRUE)) {
source(f)
}
# set up data / results directories
datdir <- "yeast-data"
resdir <- "../../results/RNAseq"
dir.create(datdir, showWarnings = FALSE)
dir.create(resdir, showWarnings = FALSE)
# results files that will be generated
# null
resfile_n5 <- file.path(resdir, "yeast-results-null5.rds")
resfile_n10 <- file.path(resdir, "yeast-results-null10.rds")
# strong covariate
resfile_d5 <- file.path(resdir, "yeast-results-de5.rds")
resfile_d10 <- file.path(resdir, "yeast-results-de10.rds")
# weak covariate
resfile_d5_w <- file.path(resdir, "yeastW-results-de5.rds")
resfile_d10_w <- file.path(resdir, "yeastW-results-de10.rds")
# uninformative covariate
resfile_d5_uninfCov <- file.path(resdir, "yeast-results-de5-uninfCov.rds")
resfile_d10_uninfCov <- file.path(resdir, "yeast-results-de10-uninfCov.rds")
# set up parallel backend
library(parallel)
nCores <- 20
First, we download the processed count data from GitHub. There is one file for the Snf2 condition and one for the wild-type condition. The Snf2 condition is a yeast strain in which the transcriptional regulator gene Snf2 has been knocked out. Each of these files is a compressed tar.gz archive containing a single count file per replicate, which we'll extract into a subdirectory called yeast-data.
download.file(url = "https://github.com/bartongroup/profDGE48/raw/master/Preprocessed_data/Snf2_countdata.tar.gz",
destfile = file.path(datdir, "Snf2_countdata.tar.gz"))
download.file(url = "https://github.com/bartongroup/profDGE48/raw/master/Preprocessed_data/WT_countdata.tar.gz",
destfile = file.path(datdir, "WT_countdata.tar.gz"))
gunzip(file.path(datdir, "Snf2_countdata.tar.gz"))
gunzip(file.path(datdir, "WT_countdata.tar.gz"))
untar(file.path(datdir, "Snf2_countdata.tar"), exdir = datdir)
untar(file.path(datdir, "WT_countdata.tar"), exdir = datdir)
file.remove(file.path(datdir, "Snf2_countdata.tar"),
file.path(datdir, "WT_countdata.tar"))
## [1] TRUE TRUE
Each of the data files contains two columns, one with a gene/feature name, and one with the count value.
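For illustration, a file in this two-column layout can be mimicked and read as follows (the rows below are made up; base `read.delim` is used to keep this sketch dependency-free, while the analysis itself uses `readr::read_tsv`):

```r
# write a tiny stand-in for one replicate's count file
f <- tempfile(fileext = ".bam.gbgout")
writeLines(c("15S_rRNA\t23", "21S_rRNA\t122", "HRA1\t3"), f)
# the real files have no header line, so read without one
tab <- read.delim(f, header = FALSE, col.names = c("feature", "count"))
tab
```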
We’ll also download the list of ‘bad’ replicates that were especially poorly correlated with the others, as determined by the authors.
download.file(url = "https://github.com/bartongroup/profDGE48/raw/master/Bad_replicate_identification/exclude.lst", destfile = (file.path(datdir, "badreps.txt")))
Here we make use of the map and map2 functions from the purrr package to apply the read_tsv function from readr to all 96 sample tables, adding to each subtable the sample name derived from the file name. Finally, the reduce function joins all the replicates together. We then remove the samples that failed QC in the original study.
files <- dir(path = datdir, pattern = "*.bam.gbgout", full.names = TRUE)
sample_names <- sapply(strsplit(dir(path = datdir, pattern = "*.bam.gbgout"), "_MID"),
function(x) x[[1]])
badreps <- read_tsv(file.path(datdir, "badreps.txt"), col_names = FALSE)$X1
badreps <- unlist(lapply(strsplit(badreps, "_MID"), function(x) x[1]))
counts <- files %>%
purrr::map(read_tsv, col_names = FALSE) %>% # read in all the files individually
purrr::map2(sample_names, ~ dplyr::rename(.x, !! .y := X2, feature = X1) ) %>% # add sample names
purrr::reduce(left_join, by = "feature") %>% # left_join all tables into one data frame by feature
dplyr::select(-one_of(badreps)) # remove bad replicates
Here we’ll carry out an analysis of a subset of the full dataset, comparing the controls to the knockout samples. The fold changes observed in this comparison will be used when generating the non-null simulated data in the following sections. Here we use a subset of the samples (10 samples in each group) to mimic the sample sizes we use in simulation.
We’re ready to construct a DESeq2 object. First we pull out the feature names and add them as rownames for the count table, and next we construct a column data object that houses the sample names, replicate numbers, and condition factor (WT versus Snf2 knockout).
First we set up the DESeq2 object.
feats <- (counts %>% select(1))$feature
counts <- as.matrix(counts %>% select(-1))
rownames(counts) <- feats
coldat <- tibble(sample=colnames(counts)) %>%
separate(sample, sep="_", into=c("condition", "replicate"), remove=FALSE) %>%
mutate(condition = factor(condition))
# filter low count genes
counts <- counts[rowMeans(counts) > 1,]
dds_full <- DESeqDataSetFromMatrix(countData = counts,
colData = coldat,
design= ~ condition)
Next we run DESeq2 and subset the results to genes with FDR < 0.05.
# results on full set
set.seed(9384)
subset.wt <- dds_full[,colData(dds_full)$condition == "WT"]
subset.sn <- dds_full[,colData(dds_full)$condition == "Snf2"]
subset <- cbind(subset.wt[, sample(1:ncol(subset.wt), 10)],
subset.sn[, sample(1:ncol(subset.sn), 10)])
dds <- DESeq(subset)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 8 genes
## -- DESeq argument 'minReplicatesForReplace' = 7
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
resultsNames(dds) # lists the coefficients
## [1] "Intercept" "condition_WT_vs_Snf2"
res_10 <- results(dds, name="condition_WT_vs_Snf2", independentFiltering = F)
head(res_10)
## log2 fold change (MLE): condition WT vs Snf2
## Wald test p-value: condition WT vs Snf2
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange lfcSE
## <numeric> <numeric> <numeric>
## 15S_rRNA 23.0677113098 -0.442436354738558 0.737886531818313
## 21S_rRNA 122.468643375999 -0.0695116374758201 0.623139117792482
## HRA1 2.73615294778566 1.01339771844163 0.509391516531703
## ICR1 138.471320625508 -0.096978186962606 0.0756931110689155
## LSR1 236.839720373841 0.169449748770875 0.354617686257335
## NME1 27.707146784895 -0.288556885396075 0.292234960281767
## stat pvalue padj
## <numeric> <numeric> <numeric>
## 15S_rRNA -0.599599444711775 0.548773217327728 0.627374545210852
## 21S_rRNA -0.111550752458087 0.911179622050456 0.934130172605857
## HRA1 1.98942794599634 0.0466539875922699 0.0753013745547155
## ICR1 -1.2812022863522 0.200122623271019 0.268785314674111
## LSR1 0.477837838713748 0.632765627252953 0.70030622452096
## NME1 -0.987413980578691 0.323439733395463 0.403330270778396
sum(res_10$padj < 0.05, na.rm=T)
## [1] 3854
Check densities of the test statistic and effect sizes for significant DE genes.
data.frame(res_10[res_10$padj < 0.05 & !is.na(res_10$padj),]) %>%
ggplot(aes(x = stat)) +
geom_density() +
xlab("Test Statistic")
data.frame(res_10[res_10$padj < 0.05 & !is.na(res_10$padj),]) %>%
ggplot(aes(x = log2FoldChange)) +
geom_density() +
xlab("Effect Size (log2 Fold Change)")
There are several thousand genes detected in this 10 versus 10 comparison. Next, we’ll build the input data frame for SummarizedBenchmark.
geneExp <- tbl_df(data.frame(geneName=rownames(res_10),
pval=res_10$pvalue,
SE=res_10$lfcSE,
ind_covariate = res_10$baseMean,
effect_size=res_10$log2FoldChange,
test_statistic=res_10$stat,
pzero=rowSums(counts(subset)==0)/ncol(counts(subset))))
# remove genes with NA results or with zero counts in at least 50% of samples
geneExp <- geneExp %>% na.omit() %>% dplyr::filter(pzero < 0.5)
We’ll create a plot to examine the distribution of effect sizes, since the ash method assumes that the distribution of true (unobserved) effect sizes is unimodal.
ggplot(data=geneExp, aes(effect_size)) +
geom_histogram(bins=30)
We’ll also explore how the standard error (used by ash) correlates with the independent covariate (used by methods that incorporate covariates), in order to get an idea of how these pieces of information relate to one another.
ggplot(data=geneExp, aes(x = ind_covariate, y = SE)) +
geom_hex(bins = 100) +
scale_x_continuous(trans="log10") +
xlab("Covariate: Mean gene expression")
Look at covariate diagnostic plots.
strat_hist(geneExp, pvalue="pval", covariate="ind_covariate", maxy=30)
rank_scatter(geneExp, pvalue="pval", covariate="ind_covariate")
## Warning: Removed 9 rows containing non-finite values (stat_binhex).
Build the common bench design. We also add Scott’s FDR regression, since our test statistics are approximately t-distributed. Note that FDRreg-empirical assumes the distribution of non-null test statistics has no substantial mass at zero; that assumption is violated here, but we still include fdrreg-e in the evaluation since the assumption is impossible to check outside of a simulation setting (we cannot know which tests are non-null). We’d therefore like to evaluate how results change when it is violated, since it plausibly is in the case studies.
bd <- initializeBenchDesign()
##
## Attaching package: 'IHW'
## The following object is masked from 'package:ggplot2':
##
## alpha
##
## Attaching package: 'ashr'
## The following object is masked from 'package:mclust':
##
## dens
## Loading required package: fda
## Loading required package: splines
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following object is masked from 'package:S4Vectors':
##
## expand
## The following object is masked from 'package:tidyr':
##
## expand
##
## Attaching package: 'fda'
## The following object is masked from 'package:graphics':
##
## matplot
## Loading required package: BayesLogit
## Loading required package: mvtnorm
bd <- addBMethod(bd, "fdrreg-t",
FDRreg::FDRreg,
function(x) { x$FDR },
z = test_statistic,
features = model.matrix( ~ splines::bs(ind_covariate, df = 3) - 1),
nulltype = 'theoretical',
control = list(lambda = 0.01))
bd <- addBMethod(bd, "fdrreg-e",
FDRreg::FDRreg,
function(x) { x$FDR },
z = test_statistic,
features = model.matrix( ~ splines::bs(ind_covariate, df = 3) - 1),
nulltype = 'empirical',
control = list(lambda = 0.01))
Run benchmark methods.
sb <- bd %>% buildBench(data=geneExp, parallel = FALSE)
## Due to absence of package REBayes, switching to EM algorithm
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Model selection starts!
## Shrink the set of candidate models if it is too time-consuming.
##   |==================================================| 100%
assayNames(sb) <- "qvalue"
sb <- addDefaultMetrics(sb)
Plot results.
rejections_scatter(sb, supplementary=FALSE)
plotFDRMethodsOverlap(sb, alpha=0.05, nsets=ncol(sb), order.by="freq", decreasing=TRUE, supplementary=FALSE)
Here we’ll carry out an analysis of a subset of the full dataset, comparing the controls to the knockout samples. The fold changes observed in this comparison will be used when generating the non-null simulated data in the following sections. Here we use a subset of the samples (5 samples in each group) to mimic the sample sizes we use in simulation.
First we run DESeq2.
# results on full set
set.seed(728)
subset.wt <- dds_full[,colData(dds_full)$condition == "WT"]
subset.sn <- dds_full[,colData(dds_full)$condition == "Snf2"]
subset <- cbind(subset.wt[, sample(1:ncol(subset.wt), 5)],
subset.sn[, sample(1:ncol(subset.sn), 5)])
dds <- DESeq(subset)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
resultsNames(dds) # lists the coefficients
## [1] "Intercept" "condition_WT_vs_Snf2"
res_5 <- results(dds, name="condition_WT_vs_Snf2", independentFiltering = F)
head(res_5)
## log2 fold change (MLE): condition WT vs Snf2
## Wald test p-value: condition WT vs Snf2
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange lfcSE
## <numeric> <numeric> <numeric>
## 15S_rRNA 25.6626923517005 -1.34815645290897 1.40561049095492
## 21S_rRNA 114.275757036222 -0.709270244739718 1.13245600210531
## HRA1 2.6946770614275 0.114883899003022 0.866594538316973
## ICR1 151.161331077218 -0.164662939582324 0.108031359607394
## LSR1 229.529135332088 -0.370287291037731 0.612634881796695
## NME1 31.9605950080128 -0.313338368213699 0.474311940931533
## stat pvalue padj
## <numeric> <numeric> <numeric>
## 15S_rRNA -0.959125206865156 NA NA
## 21S_rRNA -0.626311524175012 0.531110628953209 0.646273413046184
## HRA1 0.132569378092481 0.894533951361094 0.930036965782474
## ICR1 -1.52421426686418 0.127455195247127 0.21073064217961
## LSR1 -0.604417577320732 0.545566052439667 0.659087267557163
## NME1 -0.660616655777867 0.508858184848436 0.626647659378039
sum(res_5$padj < 0.05, na.rm=T)
## [1] 2900
Check densities of the test statistic and effect sizes for significant DE genes.
data.frame(res_5[res_5$padj < 0.05 & !is.na(res_5$padj),]) %>%
ggplot(aes(x = stat)) +
geom_density() +
xlab("Test Statistic")
data.frame(res_5[res_5$padj < 0.05 & !is.na(res_5$padj),]) %>%
ggplot(aes(x = log2FoldChange)) +
geom_density() +
xlab("Effect Size (log2 Fold Change)")
There are several thousand genes detected in this 5 versus 5 comparison. Next, we’ll build the input data frame for SummarizedBenchmark.
geneExp <- tbl_df(data.frame(geneName=rownames(res_5),
pval=res_5$pvalue,
SE=res_5$lfcSE,
ind_covariate = res_5$baseMean,
effect_size=res_5$log2FoldChange,
test_statistic=res_5$stat,
pzero=rowSums(counts(subset)==0)/ncol(counts(subset))))
# remove genes with NA results or with zero counts in at least 50% of samples
geneExp <- geneExp %>% na.omit() %>% dplyr::filter(pzero < 0.5)
We’ll create a plot to examine the distribution of effect sizes, since the ash method assumes that the distribution of true (unobserved) effect sizes is unimodal.
ggplot(data=geneExp, aes(effect_size)) +
geom_histogram(bins=30)
We’ll also explore how the standard error (used by ash) correlates with the independent covariate (used by methods that incorporate covariates), in order to get an idea of how these pieces of information relate to one another.
ggplot(data=geneExp, aes(x = ind_covariate, y = SE)) +
geom_hex(bins = 100) +
scale_x_continuous(trans="log10") +
xlab("Covariate: Mean gene expression")
Look at covariate diagnostic plots.
strat_hist(geneExp, pvalue="pval", covariate="ind_covariate", maxy=23)
rank_scatter(geneExp, pvalue="pval", covariate="ind_covariate")
## Warning: Removed 4 rows containing non-finite values (stat_binhex).
Run benchmark methods.
sb <- bd %>% buildBench(data=geneExp, parallel = FALSE)
## Due to absence of package REBayes, switching to EM algorithm
## Model selection starts!
## Shrink the set of candidate models if it is too time-consuming.
##   |==================================================| 100%
assayNames(sb) <- "qvalue"
sb <- addDefaultMetrics(sb)
Plot results.
rejections_scatter(sb, supplementary=FALSE)
plotFDRMethodsOverlap(sb, alpha=0.05, nsets=ncol(sb), order.by="freq", decreasing=TRUE, supplementary=FALSE)
First we’ll illustrate the logistic curve used to simulate the informative covariate. We’ll sample values along this curve and use them as probability weights when selecting DE genes. A weakly informative covariate will be generated by adding noise to the strongly informative covariate.
x <- seq(0,10, by = 0.1)
infcov <- 1 / (1+exp(-x + 5))
plot(x, infcov, type = "l", ylab = "informative covariate")
plot(infcov, pmin(1, abs(infcov + rnorm(length(x), 0, 0.25))),
ylab = "weak covariate", xlab = "true covariate")
Next, we’ll analyze random splits of one condition, both with and without the addition of simulated DE genes. Here we’ll create a function that we can use to run one replicate given sample size and number of DE gene settings. This will be looped over many replications and results averaged over them.
#' @param X simulation replicate number
#' @param rseed random seed
#' @param sampleSize number of samples in each condition
#' @param nDE number of DE genes
#' @param bd the bench design object
#' @param uninformativeCovariate logical indicating whether to replace the
#' informative covariate with a randomly generated (uninformative) covariate.
#' Default is FALSE.
#' @param pvalHists logical indicating whether to return histograms of p-values
#' by covariate instead of SB objects
#' @param strongCovariate logical indicating whether to use the strongly
#' informative covariate (TRUE) or a noisy, weakly informative version (FALSE)
simulateOneSplit <- function(X, rseed, nDE, sampleSize,
bd, uninformativeCovariate = FALSE, pvalHists = FALSE,
strongCovariate = TRUE){
# set random seed
set.seed(as.numeric(X)*as.numeric(rseed))
covar <- 1 / (1+exp(-runif(nrow(dds_full), 0, 10) + 5))
# select a random subset of 2*sampleSize samples
dds_test <- dds_full[,sample(1:ncol(dds_full), sampleSize*2)]
# add a fake condition column to coldat
colData(dds_test)$fake <- factor(c(rep("A", sampleSize),
rep("B", sampleSize))[sample(1:(sampleSize*2),
sampleSize*2)])
design(dds_test) <- ~fake
pzero = rowSums(counts(dds_test)==0)/ncol(counts(dds_test))
dds_test <- dds_test[pzero < 0.5,]
covar <- covar[pzero < 0.5]
truth <- rep(FALSE, nrow(dds_test))
# make sure the null comparison is truly null: if the group means of PC1 or
# PC2 differ significantly, reshuffle the sample labels (up to 10 times)
dds_test <- estimateSizeFactors(dds_test)
x <- t(counts(dds_test, normalized=TRUE))
pc <- prcomp(log(x + 0.5), scale.=TRUE)
a1 <- pc$x[colData(dds_test)$fake=="A",1]
b1 <- pc$x[colData(dds_test)$fake=="B",1]
a2 <- pc$x[colData(dds_test)$fake=="A",2]
b2 <- pc$x[colData(dds_test)$fake=="B",2]
p1 <- t.test(a1, b1)$p.value
p2 <- t.test(a2, b2)$p.value
tries <- 0
while((p1 < 0.10 || p2 < 0.10) && tries < 10){
colData(dds_test)$fake <- sample(colData(dds_test)$fake, ncol(dds_test))
x <- t(counts(dds_test, normalized=TRUE))
pc <- prcomp(log(x + 0.5), scale.=TRUE)
a1 <- pc$x[colData(dds_test)$fake=="A",1]
b1 <- pc$x[colData(dds_test)$fake=="B",1]
a2 <- pc$x[colData(dds_test)$fake=="A",2]
b2 <- pc$x[colData(dds_test)$fake=="B",2]
p1 <- t.test(a1, b1)$p.value
p2 <- t.test(a2, b2)$p.value
tries <- tries + 1
}
if(nDE > 0){
# pick random set of nDE genes to add signal to
DE <- sample(1:nrow(dds_test), nDE, prob = covar)
truth[DE] <- TRUE
# randomly sample a log2FC from original FCs (without regard to DE)
counts_new <- counts(dds_test)
log2FC <- rep(0, nrow(dds_test))
if(sampleSize == 5){
log2FC[DE] <- res_5$log2FoldChange[pzero < 0.5][DE]
}else if (sampleSize == 10){
log2FC[DE] <- res_10$log2FoldChange[pzero < 0.5][DE]
}else{
stop("Only sample sizes 5 and 10 are currently supported with pre-",
"computed fold changes for sampling from.")
}
# randomize which condition is shifted up or down
ran <- runif(nrow(dds_test))
refcond <- ifelse(ran < 0.5, "A", "B")
down <- which(ran < 0.5)
counts_new[down,colData(dds_test)$fake==unique(refcond[down])] <-
counts(dds_test)[down, colData(dds_test)$fake==unique(refcond[down])] *
2^log2FC[down]
counts_new[-down,colData(dds_test)$fake==unique(refcond[-down])] <-
counts(dds_test)[-down, colData(dds_test)$fake==unique(refcond[-down])] *
2^log2FC[-down]
counts_new <- apply(counts_new, 2, as.integer)
counts(dds_test) <- counts_new
}
# replace existing size factors
dds_test <- estimateSizeFactors(dds_test)
dds_test <- DESeq(dds_test, parallel = FALSE)
resTEST <- results(dds_test, name="fake_B_vs_A", independentFiltering = FALSE)
geneExp <- tbl_df(data.frame(geneName=rownames(resTEST),
pval=resTEST$pvalue,
SE=resTEST$lfcSE,
ind_covariate = covar,
effect_size = resTEST$log2FoldChange,
test_statistic = resTEST$stat,
qvalue = truth))
if (uninformativeCovariate){
geneExp <- mutate(geneExp, ind_covariate = runif(length(covar)))
}else if(!strongCovariate){
geneExp <- mutate(geneExp, ind_covariate = pmin(1, abs(covar + rnorm(length(covar), 0, 0.25))))
}
geneExp <- geneExp %>% dplyr::filter(!is.na(pval))
if(pvalHists){
return(strat_hist(geneExp, pvalue="pval", covariate="ind_covariate", maxy =10))
}
sb <- bd %>% buildBench(data=geneExp, parallel = FALSE,
truthCols = "qvalue",
ftCols = "ind_covariate")
assayNames(sb) <- "qvalue"
sb <- addDefaultMetrics(sb)
rowData(sb)$log2FC <- geneExp$effect_size
return(sb)
}
We’ll also set some parameters that will be common to all simulations. These include the number of replications, the bench design object, the set of methods to exclude in the results plots, and the alpha cutoff level to be used when plotting the aggregated Upset results.
B <- 100
excludeSet <- c("unadjusted", "bl-df02", "bl-df04", "bl-df05")
ualpha <- 0.05
# only keep one condition for subsetting in simulations that follow
dds_full <- dds_full[,colData(dds_full)$condition == "Snf2"]
Here’s a helper function that returns the number of methods with rejections at a particular alpha level. This helps us decide whether to draw the aggregated upset plot: if fewer than two methods have rejections, the plot throws an error, which is a problem for the null simulations.
# To be included in the upset agg plot, method needs to have found on average
# at least one rejection per replicate. To create an upset plot, require that
# at least two methods rejected at this threshold.
#' @param res standardized metric data.table generated using
#' standardize_results.
#' @param alpha alpha cutoff
#' @param filterSet which methods to exclude from consideration
numberMethodsReject <- function(res, alphacutoff, filterSet){
res <- res %>%
filter(is.na(param.alpha) | (param.alpha == alphacutoff)) %>%
filter(!(blabel %in% filterSet)) %>%
filter(alpha == alphacutoff) %>%
filter(performanceMetric == "rejections") %>%
select(blabel, performanceMetric, value) %>%
group_by(blabel) %>%
summarize(mean_value = mean(value)) %>%
filter(mean_value > 1)
return(nrow(res))
}
Here we’ll repeat the above, but for a null comparison of 5 versus 5 Snf2 samples, where the groups are selected randomly. This will be done for 100 random splits.
rseed <- 225
sampleSize <- 5
nDE <- 0
if (!file.exists(resfile_n5)){
null5 <- mclapply(X=1:B, FUN=simulateOneSplit, rseed=rseed,
nDE=nDE, sampleSize=sampleSize, bd=bd, mc.cores=nCores)
saveRDS(null5, file = resfile_n5)
}else{
null5 <- readRDS(file = resfile_n5)
}
Plot results.
# Check for missing results (if any methods threw an error for relevant metrics).
rowSums(sapply(null5, function(x) colSums(is.na(assays(x)$qvalue)) > 0))
## unadjusted bonf bh qvalue ihw-a01 ihw-a02
## 0 0 0 0 0 0
## ihw-a03 ihw-a04 ihw-a05 ihw-a06 ihw-a07 ihw-a08
## 0 0 0 0 0 0
## ihw-a09 ihw-a10 ashq bl-df02 bl-df03 bl-df04
## 0 0 0 0 0 0
## bl-df05 lfdr adapt-glm fdrreg-t fdrreg-e
## 0 0 0 0 0
res5 <- plotsim_standardize(null5, alpha = seq(0.01, 0.10, 0.01))
plotsim_average(res5, met="rejections",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5, met="rejectprop",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5, met="FWER",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5, met="TNR",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
covariateLinePlot(null5, alpha=0.05, covname="log2FC", nbins=25,
trans="log1p")
covariateLinePlot(null5, alpha=0.05, covname="ind_covariate", nbins=25,
trans="log1p")
if (numberMethodsReject(res5, alphacutoff=ualpha, filterSet=excludeSet) >= 2){
aggupset(null5, alpha=ualpha, supplementary = FALSE, return_list = FALSE)
}else{
message("Not enough methods found rejections at alpha ", ualpha,
"; skipping upset plot")
}
Here we’ll repeat the above, but for a null comparison of 10 versus 10 Snf2 samples, where the groups are selected randomly. This will be done for 100 random splits. Note that here the covariate-aware methods use a covariate that is by definition uninformative (since there are no non-null observations).
rseed <- 837
sampleSize <- 10
nDE <- 0
if (!file.exists(resfile_n10)){
null10 <- mclapply(X=1:B, FUN=simulateOneSplit, rseed=rseed,
nDE=nDE, sampleSize=sampleSize, bd=bd, mc.cores=nCores)
saveRDS(null10, file = resfile_n10)
}else{
null10 <- readRDS(file = resfile_n10)
}
Plot results.
# Check for missing results (if any methods threw an error for relevant metrics).
rowSums(sapply(null10, function(x) colSums(is.na(assays(x)$qvalue)) > 0))
## unadjusted bonf bh qvalue ihw-a01 ihw-a02
## 0 0 0 0 0 0
## ihw-a03 ihw-a04 ihw-a05 ihw-a06 ihw-a07 ihw-a08
## 0 0 0 0 0 0
## ihw-a09 ihw-a10 ashq bl-df02 bl-df03 bl-df04
## 0 0 0 0 0 0
## bl-df05 lfdr adapt-glm fdrreg-t fdrreg-e
## 0 0 0 0 0
res10 <- plotsim_standardize(null10, alpha = seq(0.01, 0.10, 0.01))
plotsim_average(res10, met="rejections",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10, met="rejectprop",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10, met="FWER",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10, met="TNR",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
covariateLinePlot(null10, alpha=0.05, covname="log2FC", nbins=25,
trans="log1p")
covariateLinePlot(null10, alpha=0.05, covname="ind_covariate", nbins=25,
trans="log1p")
if (numberMethodsReject(res10, alphacutoff=ualpha, filterSet=excludeSet) >= 2){
aggupset(null10, alpha=ualpha, supplementary = FALSE, return_list = FALSE)
}else{
message("Not enough methods found rejections at alpha ", ualpha,
"; skipping upset plot")
}
## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?
Here we’ll repeat the above, but for a DE comparison of 5 versus 5 Snf2 samples, where the groups are selected randomly and 2000 DE genes are added. We’ll use a strongly informative covariate. This will be done for 100 random splits.
rseed <- 198
sampleSize <- 5
nDE <- 2000
if (!file.exists(resfile_d5)){
de5 <- mclapply(X=1:B, FUN=simulateOneSplit, rseed=rseed,
nDE=nDE, sampleSize=sampleSize, bd=bd, mc.cores=nCores)
saveRDS(de5, file = resfile_d5)
}else{
de5 <- readRDS(file = resfile_d5)
}
# pvalue histograms
plotfile <- file.path(datdir, "de5_pvalhists.pdf")
if (!file.exists(plotfile)){
hists <- mclapply(X=1:B, FUN=simulateOneSplit, rseed=rseed,
nDE=nDE, sampleSize=sampleSize, bd=bd,
pvalHists = TRUE, mc.cores = nCores)
pdf(plotfile, width=8, height=4)
for(i in 1:length(hists)){
print(hists[[i]])
}
dev.off()
}
Plot results.
# Check for missing results (if any methods threw an error for relevant metrics).
rowSums(sapply(de5, function(x) colSums(is.na(assays(x)$qvalue)) > 0))
## unadjusted bonf bh qvalue ihw-a01 ihw-a02
## 0 0 0 0 0 0
## ihw-a03 ihw-a04 ihw-a05 ihw-a06 ihw-a07 ihw-a08
## 0 0 0 0 0 0
## ihw-a09 ihw-a10 ashq bl-df02 bl-df03 bl-df04
## 0 0 0 0 0 0
## bl-df05 lfdr adapt-glm fdrreg-t fdrreg-e
## 0 0 0 0 0
res5d <- plotsim_standardize(de5, alpha = seq(0.01, 0.10, 0.01))
plotsim_average(res5d, met="rejections",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5d, met="FDR",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5d, met="TPR",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5d, met=c("FPR", "TPR"), filter_set = excludeSet,
merge_ihw = TRUE)
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
covariateLinePlot(de5, alpha=0.05, covname="log2FC", nbins=25,
trans="log1p")
covariateLinePlot(de5, alpha=0.05, covname="ind_covariate", nbins=25,
trans="log1p")
if (numberMethodsReject(res5d, alphacutoff=ualpha, filterSet=excludeSet) >= 2){
aggupset(de5, alpha=ualpha, supplementary = FALSE, return_list = FALSE)
}else{
message("Not enough methods found rejections at alpha ", ualpha,
"; skipping upset plot")
}
Here we’ll repeat the above, but for a DE comparison of 10 versus 10 Snf2 samples, where the groups are selected randomly and 2000 DE genes are added. We’ll use a strongly informative covariate. This will be done for 100 random splits.
rseed <- 961
sampleSize <- 10
nDE <- 2000
if(!file.exists(resfile_d10)){
de10 <- mclapply(X=1:B, FUN=simulateOneSplit, rseed=rseed,
nDE=nDE, sampleSize=sampleSize, bd=bd, mc.cores=nCores)
saveRDS(de10, file = resfile_d10)
}else{
de10 <- readRDS(file = resfile_d10)
}
# pvalue histograms
plotfile <- file.path(datdir, "de10_pvalhists.pdf")
if (!file.exists(plotfile)){
hists <- mclapply(X=1:B, FUN=simulateOneSplit, rseed=rseed,
nDE=nDE, sampleSize=sampleSize, bd=bd,
pvalHists = TRUE, mc.cores = nCores)
pdf(plotfile, width=8, height=4)
for(i in 1:length(hists)){
print(hists[[i]])
}
dev.off()
}
Plot results.
# Check for missing results (if any methods threw an error for relevant metrics).
rowSums(sapply(de10, function(x) colSums(is.na(assays(x)$qvalue)) > 0))
## unadjusted bonf bh qvalue ihw-a01 ihw-a02
## 0 0 0 0 0 0
## ihw-a03 ihw-a04 ihw-a05 ihw-a06 ihw-a07 ihw-a08
## 0 0 0 0 0 0
## ihw-a09 ihw-a10 ashq bl-df02 bl-df03 bl-df04
## 0 0 0 0 0 0
## bl-df05 lfdr adapt-glm fdrreg-t fdrreg-e
## 0 0 0 0 0
res10d <- plotsim_standardize(de10, alpha = seq(0.01, 0.10, 0.01))
plotsim_average(res10d, met="rejections",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10d, met="FDR",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10d, met="TPR",filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10d, met=c("FPR", "TPR"), filter_set = excludeSet,
merge_ihw = TRUE)
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
covariateLinePlot(de10, alpha=0.05, covname="log2FC", nbins=25,
trans="log1p")
covariateLinePlot(de10, alpha=0.05, covname="ind_covariate", nbins=25,
trans="log1p")
if (numberMethodsReject(res10d, alphacutoff=ualpha, filterSet=excludeSet) >= 2){
aggupset(de10, alpha=ualpha, supplementary = FALSE, return_list = FALSE)
}else{
message("Not enough methods found rejections at alpha ", ualpha,
"; skipping upset plot")
}
Here we’ll repeat the 5 versus 5 DE comparison, but using the weakly informative covariate (the strong covariate with noise added).
rseed <- 198
sampleSize <- 5
nDE <- 2000
if (!file.exists(resfile_d5_w)){
de5 <- mclapply(X=1:B, FUN=simulateOneSplit, rseed=rseed,
nDE=nDE, sampleSize=sampleSize, bd=bd, mc.cores=nCores,
strongCovariate = FALSE)
saveRDS(de5, file = resfile_d5_w)
} else {
de5 <- readRDS(file = resfile_d5_w)
}
Plot results.
# Check for missing results (if any methods threw an error for relevant metrics).
rowSums(sapply(de5, function(x) colSums(is.na(assays(x)$qvalue)) > 0))
## unadjusted bonf bh qvalue ihw-a01 ihw-a02
## 0 0 0 0 0 0
## ihw-a03 ihw-a04 ihw-a05 ihw-a06 ihw-a07 ihw-a08
## 0 0 0 0 0 0
## ihw-a09 ihw-a10 ashq bl-df02 bl-df03 bl-df04
## 0 0 0 0 0 0
## bl-df05 lfdr adapt-glm fdrreg-t fdrreg-e
## 0 0 0 0 0
res5d <- plotsim_standardize(de5, alpha = seq(0.01, 0.10, 0.01))
plotsim_average(res5d, met="rejections", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5d, met="FDR", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5d, met="TPR", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5d, met=c("FPR", "TPR"), filter_set = excludeSet,
merge_ihw = TRUE)
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
covariateLinePlot(de5, alpha=0.05, covname="log2FC", nbins=25,
trans="log1p")
covariateLinePlot(de5, alpha=0.05, covname="ind_covariate", nbins=25,
trans="log1p")
if (numberMethodsReject(res5d, alphacutoff=ualpha, filterSet=excludeSet) >= 2) {
aggupset(de5, alpha=ualpha, supplementary = FALSE, return_list = FALSE)
} else {
message("Not enough methods found rejections at alpha ", ualpha,
"; skipping upset plot")
}
Here we’ll repeat the above for the 10 versus 10 comparison, again using the weakly informative covariate (one that has noise added to it).
rseed <- 961
sampleSize <- 10
nDE <- 2000
if (!file.exists(resfile_d10_w)) {
de10 <- mclapply(X=1:B, FUN=simulateOneSplit, rseed=rseed,
nDE=nDE, sampleSize=sampleSize, bd=bd, mc.cores=nCores,
strongCovariate = FALSE)
saveRDS(de10, file = resfile_d10_w)
} else {
de10 <- readRDS(file = resfile_d10_w)
}
Plot results.
# Check for missing results (if any methods threw an error for relevant metrics).
rowSums(sapply(de10, function(x) colSums(is.na(assays(x)$qvalue)) > 0))
## unadjusted bonf bh qvalue ihw-a01 ihw-a02
## 0 0 0 0 0 0
## ihw-a03 ihw-a04 ihw-a05 ihw-a06 ihw-a07 ihw-a08
## 0 0 0 0 0 0
## ihw-a09 ihw-a10 ashq bl-df02 bl-df03 bl-df04
## 0 0 0 0 0 0
## bl-df05 lfdr adapt-glm fdrreg-t fdrreg-e
## 0 0 0 0 0
res10d <- plotsim_standardize(de10, alpha = seq(0.01, 0.10, 0.01))
plotsim_average(res10d, met="rejections", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10d, met="FDR", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10d, met="TPR", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10d, met=c("FPR", "TPR"), filter_set = excludeSet,
merge_ihw = TRUE)
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
covariateLinePlot(de10, alpha=0.05, covname="log2FC", nbins=25,
trans="log1p")
covariateLinePlot(de10, alpha=0.05, covname="ind_covariate", nbins=25,
trans="log1p")
if (numberMethodsReject(res10d, alphacutoff=ualpha, filterSet=excludeSet) >= 2) {
aggupset(de10, alpha=ualpha, supplementary = FALSE, return_list = FALSE)
} else {
message("Not enough methods found rejections at alpha ", ualpha,
"; skipping upset plot")
}
Here we repeat the previous section, using an uninformative (randomly generated) covariate. We start with the DE comparison of 5 versus 5 Snf2 samples, where the groups are selected randomly and 2000 DE genes are added. This will be done for 100 random splits.
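By contrast with the informative settings, an uninformative covariate is drawn independently of DE status, so it carries no signal that covariate-aware methods could exploit. A minimal sketch (hypothetical; the actual draw happens inside `simulateOneSplit()` when `uninformativeCovariate = TRUE`):

```r
## Hypothetical sketch: an uninformative covariate is simply a random
## draw that is independent of DE status by construction.
set.seed(7)
m     <- 5000
is_de <- seq_len(m) <= 2000        # 2000 "DE" genes
uninf <- runif(m)                  # drawn without reference to is_de
## its correlation with DE status should be near zero
cor(uninf, as.numeric(is_de))
```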
rseed <- 198
sampleSize <- 5
nDE <- 2000
if (!file.exists(resfile_d5_uninfCov)) {
de5 <- mclapply(X=1:B, FUN=simulateOneSplit, rseed=rseed,
nDE=nDE, sampleSize=sampleSize, bd=bd,
uninformativeCovariate = TRUE, mc.cores=nCores)
saveRDS(de5, file = resfile_d5_uninfCov)
} else {
de5 <- readRDS(file = resfile_d5_uninfCov)
}
Plot results.
# Check for missing results (if any methods threw an error for relevant metrics).
rowSums(sapply(de5, function(x) colSums(is.na(assays(x)$qvalue)) > 0))
## unadjusted bonf bh qvalue ihw-a01 ihw-a02
## 0 0 0 0 0 0
## ihw-a03 ihw-a04 ihw-a05 ihw-a06 ihw-a07 ihw-a08
## 0 0 0 0 0 0
## ihw-a09 ihw-a10 ashq bl-df02 bl-df03 bl-df04
## 0 0 0 0 0 0
## bl-df05 lfdr adapt-glm fdrreg-t fdrreg-e
## 0 0 0 0 0
res5d <- plotsim_standardize(de5, alpha = seq(0.01, 0.10, 0.01))
plotsim_average(res5d, met="rejections", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5d, met="FDR", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5d, met="TPR", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res5d, met=c("FPR", "TPR"), filter_set = excludeSet,
merge_ihw = TRUE)
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
covariateLinePlot(de5, alpha=0.05, covname="log2FC", nbins=25,
trans="log1p")
covariateLinePlot(de5, alpha=0.05, covname="ind_covariate", nbins=25,
trans="log1p")
if (numberMethodsReject(res5d, alphacutoff=ualpha, filterSet=excludeSet) >= 2) {
aggupset(de5, alpha=ualpha, supplementary = FALSE, return_list = FALSE)
} else {
message("Not enough methods found rejections at alpha ", ualpha,
"; skipping upset plot")
}
Here we’ll repeat the above, but for a DE comparison of 10 versus 10 Snf2 samples, where the groups are selected randomly and 2000 DE genes are added. This will again be done for 100 random splits.
rseed <- 961
sampleSize <- 10
nDE <- 2000
if (!file.exists(resfile_d10_uninfCov)) {
de10 <- mclapply(X=1:B, FUN=simulateOneSplit, rseed=rseed,
nDE=nDE, sampleSize=sampleSize, bd=bd,
uninformativeCovariate = TRUE, mc.cores=nCores)
saveRDS(de10, file = resfile_d10_uninfCov)
} else {
de10 <- readRDS(file = resfile_d10_uninfCov)
}
Plot results.
# Check for missing results (if any methods threw an error for relevant metrics).
rowSums(sapply(de10, function(x) colSums(is.na(assays(x)$qvalue)) > 0))
## unadjusted bonf bh qvalue ihw-a01 ihw-a02
## 0 0 0 0 0 0
## ihw-a03 ihw-a04 ihw-a05 ihw-a06 ihw-a07 ihw-a08
## 0 0 0 0 0 0
## ihw-a09 ihw-a10 ashq bl-df02 bl-df03 bl-df04
## 0 0 0 0 0 0
## bl-df05 lfdr adapt-glm fdrreg-t fdrreg-e
## 0 0 0 0 0
res10d <- plotsim_standardize(de10, alpha = seq(0.01, 0.10, 0.01))
plotsim_average(res10d, met="rejections", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10d, met="FDR", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10d, met="TPR", filter_set = excludeSet,
merge_ihw = TRUE, errorBars=TRUE)
plotsim_average(res10d, met=c("FPR", "TPR"), filter_set = excludeSet,
merge_ihw = TRUE)
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
covariateLinePlot(de10, alpha=0.05, covname="log2FC", nbins=25,
trans="log1p")
covariateLinePlot(de10, alpha=0.05, covname="ind_covariate", nbins=25,
trans="log1p")
if (numberMethodsReject(res10d, alphacutoff=ualpha, filterSet=excludeSet) >= 2) {
aggupset(de10, alpha=ualpha, supplementary = FALSE, return_list = FALSE)
} else {
message("Not enough methods found rejections at alpha ", ualpha,
"; skipping upset plot")
}
Here we compare the method ranks across the different sample sizes and covariate informativeness settings at alpha = 0.10 (column labels: S = strong, W = weak, UI = uninformative covariate).
plotMethodRanks(c(resfile_d5, resfile_d5_w, resfile_d10, resfile_d10_w,
resfile_d5_uninfCov, resfile_d10_uninfCov),
colLabels = c("5 S", "5 W", "10 S", "10 W",
"5 UI", "10 UI"),
alpha = 0.10, xlab = "Comparison",
excludeMethods = NULL)
sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
##
## Matrix products: default
## BLAS: /usr/lib64/libblas.so.3.4.2
## LAPACK: /usr/lib64/liblapack.so.3.4.2
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] splines parallel stats4 stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] adaptMT_1.0.0 FDRreg_0.2-1
## [3] mvtnorm_1.0-8 BayesLogit_0.6
## [5] fda_2.4.7 Matrix_1.2-14
## [7] fdrtool_1.2.15 swfdr_1.6.0
## [9] qvalue_2.12.0 ashr_2.2-7
## [11] IHW_1.8.0 hexbin_1.27.2
## [13] bindrcpp_0.2.2 R.utils_2.6.0
## [15] R.oo_1.22.0 R.methodsS3_1.7.1
## [17] ggthemes_3.5.0 tibble_1.4.2
## [19] DESeq2_1.20.0 purrr_0.2.5
## [21] cowplot_0.9.2 magrittr_1.5
## [23] readr_1.1.1 data.table_1.11.4
## [25] SummarizedBenchmark_0.99.1 dplyr_0.7.5
## [27] mclust_5.4 ggplot2_3.0.0
## [29] stringr_1.3.1 rlang_0.2.1
## [31] UpSetR_1.3.3 SummarizedExperiment_1.10.1
## [33] DelayedArray_0.6.1 BiocParallel_1.14.2
## [35] matrixStats_0.53.1 Biobase_2.40.0
## [37] GenomicRanges_1.32.3 GenomeInfoDb_1.16.0
## [39] IRanges_2.14.10 S4Vectors_0.18.3
## [41] BiocGenerics_0.26.0 tidyr_0.8.1
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.3-2 rprojroot_1.3-2 htmlTable_1.12
## [4] XVector_0.20.0 ggdendro_0.1-20 base64enc_0.1-3
## [7] rstudioapi_0.7 bit64_0.9-7 AnnotationDbi_1.42.1
## [10] mosaic_1.2.0 codetools_0.2-15 mnormt_1.5-5
## [13] pscl_1.5.2 doParallel_1.0.11 geneplotter_1.58.0
## [16] knitr_1.20 Formula_1.2-3 broom_0.4.4
## [19] annotate_1.58.0 cluster_2.0.7-1 compiler_3.5.0
## [22] backports_1.1.2 assertthat_0.2.0 lazyeval_0.2.1
## [25] acepack_1.4.1 htmltools_0.3.6 tools_3.5.0
## [28] gtable_0.2.0 glue_1.2.0 GenomeInfoDbData_1.1.0
## [31] reshape2_1.4.3 Rcpp_0.12.17 slam_0.1-43
## [34] nlme_3.1-137 iterators_1.0.9 psych_1.8.4
## [37] mosaicCore_0.5.0 XML_3.98-1.11 zlibbioc_1.26.0
## [40] MASS_7.3-50 scales_0.5.0 hms_0.4.2
## [43] RColorBrewer_1.1-2 yaml_2.1.19 mosaicData_0.16.0
## [46] memoise_1.1.0 gridExtra_2.3 rpart_4.1-13
## [49] latticeExtra_0.6-28 stringi_1.2.2 RSQLite_2.1.1
## [52] SQUAREM_2017.10-1 genefilter_1.62.0 foreach_1.4.4
## [55] checkmate_1.8.5 truncnorm_1.0-8 pkgconfig_2.0.1
## [58] bitops_1.0-6 evaluate_0.10.1 lattice_0.20-35
## [61] lpsymphony_1.8.0 bindr_0.1.1 htmlwidgets_1.2
## [64] labeling_0.3 bit_1.1-14 tidyselect_0.2.4
## [67] ggformula_0.7.0 plyr_1.8.4 R6_2.2.2
## [70] Hmisc_4.1-1 DBI_1.0.0 pillar_1.2.3
## [73] foreign_0.8-70 withr_2.1.2 survival_2.41-3
## [76] RCurl_1.95-4.10 nnet_7.3-12 rmarkdown_1.10
## [79] locfit_1.5-9.1 grid_3.5.0 blob_1.1.1
## [82] digest_0.6.15 xtable_1.8-2 munsell_0.4.3