1 Summary

In this set of simulations, we consider settings with both null and non-null tests and a covariate whose informativeness we vary. The covariate is sampled uniformly from the interval [0, 1], and the conditional probability of a test being non-null is a step function of the covariate with two levels (over the intervals [0, 4/5] and (4/5, 1]), where the size of the step determines the informativeness of the covariate.

For all settings, the marginal proportion of null hypotheses is set to 80%. A tuning parameter controls the informativeness of the covariate: 0 corresponds to an uninformative covariate, and 1 corresponds to a fully informative covariate (i.e., all tests with covariate values in [0, 4/5] are null, and all tests with values in (4/5, 1] are non-null).

2 Workspace Setup

library(dplyr)
library(ggplot2)
library(SummarizedBenchmark)
library(parallel)

## load helper functions
for (f in list.files("../R", "\\.(r|R)$", full.names = TRUE)) {
    source(f)
}

## project data/results folders
resdir <- "results"
dir.create(resdir, showWarnings = FALSE, recursive = TRUE)

## intermediary files we create below
info00_file <- file.path(resdir, "varyinginfo-benchmark-level00.rds")
info20_file <- file.path(resdir, "varyinginfo-benchmark-level20.rds")
info40_file <- file.path(resdir, "varyinginfo-benchmark-level40.rds")
info60_file <- file.path(resdir, "varyinginfo-benchmark-level60.rds")
info80_file <- file.path(resdir, "varyinginfo-benchmark-level80.rds")
info100_file <- file.path(resdir, "varyinginfo-benchmark-level100.rds")

## number of cores for parallelization
cores <- 20
B <- 100

## define benchmarking design
bd <- initializeBenchDesign()

As described in simulations-null.Rmd, we include Scott’s FDR Regression in the analysis for simulations with Gaussian noise. Again, we include both nulltype = "empirical" and nulltype = "theoretical". Since all settings in this series of simulations use test statistics simulated with Gaussian noise, we include Scott’s FDR Regression in all of the comparisons.

bdplus <- bd
bdplus <- addBMethod(bdplus, "fdrreg-t",
                     FDRreg::FDRreg,
                     function(x) { x$FDR },
                     z = test_statistic,
                     features = model.matrix(~ splines::bs(ind_covariate, df = 3) - 1),
                     nulltype = 'theoretical',
                     control = list(lambda = 0.01))
bdplus <- addBMethod(bdplus, "fdrreg-e",
                     FDRreg::FDRreg,
                     function(x) { x$FDR },
                     z = test_statistic,
                     features = model.matrix(~ splines::bs(ind_covariate, df = 3) - 1),
                     nulltype = 'empirical',
                     control = list(lambda = 0.01))

All simulation settings will share the following parameters.

m <- 20000                      # integer: number of hypothesis tests
es_dist <- rnorm_generator(2)   # functional: dist of alternative test stats
ts_dist <- rnorm_perturber(1)   # functional: sampling dist/noise for test stats
null_dist <- rnorm_2pvaluer(1)  # functional: dist to calc p-values
icovariate <- runif             # functional: independent covariate
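
The rnorm_generator, rnorm_perturber, and rnorm_2pvaluer functionals are also defined in ../R. Minimal sketches consistent with their names, arguments, and the Gaussian setting used throughout these simulations might look like the following; the bodies are assumptions for illustration, not the project's actual definitions.

## hypothetical sketches of the ../R helpers (actual definitions may differ)
rnorm_generator_sketch <- function(mean) {
    ## sampler for non-null effect sizes, centered at `mean`
    function(n) rnorm(n, mean = mean)
}
rnorm_perturber_sketch <- function(sd) {
    ## adds N(0, sd) noise to effect sizes to produce test statistics
    function(es) es + rnorm(length(es), sd = sd)
}
rnorm_2pvaluer_sketch <- function(sd) {
    ## two-sided p-values under a N(0, sd) null
    function(ts) 2 * pnorm(-abs(ts), sd = sd)
}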

Simulation results will be presented with a subset of methods excluded, and for certain plots (upset plots) a single alpha cutoff will be used.

excludeSet <- c("unadjusted", "bl-df02", "bl-df04", "bl-df05")
ualpha <- 0.05

First, we show the form of the relationship between the informative covariate and null proportion, pi0, across varying levels of “informativeness.”

xseq <- seq(0, 1, by = .01)
pi0trends <- lapply(seq(0, 1, by = .2), function(i) {
    tibble(info = i, x = xseq,
           pi0 = pi0_varyinfo80(i)(xseq))
})
pi0trends <- bind_rows(pi0trends)

ggplot(pi0trends, aes(x = x, y = pi0)) +
    geom_line() +
    geom_text(aes(label = round(pi0, 3), hjust = x, vjust = round(pi0)),
              dplyr::filter(pi0trends, x == 0 | x == 1),
              size = 3, color = 'blue') + 
    theme_bw() +
    scale_x_continuous("x (informative covariate)", breaks = 0:1) +
    scale_y_continuous(labels = scales::percent) +
    facet_grid(. ~ info, labeller = label_both) +
    ggtitle("Covariate vs. pi0 for varying informativeness")

Alternatively, since the function is a step function, we can simply plot the two step levels as a function of informativeness.

xseq <- seq(0, 1, by = .01)
pi0trends <- lapply(xseq, function(i) {
    tibble(info = i, x = 0:1,
           pi0 = pi0_varyinfo80(i)(0:1))
})
pi0trends <- bind_rows(pi0trends)
pi0trends <- dplyr::mutate(pi0trends,
                           x = factor(x, levels = 0:1,
                                      labels = c("[0, .8]", "(.8, 1]")))

ggplot(pi0trends, aes(x = info, y = pi0, group = x)) +
    geom_line() +
    geom_text(aes(label = x, vjust = round(pi0)),
              dplyr::filter(pi0trends, info == 1),
              hjust = 1, size = 3, color = 'blue') + 
    theme_bw() +
    scale_x_continuous("Informativeness", labels = scales::percent) +
    scale_y_continuous(labels = scales::percent) +
    ggtitle("null proportions for varying informativeness")

3 Informativeness = 0 Setting

First, we consider the setting where the informativeness is 0.

3.1 Data Simulation

pi0 <- pi0_varyinfo80(0.00)             # functional: proportion of nulls given covariate
seed <- 1010

We next run the simulations.

if (file.exists(info00_file)) {
    res <- readRDS(info00_file)
} else {
    res <- mclapply(X = 1:B, FUN = simIteration, bench = bdplus, m = m,
                    pi0 = pi0, es_dist = es_dist, icovariate = icovariate,
                    ts_dist = ts_dist, null_dist = null_dist,
                    seed = seed, mc.cores = cores)
    saveRDS(res, file = info00_file)
}
## each replication contains results computed with the informative covariate
## and with an uninformative covariate
res_i <- lapply(res, `[[`, "informative")
res_u <- lapply(res, `[[`, "uninformative")

3.2 Covariate Diagnostics

Here, we show the relationship between the independent covariate and p-values for a single replication of the experiment.

onerun <- simIteration(bdplus, m = m, pi0 = pi0, es_dist = es_dist, ts_dist = ts_dist,
                       icovariate = icovariate, null_dist = null_dist, execute = FALSE)
rank_scatter(onerun, pvalue = "pval", covariate = "ind_covariate")

strat_hist(onerun, pvalue = "pval", covariate = "ind_covariate", maxy = 10, numQ = 3)

3.3 Benchmark Metrics

We plot the averaged results across 100 replications.

resdf <- plotsim_standardize(res_i, alpha = seq(0.01, 0.10, 0.01))

plotsim_average(resdf, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

We also take a look at the distribution of rejections for each method as a function of the effect size and the independent covariate.

covariateLinePlot(res_i, alpha = ualpha, covname = "effect_size")

covariateLinePlot(res_i, alpha = ualpha, covname = "ind_covariate")

Finally, if enough methods produce rejections at alpha = 0.05, we take a look at the overlap of rejections between methods.

if (numberMethodsReject(resdf, alphacutoff = ualpha, filterSet = excludeSet) >= 3) {
    aggupset(res_i, alpha = ualpha, supplementary = FALSE, return_list = FALSE)
} else {
    message("Not enough methods found rejections at alpha ", ualpha, 
            "; skipping upset plot")
}

We also compare the simulation results with and without an informative covariate.

resdfu <- plotsim_standardize(res_u, alpha = seq(0.01, 0.10, 0.01))

resdfiu <- dplyr::full_join(select(resdf, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            select(resdfu, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            by = c("rep", "blabel", "param.alpha", "key",
                                   "performanceMetric", "alpha"),
                            suffix = c(".info", ".uninfo"))
## difference in each metric: informative minus uninformative covariate
resdfiu <- dplyr::mutate(resdfiu, value = value.info - value.uninfo)

plotsim_average(resdfiu, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

4 Informativeness = 20 Setting

Next, we consider the setting where the informativeness is 0.20.

4.1 Data Simulation

pi0 <- pi0_varyinfo80(0.20)             # functional: proportion of nulls given covariate
seed <- 1201

We next run the simulations.

if (file.exists(info20_file)) {
    res <- readRDS(info20_file)
} else {
    res <- mclapply(X = 1:B, FUN = simIteration, bench = bdplus, m = m,
                    pi0 = pi0, es_dist = es_dist, icovariate = icovariate,
                    ts_dist = ts_dist, null_dist = null_dist,
                    seed = seed, mc.cores = cores)
    saveRDS(res, file = info20_file)
}
res_i <- lapply(res, `[[`, "informative")
res_u <- lapply(res, `[[`, "uninformative")

4.2 Covariate Diagnostics

Here, we show the relationship between the independent covariate and p-values for a single replication of the experiment.

onerun <- simIteration(bdplus, m = m, pi0 = pi0, es_dist = es_dist, ts_dist = ts_dist,
                       icovariate = icovariate, null_dist = null_dist, execute = FALSE)
rank_scatter(onerun, pvalue = "pval", covariate = "ind_covariate")

strat_hist(onerun, pvalue = "pval", covariate = "ind_covariate", maxy = 10, numQ = 3)

4.3 Benchmark Metrics

We plot the averaged results across 100 replications.

resdf <- plotsim_standardize(res_i, alpha = seq(0.01, 0.10, 0.01))

plotsim_average(resdf, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

We also take a look at the distribution of rejections for each method as a function of the effect size and the independent covariate.

covariateLinePlot(res_i, alpha = ualpha, covname = "effect_size")

covariateLinePlot(res_i, alpha = ualpha, covname = "ind_covariate")

Finally, if enough methods produce rejections at alpha = 0.05, we take a look at the overlap of rejections between methods.

if (numberMethodsReject(resdf, alphacutoff = ualpha, filterSet = excludeSet) >= 3) {
    aggupset(res_i, alpha = ualpha, supplementary = FALSE, return_list = FALSE)
} else {
    message("Not enough methods found rejections at alpha ", ualpha, 
            "; skipping upset plot")
}

We also compare the simulation results with and without an informative covariate.

resdfu <- plotsim_standardize(res_u, alpha = seq(0.01, 0.10, 0.01))

resdfiu <- dplyr::full_join(select(resdf, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            select(resdfu, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            by = c("rep", "blabel", "param.alpha", "key",
                                   "performanceMetric", "alpha"),
                            suffix = c(".info", ".uninfo"))
resdfiu <- dplyr::mutate(resdfiu, value = value.info - value.uninfo)

plotsim_average(resdfiu, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

5 Informativeness = 40 Setting

Next, we consider the setting where the informativeness is 0.40.

5.1 Data Simulation

pi0 <- pi0_varyinfo80(0.40)             # functional: proportion of nulls given covariate
seed <- 51

We next run the simulations.

if (file.exists(info40_file)) {
    res <- readRDS(info40_file)
} else {
    res <- mclapply(X = 1:B, FUN = simIteration, bench = bdplus, m = m,
                    pi0 = pi0, es_dist = es_dist, icovariate = icovariate,
                    ts_dist = ts_dist, null_dist = null_dist,
                    seed = seed, mc.cores = cores)
    saveRDS(res, file = info40_file)
}
res_i <- lapply(res, `[[`, "informative")
res_u <- lapply(res, `[[`, "uninformative")

5.2 Covariate Diagnostics

Here, we show the relationship between the independent covariate and p-values for a single replication of the experiment.

onerun <- simIteration(bdplus, m = m, pi0 = pi0, es_dist = es_dist, ts_dist = ts_dist,
                       icovariate = icovariate, null_dist = null_dist, execute = FALSE)
rank_scatter(onerun, pvalue = "pval", covariate = "ind_covariate")

strat_hist(onerun, pvalue = "pval", covariate = "ind_covariate", maxy = 10, numQ = 3)

5.3 Benchmark Metrics

We plot the averaged results across 100 replications.

resdf <- plotsim_standardize(res_i, alpha = seq(0.01, 0.10, 0.01))

plotsim_average(resdf, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

We also take a look at the distribution of rejections for each method as a function of the effect size and the independent covariate.

covariateLinePlot(res_i, alpha = ualpha, covname = "effect_size")

covariateLinePlot(res_i, alpha = ualpha, covname = "ind_covariate")

Finally, if enough methods produce rejections at alpha = 0.05, we take a look at the overlap of rejections between methods.

if (numberMethodsReject(resdf, alphacutoff = ualpha, filterSet = excludeSet) >= 3) {
    aggupset(res_i, alpha = ualpha, supplementary = FALSE, return_list = FALSE)
} else {
    message("Not enough methods found rejections at alpha ", ualpha, 
            "; skipping upset plot")
}

We also compare the simulation results with and without an informative covariate.

resdfu <- plotsim_standardize(res_u, alpha = seq(0.01, 0.10, 0.01))

resdfiu <- dplyr::full_join(select(resdf, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            select(resdfu, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            by = c("rep", "blabel", "param.alpha", "key",
                                   "performanceMetric", "alpha"),
                            suffix = c(".info", ".uninfo"))
resdfiu <- dplyr::mutate(resdfiu, value = value.info - value.uninfo)

plotsim_average(resdfiu, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

6 Informativeness = 60 Setting

Next, we consider the setting where the informativeness is 0.60.

6.1 Data Simulation

pi0 <- pi0_varyinfo80(0.60)             # functional: proportion of nulls given covariate
seed <- 500

We next run the simulations.

if (file.exists(info60_file)) {
    res <- readRDS(info60_file)
} else {
    res <- mclapply(X = 1:B, FUN = simIteration, bench = bdplus, m = m,
                    pi0 = pi0, es_dist = es_dist, icovariate = icovariate,
                    ts_dist = ts_dist, null_dist = null_dist,
                    seed = seed, mc.cores = cores)
    saveRDS(res, file = info60_file)
}
res_i <- lapply(res, `[[`, "informative")
res_u <- lapply(res, `[[`, "uninformative")

6.2 Covariate Diagnostics

Here, we show the relationship between the independent covariate and p-values for a single replication of the experiment.

onerun <- simIteration(bdplus, m = m, pi0 = pi0, es_dist = es_dist, ts_dist = ts_dist,
                       icovariate = icovariate, null_dist = null_dist, execute = FALSE)
rank_scatter(onerun, pvalue = "pval", covariate = "ind_covariate")

strat_hist(onerun, pvalue = "pval", covariate = "ind_covariate", maxy = 10, numQ = 3)

6.3 Benchmark Metrics

We plot the averaged results across 100 replications.

resdf <- plotsim_standardize(res_i, alpha = seq(0.01, 0.10, 0.01))

plotsim_average(resdf, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

We also take a look at the distribution of rejections for each method as a function of the effect size and the independent covariate.

covariateLinePlot(res_i, alpha = ualpha, covname = "effect_size")

covariateLinePlot(res_i, alpha = ualpha, covname = "ind_covariate")

Finally, if enough methods produce rejections at alpha = 0.05, we take a look at the overlap of rejections between methods.

if (numberMethodsReject(resdf, alphacutoff = ualpha, filterSet = excludeSet) >= 3) {
    aggupset(res_i, alpha = ualpha, supplementary = FALSE, return_list = FALSE)
} else {
    message("Not enough methods found rejections at alpha ", ualpha, 
            "; skipping upset plot")
}

We also compare the simulation results with and without an informative covariate.

resdfu <- plotsim_standardize(res_u, alpha = seq(0.01, 0.10, 0.01))

resdfiu <- dplyr::full_join(select(resdf, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            select(resdfu, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            by = c("rep", "blabel", "param.alpha", "key",
                                   "performanceMetric", "alpha"),
                            suffix = c(".info", ".uninfo"))
resdfiu <- dplyr::mutate(resdfiu, value = value.info - value.uninfo)

plotsim_average(resdfiu, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

7 Informativeness = 80 Setting

Next, we consider the setting where the informativeness is 0.80.

7.1 Data Simulation

pi0 <- pi0_varyinfo80(0.80)             # functional: proportion of nulls given covariate
seed <- 608

We next run the simulations.

if (file.exists(info80_file)) {
    res <- readRDS(info80_file)
} else {
    res <- mclapply(X = 1:B, FUN = simIteration, bench = bdplus, m = m,
                    pi0 = pi0, es_dist = es_dist, icovariate = icovariate,
                    ts_dist = ts_dist, null_dist = null_dist,
                    seed = seed, mc.cores = cores)
    saveRDS(res, file = info80_file)
}
res_i <- lapply(res, `[[`, "informative")
res_u <- lapply(res, `[[`, "uninformative")

7.2 Covariate Diagnostics

Here, we show the relationship between the independent covariate and p-values for a single replication of the experiment.

onerun <- simIteration(bdplus, m = m, pi0 = pi0, es_dist = es_dist, ts_dist = ts_dist,
                       icovariate = icovariate, null_dist = null_dist, execute = FALSE)
rank_scatter(onerun, pvalue = "pval", covariate = "ind_covariate")

strat_hist(onerun, pvalue = "pval", covariate = "ind_covariate", maxy = 10, numQ = 3)

7.3 Benchmark Metrics

We plot the averaged results across 100 replications.

resdf <- plotsim_standardize(res_i, alpha = seq(0.01, 0.10, 0.01))

plotsim_average(resdf, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

We also take a look at the distribution of rejections for each method as a function of the effect size and the independent covariate.

covariateLinePlot(res_i, alpha = ualpha, covname = "effect_size")

covariateLinePlot(res_i, alpha = ualpha, covname = "ind_covariate")

Finally, if enough methods produce rejections at alpha = 0.05, we take a look at the overlap of rejections between methods.

if (numberMethodsReject(resdf, alphacutoff = ualpha, filterSet = excludeSet) >= 3) {
    aggupset(res_i, alpha = ualpha, supplementary = FALSE, return_list = FALSE)
} else {
    message("Not enough methods found rejections at alpha ", ualpha, 
            "; skipping upset plot")
}

We also compare the simulation results with and without an informative covariate.

resdfu <- plotsim_standardize(res_u, alpha = seq(0.01, 0.10, 0.01))

resdfiu <- dplyr::full_join(select(resdf, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            select(resdfu, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            by = c("rep", "blabel", "param.alpha", "key",
                                   "performanceMetric", "alpha"),
                            suffix = c(".info", ".uninfo"))
resdfiu <- dplyr::mutate(resdfiu, value = value.info - value.uninfo)

plotsim_average(resdfiu, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

8 Informativeness = 100 Setting

Finally, we consider the setting where the informativeness is 1.00.

8.1 Data Simulation

pi0 <- pi0_varyinfo80(1.00)             # functional: proportion of nulls given covariate
seed <- 808

We next run the simulations.

if (file.exists(info100_file)) {
    res <- readRDS(info100_file)
} else {
    res <- mclapply(X = 1:B, FUN = simIteration, bench = bdplus, m = m,
                    pi0 = pi0, es_dist = es_dist, icovariate = icovariate,
                    ts_dist = ts_dist, null_dist = null_dist,
                    seed = seed, mc.cores = cores)
    saveRDS(res, file = info100_file)
}
res_i <- lapply(res, `[[`, "informative")
res_u <- lapply(res, `[[`, "uninformative")

8.2 Covariate Diagnostics

Here, we show the relationship between the independent covariate and p-values for a single replication of the experiment.

onerun <- simIteration(bdplus, m = m, pi0 = pi0, es_dist = es_dist, ts_dist = ts_dist,
                       icovariate = icovariate, null_dist = null_dist, execute = FALSE)
rank_scatter(onerun, pvalue = "pval", covariate = "ind_covariate")

strat_hist(onerun, pvalue = "pval", covariate = "ind_covariate", maxy = 10, numQ = 3)

8.3 Benchmark Metrics

We plot the averaged results across 100 replications.

resdf <- plotsim_standardize(res_i, alpha = seq(0.01, 0.10, 0.01))

plotsim_average(resdf, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

plotsim_average(resdf, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE) 

We also take a look at the distribution of rejections for each method as a function of the effect size and the independent covariate.

covariateLinePlot(res_i, alpha = ualpha, covname = "effect_size")

covariateLinePlot(res_i, alpha = ualpha, covname = "ind_covariate")

Finally, if enough methods produce rejections at alpha = 0.05, we take a look at the overlap of rejections between methods.

if (numberMethodsReject(resdf, alphacutoff = ualpha, filterSet = excludeSet) >= 3) {
    aggupset(res_i, alpha = ualpha, supplementary = FALSE, return_list = FALSE)
} else {
    message("Not enough methods found rejections at alpha ", ualpha, 
            "; skipping upset plot")
}

We also compare the simulation results with and without an informative covariate.

resdfu <- plotsim_standardize(res_u, alpha = seq(0.01, 0.10, 0.01))

resdfiu <- dplyr::full_join(select(resdf, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            select(resdfu, rep, blabel, param.alpha, key,
                                   performanceMetric, alpha, value),
                            by = c("rep", "blabel", "param.alpha", "key",
                                   "performanceMetric", "alpha"),
                            suffix = c(".info", ".uninfo"))
resdfiu <- dplyr::mutate(resdfiu, value = value.info - value.uninfo)

plotsim_average(resdfiu, met="rejections", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="FDR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TPR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

plotsim_average(resdfiu, met="TNR", filter_set = excludeSet,
                merge_ihw = TRUE, errorBars = TRUE, diffplot = TRUE)

9 Session Info

sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS: /usr/lib64/libblas.so.3.4.2
## LAPACK: /usr/lib64/liblapack.so.3.4.2
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] splines   parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] cowplot_0.9.2              hexbin_1.27.2             
##  [3] bindrcpp_0.2.2             adaptMT_1.0.0             
##  [5] FDRreg_0.2-1               mvtnorm_1.0-8             
##  [7] BayesLogit_0.6             fda_2.4.7                 
##  [9] Matrix_1.2-14              fdrtool_1.2.15            
## [11] swfdr_1.4.0                qvalue_2.10.0             
## [13] ashr_2.2-7                 IHW_1.6.0                 
## [15] magrittr_1.5               SummarizedBenchmark_0.99.2
## [17] mclust_5.4.1               BiocParallel_1.12.0       
## [19] stringr_1.3.1              rlang_0.2.1               
## [21] UpSetR_1.3.3               SummarizedExperiment_1.8.1
## [23] DelayedArray_0.4.1         matrixStats_0.53.1        
## [25] Biobase_2.38.0             GenomicRanges_1.30.3      
## [27] GenomeInfoDb_1.14.0        IRanges_2.12.0            
## [29] S4Vectors_0.16.0           BiocGenerics_0.24.0       
## [31] tidyr_0.8.1                ggplot2_3.0.0             
## [33] dplyr_0.7.6               
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-137           bitops_1.0-6           doParallel_1.0.11     
##  [4] rprojroot_1.3-2        tools_3.5.0            backports_1.1.2       
##  [7] R6_2.2.2               lazyeval_0.2.1         colorspace_1.3-2      
## [10] withr_2.1.2            tidyselect_0.2.4       gridExtra_2.3         
## [13] mnormt_1.5-5           compiler_3.5.0         ggdendro_0.1-20       
## [16] labeling_0.3           slam_0.1-43            mosaicCore_0.5.0      
## [19] scales_0.5.0           SQUAREM_2017.10-1      psych_1.8.4           
## [22] digest_0.6.15          ggformula_0.7.0        foreign_0.8-70        
## [25] rmarkdown_1.9          XVector_0.18.0         pscl_1.5.2            
## [28] pkgconfig_2.0.1        htmltools_0.3.6        lpsymphony_1.7.1      
## [31] bindr_0.1.1            RCurl_1.95-4.10        GenomeInfoDbData_1.0.0
## [34] mosaicData_0.16.0      Rcpp_0.12.18           munsell_0.4.3         
## [37] stringi_1.2.2          yaml_2.1.19            MASS_7.3-50           
## [40] zlibbioc_1.24.0        plyr_1.8.4             grid_3.5.0            
## [43] lattice_0.20-35        knitr_1.20             pillar_1.2.3          
## [46] reshape2_1.4.3         codetools_0.2-15       glue_1.3.0            
## [49] evaluate_0.10.1        foreach_1.4.4          gtable_0.2.0          
## [52] purrr_0.2.5            assertthat_0.2.0       broom_0.4.4           
## [55] truncnorm_1.0-8        tibble_1.4.2           iterators_1.0.9       
## [58] mosaic_1.2.0