Gene expression modeling pipeline
modelGeneExpression.Rd
modelGeneExpression
uses parallelization if parallel backend is
registered. For that reason we advise against passing parallel
argument
to internally called cv.glmnet
routine.
Usage
modelGeneExpression(
mae,
yname = "Y",
uname = "U",
xnames,
design = NULL,
standardize = TRUE,
parallel = FALSE,
pvalues = TRUE,
precalcmodels = NULL,
...
)
Arguments
- mae
MultiAssayExperiment object such as produced by
prepareCountsForRegression
.- yname
string indicating experiment in
mae
to use as the expression input.- uname
string indicating experiment in
mae
to use as the basal expression level.- xnames
character indicating experiments in
mae
to use as molecular signatures.- design
matrix giving the design matrix for the samples. Default (
NULL
) is to use design found inmae
metadata. Columns corresponds to samples groups and rows to samples names. Only samples included in the design will be processed.- standardize
logical flag indicating if the molecular signatures should be scaled. Advised to be set to
TRUE
.- parallel
parallel argument to internally used
cv.glmnet
function. Advised to be set toFALSE
as it might interfere with parallelization used inmodelGeneExpression
.- pvalues
logical flag indicating if significance testing for the estimated molecular signatures activities should be performed.
- precalcmodels
optional list of precomputed
'cv.glmnet'
objects for each molecular signature and sample. The elements of this list should be matching thexnames
vector. Each of those elements should be a named list holding'cv.glmnet'
objects for each sample. If provided those models will be used instead of running regression from scratch.- ...
arguments passed to glmnet::cv.glmnet.
Value
Nested list with following elements
- regression_models
Named list with elements corresponding to signatures specified in
xnames
. Each of these is a list holding'cv.glmnet'
objects corresponding to each sample.- pvalues
Named list with elements corresponding to signatures specified in
xnames
. Each of these is a list holdingdata.frame
of signature's p-values and test statistics estimated for each sample.- zscore_avg
Named list with elements corresponding to signatures specified in
xnames
. Each of these is amatrix
holding replicate average Z-scores with columns corresponding to groups in the design.- coef_avg
Named list with elements corresponding to signatures specified in
xnames
. Each of these is amatrix
holding replicate averaged signatures activities with columns corresponding to groups in the design.- results
Named list of a
data.frame
s holding replicate average molecular signatures, overall molecular signatures Z-score and p-values calculated over groups using Stouffer's and Fisher's methods.
Details
For speeding up the calculations consider lowering number of folds used in
internally run cv.glmnet
by specifying nfolds
argument. By default 10 fold cross validation is used.
The relationship between the expression (Y) and molecular signatures (X) is described using linear model formulation. The pipeline attempts to model the change in expression between basal expression level (u) and each sample, with the goal of finding the unknown molecular signatures activities. Linear models are fit using popular ridge regression implementation glmnet (Friedman, Hastie, and Tibshirani 2010).
If pvalues
is set to TRUE
the significance of the estimated
molecular signatures activities is tested using methodology introduced by
(Cule, Vineis, and De Iorio 2011) which original implementation can be found
in ridge-package.
If replicates are available the signatures activities estimates and their standard error estimates can be combined. This is done by averaging signatures activities estimates and pooling their significance estimates using Stouffer's method for the Z-scores and Fisher's method for the p-values.
For detailed pipeline description we refer interested user to paper accompanying this package.
Examples
data("rinderpest_mini", "remap_mini")
base_lvl <- "00hr"
design <- matrix(
data = c(1, 0, 0,
1, 0, 0,
1, 0, 0,
0, 1, 0,
0, 1, 0,
0, 1, 0,
0, 0, 1,
0, 0, 1,
0, 0, 1),
ncol = 3,
nrow = 9,
byrow = TRUE,
dimnames = list(colnames(rinderpest_mini), c("00hr", "12hr", "24hr")))
mae <- prepareCountsForRegression(
counts = rinderpest_mini,
design = design,
base_lvl = base_lvl)
mae <- addSignatures(mae, remap = remap_mini)
mae <- filterSignatures(mae)
res <- modelGeneExpression(
mae = mae,
xnames = "remap",
nfolds = 5)
#> ##------ modelGeneExpression: started ridge regression Wed Aug 2 11:20:01 2023 ------##
#> Warning: executing %dopar% sequentially: no parallel backend registered
#> Loading required package: Matrix
#> Loaded glmnet 4.1-7
#> ##------ modelGeneExpression: finished ridge regressionWed Aug 2 11:21:49 2023 ------##
#> ##------ modelGeneExpression: started significance testing Wed Aug 2 11:21:49 2023 ------##
#> ##------ modelGeneExpression: finished significance testing Wed Aug 2 11:21:56 2023 ------##