Title: | Model-Based Gene Selection for Paired Data |
---|---|
Description: | Model-based clustering for paired data based on the regression of a mixture of Bayesian hierarchical models on covariates. Zhang et al. (2023) <doi:10.1186/s12859-023-05556-x>. |
Authors: | Yixin Zhang [aut, cre], Wei Liu [aut, ctb], Weiliang Qiu [aut, ctb] |
Maintainer: | Yixin Zhang <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.3.2 |
Built: | 2024-11-08 03:56:56 UTC |
Source: | https://github.com/cran/eLNNpairedCov |
Model-based clustering based on extended log-normal normal model for paired data adjusting for covariates.
    eLNNpairedCov(
      EsetDiff,
      fmla = ~Age + Sex,
      probeID.var = "probeid",
      gene.var = "gene",
      chr.var = "chr",
      scaleFlag = TRUE,
      Maxiter = 10,
      maxIT = 10,
      b = c(2, 2, 2),
      converge_threshold = 1e-3,
      optimMethod = "L-BFGS-B",
      bound.alpha = c(0.001, 6),
      bound.beta = c(0.001, 6),
      bound.k = c(0.001, 0.9999),
      bound.eta = c(-10, 10),
      mc.cores = 1,
      verbose = FALSE)
EsetDiff |
An ExpressionSet object storing the log2 difference between post-treatment and pre-treatment. |
fmla |
A formula without outcome variable. |
probeID.var |
character. Indicates the probe id. |
gene.var |
character. Indicates the gene symbol. |
chr.var |
character. Indicates the chromosome. |
scaleFlag |
logical. Indicating if rows (probes) need to be scaled (but not centered). |
Maxiter |
integer. The maximum number of iterations allowed for the EM algorithm. Default value is Maxiter = 10. |
maxIT |
integer. The maximum number of iterations allowed in the R built-in function optim. Default value is maxIT = 10.
|
b |
numeric. A vector of concentration parameters used in Dirichlet distribution. Default value is b = c(2,2,2). |
converge_threshold |
numeric. One of the two termination criteria of the iteration. The smaller this value, the harder it is for the optimization procedure in eLNNpairedCov to be considered converged. Default value is converge_threshold = 1e-3. |
optimMethod |
character. Indicates the method for optimization. |
bound.alpha |
numeric. A vector of 2 positive numbers specifying the lower and upper bounds of the estimate of alpha. |
bound.beta |
numeric. A vector of 2 positive numbers specifying the lower and upper bounds of the estimate of beta. |
bound.k |
numeric. A vector of 2 positive numbers specifying the lower and upper bounds of the estimate of k. |
bound.eta |
numeric. A vector of 2 numbers specifying the lower and upper bounds of the estimate of eta. |
mc.cores |
integer. A positive integer specifying the number of CPU cores to use for parallel computing. |
verbose |
logical. Indicates whether to print intermediate results: TRUE to print, FALSE otherwise. Default value is verbose = FALSE. |
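The scaleFlag argument describes scaling rows without centering them. The sketch below shows one way to do that in base R, dividing each row by its own standard deviation. The choice of scaling factor is an assumption; this manual does not state which factor the package uses, and the matrix `mat` is made up for illustration.

```r
set.seed(3)
# Made-up matrix: 3 probes (rows) by 4 subjects (columns)
mat <- matrix(rnorm(12, mean = 1, sd = 2), nrow = 3,
              dimnames = list(paste0("probe", 1:3), paste0("subj", 1:4)))

# Assumed scaling factor: the standard deviation of each row
row.sd <- apply(mat, 1, sd)

# Divide each row by its factor; row means are NOT subtracted
mat.s <- sweep(mat, 1, row.sd, "/")

print(apply(mat.s, 1, sd))   # row standard deviations after scaling
print(rowMeans(mat.s))       # row means are rescaled but not forced to zero
```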
A gene is assigned to the cluster with the largest posterior probability (responsibility): cluster “NE” if the posterior probability for the non-differentially expressed gene cluster is the largest, cluster “OE” if that for the over-expressed gene cluster is the largest, and cluster “UE” if that for the under-expressed gene cluster is the largest.
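The assignment rule above can be sketched in base R. The responsibility matrix here is randomly generated, and its column names and order are assumptions made for this illustration; the layout of the wmat element returned by the function may differ.

```r
set.seed(1)
# Hypothetical responsibilities: one row per probe, one column per cluster
raw <- matrix(runif(15), nrow = 5,
              dimnames = list(paste0("probe", 1:5), c("OE", "UE", "NE")))
wmat <- raw / rowSums(raw)   # each row now sums to 1

# Assign each probe to the cluster with the largest posterior probability
memGenes <- colnames(wmat)[max.col(wmat, ties.method = "first")]

# Collapse to two categories: differentially expressed (DE) or not (NE)
memGenes2 <- ifelse(memGenes == "NE", "NE", "DE")

print(data.frame(round(wmat, 3), memGenes, memGenes2))
```

The two-category vector mirrors the description of memGenes2 in the return value: "OE" and "UE" both map to "DE".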
A list of 9 elements:
par.ini |
initial estimate of parameter |
par.final |
A vector of the estimated model parameters in original scale. |
memGenes |
probe cluster membership based on eLNNpairedCov algorithm. |
memGenes2 |
probe cluster membership based on eLNNpairedCov algorithm. 2-categories: "DE" indicates differentially expressed; "NE" indicates non-differentially expressed. |
memGenes.limma |
probe cluster membership based on limma. |
res.ini |
results of limma analysis |
update_info |
object returned by |
wmat |
matrix of responsibilities |
iter |
number of EM iterations. |
Yixin Zhang [email protected], Wei Liu [email protected], Weiliang Qiu [email protected]
Zhang Y, Liu W, Qiu W. A model-based clustering via mixture of hierarchical models with covariate adjustment for detecting differentially expressed genes from paired design. BMC Bioinformatics 24, 423 (2023)
    data(esDiff)

    res = eLNNpairedCov(EsetDiff = esDiff,
                        fmla = ~Age + Sex,
                        probeID.var = "probeid",
                        gene.var = "gene",
                        chr.var = "chr",
                        scaleFlag = FALSE,
                        mc.cores = 1,
                        verbose = TRUE)

    # true probe cluster membership
    memGenes.true = fData(esDiff)$memGenes.true
    print(table(memGenes.true))

    # probe cluster membership
    memGenes.limma = res$memGenes.limma
    print(table(memGenes.limma))

    # final probe cluster membership
    memGenes = res$memGenes
    print(table(memGenes))

    # cross tables
    print(table(memGenes.true, memGenes.limma))
    print(table(memGenes.true, memGenes))

    # accuracies
    print(mean(memGenes.true == memGenes.limma))
    print(mean(memGenes.true == memGenes))
Model-based clustering based on extended log-normal normal model for paired data adjusting for covariates.
    eLNNpairedCovSEM(
      EsetDiff,
      fmla = ~Age + Sex,
      probeID.var = "probeid",
      gene.var = "gene",
      chr.var = "chr",
      scaleFlag = TRUE,
      Maxiter = 10,
      maxIT = 10,
      b = c(2, 2, 2),
      converge_threshold = 1e-3,
      optimMethod = "L-BFGS-B",
      bound.alpha = c(0.001, 6),
      bound.beta = c(0.001, 6),
      bound.k = c(0.001, 0.9999),
      bound.eta = c(-10, 10),
      mc.cores = 1,
      temp0 = 2,
      r_cool = 0.9,
      verbose = FALSE)
EsetDiff |
An ExpressionSet object storing the log2 difference between post-treatment and pre-treatment. |
fmla |
A formula without outcome variable. |
probeID.var |
character. Indicates the probe id. |
gene.var |
character. Indicates the gene symbol. |
chr.var |
character. Indicates the chromosome. |
scaleFlag |
logical. Indicating if rows (probes) need to be scaled (but not centered). |
Maxiter |
integer. The maximum number of iterations allowed for the EM algorithm. Default value is Maxiter = 10. |
maxIT |
integer. The maximum number of iterations allowed in the R built-in function optim. Default value is maxIT = 10.
|
b |
numeric. A vector of concentration parameters used in Dirichlet distribution. Default value is b = c(2,2,2). |
converge_threshold |
numeric. One of the two termination criteria of the iteration. The smaller this value, the harder it is for the optimization procedure in eLNNpairedCovSEM to be considered converged. Default value is converge_threshold = 1e-3. |
optimMethod |
character. Indicates the method for optimization. |
bound.alpha |
numeric. A vector of 2 positive numbers specifying the lower and upper bounds of the estimate of alpha. |
bound.beta |
numeric. A vector of 2 positive numbers specifying the lower and upper bounds of the estimate of beta. |
bound.k |
numeric. A vector of 2 positive numbers specifying the lower and upper bounds of the estimate of k. |
bound.eta |
numeric. A vector of 2 numbers specifying the lower and upper bounds of the estimate of eta. |
mc.cores |
integer. A positive integer specifying the number of CPU cores to use for parallel computing. |
temp0 |
numeric. Initial temperature in simulated-annealing modified EM. |
r_cool |
numeric. Cooling rate in simulated-annealing modified EM, which is inside interval |
verbose |
logical. Indicates whether to print intermediate results: TRUE to print, FALSE otherwise. Default value is verbose = FALSE. |
A gene is assigned to the cluster with the largest posterior probability (responsibility): cluster “NE” if the posterior probability for the non-differentially expressed gene cluster is the largest, cluster “OE” if that for the over-expressed gene cluster is the largest, and cluster “UE” if that for the under-expressed gene cluster is the largest.
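To give a feel for the temp0 and r_cool arguments, the sketch below assumes a geometric cooling schedule and a tempering step that raises posterior probabilities to the power 1/temperature. Both are common simulated-annealing devices and are assumptions here; this manual does not spell out the schedule or tempering used inside eLNNpairedCovSEM. The helper `temper` and the probabilities in `w` are hypothetical.

```r
# Assumed geometric cooling: temperature at iteration t is temp0 * r_cool^t
temp0  <- 2
r_cool <- 0.9
iters  <- 0:9
temps  <- temp0 * r_cool^iters

# Assumed tempering: raise each probability to the power 1/temperature,
# then renormalise so the probabilities sum to 1 again
temper <- function(w, temp) {
  tw <- w^(1 / temp)
  tw / sum(tw)
}

w <- c(OE = 0.70, UE = 0.05, NE = 0.25)   # made-up posterior probabilities

print(round(temps, 3))
print(round(temper(w, temps[1]), 3))  # flatter than w at the initial temperature
print(round(temper(w, 1), 3))         # temperature 1 leaves w unchanged
```

Under these assumptions, a high initial temperature flattens the responsibilities so early iterations are less committed to one cluster, and the schedule gradually removes that flattening.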
A list of 10 elements:
par.ini |
initial estimate of parameter |
par.final |
A vector of the estimated model parameters in original scale. |
memGenes |
probe cluster membership based on eLNNpairedCovSEM algorithm. |
memGenes2 |
probe cluster membership based on eLNNpairedCovSEM algorithm. 2-categories: "DE" indicates differentially expressed; "NE" indicates non-differentially expressed. |
memGenes.limma |
probe cluster membership based on limma. |
res.ini |
results of limma analysis |
update_info |
object returned by |
wmat |
matrix of responsibilities |
iter.EM |
number of EM iterations. |
tempFinal |
final temperature in simulated-annealing modified EM
Yixin Zhang [email protected], Wei Liu [email protected], Weiliang Qiu [email protected]
Zhang Y, Liu W, Qiu W. A model-based clustering via mixture of hierarchical models with covariate adjustment for detecting differentially expressed genes from paired design. BMC Bioinformatics 24, 423 (2023)
    data(esDiff)

    res.SEM = eLNNpairedCovSEM(EsetDiff = esDiff,
                               fmla = ~Age + Sex,
                               probeID.var = "probeid",
                               gene.var = "gene",
                               chr.var = "chr",
                               scaleFlag = FALSE,
                               mc.cores = 1,
                               verbose = TRUE)

    # true probe cluster membership
    memGenes.true = fData(esDiff)$memGenes.true
    print(table(memGenes.true))

    # probe cluster membership
    memGenes.limma = res.SEM$memGenes.limma
    print(table(memGenes.limma))

    # final probe cluster membership
    memGenes.SEM = res.SEM$memGenes
    print(table(memGenes.SEM))

    # cross tables
    print(table(memGenes.true, memGenes.limma))
    print(table(memGenes.true, memGenes.SEM))

    # accuracies
    print(mean(memGenes.true == memGenes.limma))
    print(mean(memGenes.true == memGenes.SEM))
An ExpressionSet object storing simulated log2 differences of expression levels for 1000 probes and 20 subjects, with 2 covariates.
data("esDiff")
This dataset was generated from a 3-component mixture of Bayesian hierarchical models. For the true parameters, please refer to the manual for the R function genSimDat.
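As an illustration of the kind of data esDiff holds, the sketch below builds a log2-difference matrix of the same shape (1000 probes by 20 subjects) from paired pre- and post-treatment intensities, plus a table with two covariates. It uses plain base R matrices rather than an ExpressionSet, and every name and value (`pre`, `post`, `logDiff`, `pDat`) is made up; it is not how esDiff itself was generated.

```r
set.seed(7)
G <- 1000   # number of probes
n <- 20     # number of subjects

# Hypothetical positive expression intensities before treatment
pre <- matrix(rlnorm(G * n, meanlog = 5, sdlog = 1), nrow = G,
              dimnames = list(paste0("probe", 1:G), paste0("subj", 1:n)))

# Hypothetical post-treatment intensities: pre times a random fold change
post <- pre * matrix(rlnorm(G * n, meanlog = 0, sdlog = 0.2), nrow = G)

# Within-subject log2 difference (post minus pre), one value per probe and subject
logDiff <- log2(post) - log2(pre)

# Two subject-level covariates; the 0/1 coding of Sex is an assumption
pDat <- data.frame(Age = rnorm(n, mean = 50, sd = 5),
                   Sex = rbinom(n, size = 1, prob = 0.5),
                   row.names = colnames(logDiff))

print(dim(logDiff))
print(head(pDat))
```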
    data(esDiff)
    print(esDiff)
Generate a simulated dataset from a mixture of Bayesian hierarchical models with two covariates: age and sex.
genSimDat(G, n, psi, t_pi, m.age = 50, sd.age = 5, p.female = 0.5)
G |
integer. Number of probes. |
n |
integer. Number of samples. |
psi |
numeric. A vector of model hyper-parameters with elements
|
t_pi |
numeric. A vector of mixture proportions: |
m.age |
numeric. mean age. |
sd.age |
numeric. standard deviation of age. |
p.female |
numeric. proportion of females. |
An ExpressionSet object.
Age will be mean-centered and scaled so that it will have mean zero and variance one.
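The standardisation of age can be sketched in base R as follows. The simulated ages use the m.age and sd.age defaults from the usage line, while the variable names and the 0/1 coding of sex are assumptions made for this example, not details taken from genSimDat.

```r
set.seed(42)
n <- 20

# Raw ages drawn with the default mean and standard deviation
age <- rnorm(n, mean = 50, sd = 5)

# Sex indicator drawn with the default proportion; 1 = female is an assumed coding
sex <- rbinom(n, size = 1, prob = 0.5)

# Mean-centre and scale: subtract the sample mean, divide by the sample sd
age.s <- as.numeric(scale(age))

print(c(mean = mean(age.s), var = var(age.s)))
```

After this step the sample mean is zero and the sample variance is one, up to floating-point error.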
Yixin Zhang [email protected], Wei Liu [email protected], Weiliang Qiu [email protected]
Zhang Y, Liu W, Qiu W. A model-based clustering via mixture of hierarchical models with covariate adjustment for detecting differentially expressed genes from paired design. BMC Bioinformatics 24, 423 (2023)
    set.seed(1234567)

    true.psi = c(2, 1, 0.8, 0.1, -0.01, -0.1,
                 2, 1, 0.8, -0.1, -0.01, -0.1,
                 2, 1, 0.8, -0.01, -0.1)
    names(true.psi) = c("alpha1", "beta1", "k1",
                        "eta1.intercept", "eta1.Age", "eta1.Sex",
                        "alpha2", "beta2", "k2",
                        "eta2.intercept", "eta2.Age", "eta2.Sex",
                        "alpha3", "beta3", "k3",
                        "eta3.Age", "eta3.Sex")

    true.pi = c(0.1, 0.1)
    names(true.pi) = c("pi.OE", "pi.UE")

    par.true = c(true.pi, true.psi)

    esDiff = genSimDat(G = 1000,
                       n = 20,
                       psi = true.psi,
                       t_pi = true.pi,
                       m.age = 0,   # scaled age
                       sd.age = 1,  # scaled age
                       p.female = 0.5)

    print(esDiff)