Title: | Two-Way Feature Set Testing for Multi-Omics |
---|---|
Description: | For any two way feature-set from a pair of pre-processed omics data, pairwise-TDP, Column-TDP and row-TDP are calculated. Due to embeded closed testing procedure, the choice of feature-sets can be changed infinite times and even after seeing the data withuot any change in type I error rate. For more details refer to the refrence article. |
Authors: | Mitra Ebrahimpoor [aut, cre] |
Maintainer: | Mitra Ebrahimpoor <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0 |
Built: | 2025-01-12 05:20:51 UTC |
Source: | https://github.com/mitra-ep/rocean |
Calculates pairwise matrix of p-values based on Pearson's correlation test for two matrices. To gain speed and manage RAM usage, the matrices are split into several smaller chunks.
corPs(pm1, pm2, type = c("Mat", "Vec"), pthresh = 0.05)
corPs(pm1, pm2, type = c("Mat", "Vec"), pthresh = 0.05)
pm1 , pm2
|
Subsets of two omics data sets where rows are the features and columns are samples. The rows of the two matrices would define the two-way feature set of interest. |
type |
Two options are available. Mat: Calculate the correlation of subsets and return a matrix; Vec: calculate the correlation matrix, subset by the given threshold and return a vector of p-values. |
pthresh |
Only relevant for type="Vec". The threshold by which the p-values are filtered (p>pthresh is removed). Default value is 0.05. |
Either a matrix or vector of pairwise p-values, as indicated by type
parameter.
Calculates cumulative p-categories for a given matrix of p-values.
getCat(mps, gCT, scale = c("col", "row"))
getCat(mps, gCT, scale = c("col", "row"))
mps |
Matrix of p-values, representing pairwise associations between two feature sets. |
gCT |
Parameters of the global closed testing, which is the output of simesCT function. |
scale |
Scale of the quantification, a character string. Possible choices are "col" and "row". |
Matrix of p-categories.
Calculates heuristic and lower bound for the true discovery proportion (TDP) in 3 scales for
a specified two-way feature set (Algorithm 1 in the reference).
The input is either two omics data sub-matrices or the pre-calculated matrix of p-values for pairwise associations.
In case the result is not exact, the function adopts branch and bound (Algorithm 2 in the reference), if nMax
allows.
ocean( pm1, pm2, gCT, scale = c("pair", "row", "col"), mps, nMax = 100, verbose = TRUE )
ocean( pm1, pm2, gCT, scale = c("pair", "row", "col"), mps, nMax = 100, verbose = TRUE )
pm1 , pm2
|
Matrix; Subsets of two omics data sets where rows are the features and columns are samples. The rows of the two matrices would define the two-way feature set of interest. |
gCT |
Vector; Parameters of the global closed testing, output of simesCT function. |
scale |
Optional character vector; It specifies the scale of TDP quantification. Possible choices are "pair" (pair-TDP), "col" (col-TDP ) and "row" (for row-TDP'). If not specified, all three scales are returned. |
mps |
Optional matrix of p-values; A sub-matrix of pairwise associations, representing the two-way feature set of interest.
If provided, |
nMax |
Maximum number of steps for branch and bound algorithm, if set to 1 branch and bound
is skipped even if the result is not exact. The default value is a 100. The algorithm may
stop before the |
verbose |
Logical; if |
TDP is returned for the specified scales, along with number of steps taken and convergence status for branch and bound algorithm.
#number of feature per omic data set n_cols<-1000 n_rows<-1200 #random matrix of p-values set.seed(1258) pvalmat<-matrix(runif(n_rows*n_cols, min=0, max=1)^3, nrow=n_rows, ncol=n_cols) #calculate CT parameters gCT<-simesCT(mps=pvalmat, m=nrow(pvalmat)*ncol(pvalmat)) #calculate TDPs for a random feature set subpmat<-pvalmat[1:400,100:750] #Note: it can take loner to run this script if nMax is large out<-ocean(mps=subpmat, gCT=gCT, nMax=2) out
#number of feature per omic data set n_cols<-1000 n_rows<-1200 #random matrix of p-values set.seed(1258) pvalmat<-matrix(runif(n_rows*n_cols, min=0, max=1)^3, nrow=n_rows, ncol=n_cols) #calculate CT parameters gCT<-simesCT(mps=pvalmat, m=nrow(pvalmat)*ncol(pvalmat)) #calculate TDPs for a random feature set subpmat<-pvalmat[1:400,100:750] #Note: it can take loner to run this script if nMax is large out<-ocean(mps=subpmat, gCT=gCT, nMax=2) out
Calculates the TDP over pairs; based on SEA algorithm
pairTDP(mps, n, gCT)
pairTDP(mps, n, gCT)
mps |
Matrix or vector of pairwise associations. |
n |
Number of pairs; may not be the size of p if a threshold is used to remove large p-values. |
gCT |
Parameters of the global closed testing, output of simesCT function. |
Proportion of true discoveries out of n pairs of features.
Performs B&B when the bound are not exact
runbab(sCat, ssh, ssb, nMax = 100)
runbab(sCat, ssh, ssb, nMax = 100)
sCat |
Category matrix, output of getCat function |
ssh |
current Heuristic as provided by SingleStep function |
ssb |
current Bound as provided by SingleStep function |
nMax |
Maximum number of steps for the algorithm, the algorithm may stop sooner if it converges. |
A list, including the heuristic and the bound for the number of true discoveries, along with number of steps taken and convergence status.
Calculates five parameters from closed testing with Simes local tests based on raw data. These parameter are unique per data/alpha-level combination and do not depend on feature sets. Calculation may be somewhat long depending on the size of data sets and PC configurations.
simesCT(om1, om2, mps, m, alpha = 0.05)
simesCT(om1, om2, mps, m, alpha = 0.05)
om1 , om2
|
Two omics data sets where rows are features and columns are samples. |
mps , m
|
Optional, pre-calculated matrix/vector of pairwise associations and the size.
To save time in calculation of parameters, a threshold such as the type I error may be applies to remove larger p-values.
If a threshold is used, size of matrix and |
alpha |
type I error rate, default value is 0.05. |
Vector of integers: grand H value, concentration p-value, size of concentration set z, size of the original pair-wise associations matrix and the type I error level used in calculations.
See more details in "Hommel's procedure in linear time" doi: 10.1002/bimj.201700316.
Calculates heuristic and upper-bound for the number of true discoveries based on the Algorithm 1 introduced in paper.
singleStep(sCat, B)
singleStep(sCat, B)
sCat |
p-categories matrix, output of getCat function. |
B |
Optional, to identify rows to be fixed (1) or removed (0) while splitting the search space. |
A list of two objects, the heuristic and the lower bound for true number of discoveries