Package 'rOCEAN' reference manual

Title:	Two-Way Feature Set Testing for Multi-Omics
Description:	For any two-way feature set from a pair of pre-processed omics data sets, pairwise-TDP, column-TDP and row-TDP are calculated. Because a closed testing procedure is embedded, the choice of feature sets can be changed any number of times, even after seeing the data, without any change in the type I error rate. For more details, refer to the reference article.
Authors:	Mitra Ebrahimpoor [aut, cre]
Maintainer:	Mitra Ebrahimpoor <[email protected]>
License:	GPL (>= 2)
Version:	1.0
Built:	2025-03-13 05:01:22 UTC
Source:	https://github.com/mitra-ep/rocean

Calculate pairwise p-value

Description

Calculates pairwise matrix of p-values based on Pearson's correlation test for two matrices. To gain speed and manage RAM usage, the matrices are split into several smaller chunks.

Usage

corPs(pm1, pm2, type = c("Mat", "Vec"), pthresh = 0.05)

Arguments

`pm1`, `pm2`	Subsets of two omics data sets where rows are the features and columns are samples. The rows of the two matrices would define the two-way feature set of interest.
`type`	Two options are available. Mat: Calculate the correlation of subsets and return a matrix; Vec: calculate the correlation matrix, subset by the given threshold and return a vector of p-values.
`pthresh`	Only relevant for type="Vec". The threshold by which the p-values are filtered (p>pthresh is removed). Default value is 0.05.

Value

Either a matrix or vector of pairwise p-values, as indicated by type parameter.
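
The chunked calculation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `chunked_cor_p` and its `chunk_size` argument are hypothetical names, and the p-values come from the usual t-transformation of Pearson's r with n - 2 degrees of freedom.

```r
# Hypothetical helper (not part of rOCEAN): pairwise Pearson p-values,
# computed over chunks of rows of pm1 to limit memory use.
chunked_cor_p <- function(pm1, pm2, chunk_size = 100) {
  n <- ncol(pm1)                                   # samples are columns
  stopifnot(ncol(pm2) == n, n > 2)
  out <- matrix(NA_real_, nrow(pm1), nrow(pm2))
  for (s in seq(1, nrow(pm1), by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1, nrow(pm1))
    # features are rows, so transpose before calling cor()
    r <- cor(t(pm1[idx, , drop = FALSE]), t(pm2))
    tstat <- r * sqrt((n - 2) / (1 - r^2))
    out[idx, ] <- 2 * pt(-abs(tstat), df = n - 2)  # two-sided p-value
  }
  out
}

set.seed(1)
pm1 <- matrix(rnorm(5 * 20), nrow = 5)   # 5 features, 20 samples
pm2 <- matrix(rnorm(4 * 20), nrow = 4)   # 4 features, 20 samples
pmat <- chunked_cor_p(pm1, pm2, chunk_size = 2)

# Analogue of type = "Vec": keep only p-values at or below the threshold
pvec <- pmat[pmat <= 0.05]
```

Each entry of `pmat` should agree with `cor.test()` applied to the corresponding pair of rows; the chunk size only affects memory use, not the result.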

OCEAN algorithm

Description

Calculates the heuristic and the lower bound for the true discovery proportion (TDP) on three scales for a specified two-way feature set (Algorithm 1 in the reference). The input is either two omics data sub-matrices or the pre-calculated matrix of p-values for pairwise associations. If the result is not exact, the function applies branch and bound (Algorithm 2 in the reference), if nMax allows.

Usage

ocean(
  pm1,
  pm2,
  gCT,
  scale = c("pair", "row", "col"),
  mps,
  nMax = 100,
  verbose = TRUE
)

Arguments

`pm1`, `pm2`	Matrix; Subsets of two omics data sets where rows are the features and columns are samples. The rows of the two matrices would define the two-way feature set of interest.
`gCT`	Vector; Parameters of the global closed testing, output of simesCT function.
`scale`	Optional character vector; specifies the scale of TDP quantification. Possible choices are "pair" (pair-TDP), "col" (col-TDP) and "row" (row-TDP). If not specified, all three scales are returned.
`mps`	Optional matrix of p-values; a sub-matrix of pairwise associations, representing the two-way feature set of interest. If provided, `pm1` and `pm2` are not required. If not provided, the matrix of pairwise associations is derived from `pm1` and `pm2` based on Pearson's correlation.
`nMax`	Maximum number of steps for the branch and bound algorithm; if set to 1, branch and bound is skipped even if the result is not exact. The default value is 100. The algorithm may stop before `nMax` is reached if it converges sooner.
`verbose`	Logical; if `TRUE`, progress messages will be displayed during the function's execution. Default is `TRUE`.

Value

TDP is returned for the specified scales, along with number of steps taken and convergence status for branch and bound algorithm.

Examples


#number of features per omics data set
n_cols<-1000
n_rows<-1200

#random matrix of p-values
set.seed(1258)
pvalmat<-matrix(runif(n_rows*n_cols, min=0, max=1)^3, nrow=n_rows, ncol=n_cols)

#calculate CT parameters
gCT<-simesCT(mps=pvalmat, m=nrow(pvalmat)*ncol(pvalmat))

#calculate TDPs for a random feature set
subpmat<-pvalmat[1:400,100:750]
#Note: it can take longer to run this script if nMax is large
out<-ocean(mps=subpmat, gCT=gCT, nMax=2)
out
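
The example above uses the pre-calculated p-value route (`mps`). A sketch of the alternative route, in which raw omics sub-matrices are passed as `pm1` and `pm2`, might look as follows. The simulated data, the shared signal, and the chosen subsets are assumptions for illustration; the calls use only the arguments listed in the Usage sections and are guarded so the script still runs when rOCEAN is not installed.

```r
set.seed(1)
n_samples <- 30
om1 <- matrix(rnorm(200 * n_samples), nrow = 200)  # 200 features x 30 samples
om2 <- matrix(rnorm(150 * n_samples), nrow = 150)  # 150 features x 30 samples

# Assumed toy signal: a shared sample-level effect in the first features
signal <- rnorm(n_samples)
om1[1:40, ] <- sweep(om1[1:40, ], 2, signal, "+")
om2[1:25, ] <- sweep(om2[1:25, ], 2, signal, "+")

# Two-way feature set of interest: rows of pm1 crossed with rows of pm2
pm1 <- om1[1:40, ]
pm2 <- om2[1:25, ]

if (requireNamespace("rOCEAN", quietly = TRUE)) {
  # CT parameters from the full data, then TDP on the pair scale only
  gCT <- rOCEAN::simesCT(om1 = om1, om2 = om2, alpha = 0.05)
  out <- rOCEAN::ocean(pm1 = pm1, pm2 = pm2, gCT = gCT,
                       scale = "pair", nMax = 2, verbose = FALSE)
  print(out)
}
```

Both sub-matrices must share the same samples (columns), since the pairwise associations are Pearson correlations computed across samples.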

Pairwise true discovery proportion

Description

Calculates the TDP over pairs, based on the SEA algorithm.

Usage

pairTDP(mps, n, gCT)

Arguments

`mps`	Matrix or vector of pairwise associations.
`n`	Number of pairs; this may differ from the size of `mps` if a threshold is used to remove large p-values.
`gCT`	Parameters of the global closed testing, output of simesCT function.

Value

Proportion of true discoveries out of n pairs of features.
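
The relation between `mps` and `n` when a threshold is applied can be illustrated in base R. This is a sketch with simulated p-values; the 0.05 threshold and the object names are assumptions, and the final call is shown only as a comment because it needs rOCEAN and a `gCT` object from simesCT.

```r
set.seed(1258)
# Simulated p-values for a 30 x 20 two-way feature set
pvalmat <- matrix(runif(30 * 20)^3, nrow = 30, ncol = 20)

n_pairs <- length(pvalmat)         # all pairs in the feature set
pvec <- pvalmat[pvalmat <= 0.05]   # large p-values removed

# The filtered vector is shorter, but n must still count every pair
length(pvec)
n_pairs

# With rOCEAN installed, the call would take the form given in Usage:
# pairTDP(mps = pvec, n = n_pairs, gCT = gCT)
```

Passing `length(pvec)` as `n` would shrink the denominator of the proportion, which is why the full pair count is kept separately.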

Branch and bound algorithm implementation

Description

Performs branch and bound (B&B) when the bounds are not exact.

Usage

runbab(sCat, ssh, ssb, nMax = 100)

Arguments

`sCat`	Category matrix, output of the getCat function.
`ssh`	Current heuristic, as provided by the singleStep function.
`ssb`	Current bound, as provided by the singleStep function.
`nMax`	Maximum number of steps for the algorithm; the algorithm may stop sooner if it converges.

Value

A list, including the heuristic and the bound for the number of true discoveries, along with number of steps taken and convergence status.
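
To show how a step cap and a convergence status interact in a branch and bound search, here is a generic toy example on a 0/1 knapsack problem. It is not the algorithm implemented in `runbab`; `bab_knapsack` and all its arguments are hypothetical. The search tracks a heuristic (best feasible value found) and a bound (best value still possible), and is converged once the two coincide.

```r
# Toy best-first branch and bound with a step cap (not rOCEAN's runbab).
bab_knapsack <- function(value, weight, capacity, nMax = 100) {
  ord <- order(value / weight, decreasing = TRUE)
  value <- value[ord]; weight <- weight[ord]
  k <- length(value)

  # Upper bound for a node: greedy fractional fill with the undecided items
  upper <- function(level, v, w) {
    room <- capacity - w
    i <- level + 1
    while (i <= k && weight[i] <= room) {
      room <- room - weight[i]; v <- v + value[i]; i <- i + 1
    }
    if (i <= k) v <- v + value[i] * room / weight[i]
    v
  }

  heuristic <- 0                      # best feasible value so far
  open <- list(list(level = 0, v = 0, w = 0, ub = upper(0, 0, 0)))
  steps <- 0
  while (length(open) > 0 && steps < nMax) {
    steps <- steps + 1
    best <- which.max(vapply(open, function(nd) nd$ub, numeric(1)))
    node <- open[[best]]
    open[[best]] <- NULL
    item <- node$level + 1
    for (take in c(TRUE, FALSE)) {    # branch: take the item or skip it
      w <- node$w + if (take) weight[item] else 0
      if (w > capacity) next
      v <- node$v + if (take) value[item] else 0
      heuristic <- max(heuristic, v)
      if (item < k) {
        ub <- upper(item, v, w)
        if (ub > heuristic) {
          open[[length(open) + 1]] <- list(level = item, v = v, w = w, ub = ub)
        }
      }
    }
    # prune nodes that can no longer beat the heuristic
    open <- Filter(function(nd) nd$ub > heuristic, open)
  }
  bound <- max(heuristic, vapply(open, function(nd) nd$ub, numeric(1)))
  list(heuristic = heuristic, bound = bound, steps = steps,
       converged = isTRUE(all.equal(heuristic, bound)))
}

full   <- bab_knapsack(c(60, 100, 120), c(10, 20, 30), capacity = 50)
capped <- bab_knapsack(c(60, 100, 120), c(10, 20, 30), capacity = 50, nMax = 2)
```

With the default cap the toy search closes the gap between heuristic and bound; with `nMax = 2` it stops early and reports the remaining gap together with `converged = FALSE`.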

Closed testing with Simes

Description

Calculates five parameters from closed testing with Simes local tests based on raw data. These parameters are unique per data/alpha-level combination and do not depend on feature sets. The calculation may take some time, depending on the size of the data sets and the PC configuration.

Usage

simesCT(om1, om2, mps, m, alpha = 0.05)

Arguments

`om1`, `om2`	Two omics data sets where rows are features and columns are samples.
`mps`, `m`	Optional; pre-calculated matrix/vector of pairwise associations and its size. To save time in the calculation of parameters, a threshold such as the type I error level may be applied to remove larger p-values. If a threshold is used, the size of the matrix and `m` will not match. `m` should always be the size of the full matrix of associations (number of features in `om1` × number of features in `om2`).
`alpha`	Type I error rate; default value is 0.05.

Value

Vector of integers: grand H value, concentration p-value, size of concentration set z, size of the original pair-wise associations matrix and the type I error level used in calculations.
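
For orientation, the Simes test that serves as the local test can be written in a few lines of base R. This sketch shows only the standard Simes combination p-value for a single set of hypotheses; it does not reproduce the five parameters returned by simesCT, and `simes_p` is a hypothetical helper name.

```r
# Simes p-value for a set of m p-values: min over i of m * p_(i) / i,
# where p_(1) <= ... <= p_(m) are the sorted p-values.
simes_p <- function(p) {
  m <- length(p)
  min(m * sort(p) / seq_len(m))
}

p <- c(0.04, 0.01, 0.30)
sp <- simes_p(p)   # min(3 * 0.01 / 1, 3 * 0.04 / 2, 3 * 0.30 / 3)
sp <= 0.05         # the intersection hypothesis is rejected at alpha = 0.05
```

The Simes p-value is never larger than the Bonferroni-adjusted minimum, `length(p) * min(p)`, since that quantity is the first term of the minimum.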

References

See more details in "Hommel's procedure in linear time" doi: 10.1002/bimj.201700316.

Single step algorithm

Description

Calculates heuristic and upper-bound for the number of true discoveries based on Algorithm 1 introduced in the reference article.

Usage

singleStep(sCat, B)

Arguments

`sCat`	p-categories matrix, output of getCat function.
`B`	Optional, to identify rows to be fixed (1) or removed (0) while splitting the search space.

Value

A list of two objects: the heuristic and the lower bound for the number of true discoveries.

Calculate p-categories

Arguments

`mps`	Matrix of p-values, representing pairwise associations between two feature sets.
`gCT`	Parameters of the global closed testing, which is the output of simesCT function.
`scale`	Scale of the quantification, a character string. Possible choices are "col" and "row".
