Selects biomolecules for normalization via the method of proportion of biomolecules present and rank invariant biomolecules (ppp_rip)

ppp_rip(e_data, edata_id, fdata_id, groupDF, alpha = 0.2, proportion = 0.5)

Arguments

e_data

a \(p \times (n + 1)\) data.frame, where \(p\) is the number of peptides, proteins, lipids, or metabolites and \(n\) is the number of samples. Each row holds the data for one peptide, protein, lipid, or metabolite, and one column gives the biomolecule identifier.

edata_id

character string indicating the name of the peptide, protein, lipid, or metabolite identifier. Usually obtained by calling attr(omicsData, "cnames")$edata_cname.

fdata_id

character string indicating the name of the column in f_data that holds the sample identifiers.

groupDF

data.frame created by group_designation with columns for sample.id and group. If two main effects are provided, the original main effect levels for each sample are returned as the third and fourth columns of the data.frame.

alpha

numeric p-value threshold; biomolecules with a p-value above this threshold are retained as rank invariant (default value 0.2)

proportion

numeric value between 0 and 1, indicating the minimum proportion of samples in which a biomolecule must be present (non-missing) in order to be retained (default value 0.5)

Value

Character vector containing the biomolecules belonging to the ppp_rip subset.

Details

Biomolecules present in at least the fraction proportion of the samples are subjected to a Kruskal-Wallis test (a non-parametric analogue of one-way ANOVA; NAs are ignored) on group membership. Those with a p-value greater than the threshold alpha (common values include 0.1 and 0.25) are retained as rank-invariant biomolecules.
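
The two-step selection described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name ppp_rip_sketch, the toy data, and the column names Peptide, SampleID, and Group are all assumptions made for the example, and the real groupDF produced by group_designation may use different column names.

```r
# Hypothetical helper illustrating the ppp_rip idea; not the package's code.
ppp_rip_sketch <- function(e_data, edata_id, fdata_id, groupDF,
                           alpha = 0.2, proportion = 0.5) {
  sample_cols <- setdiff(names(e_data), edata_id)
  values <- as.matrix(e_data[, sample_cols, drop = FALSE])

  # Step 1: keep biomolecules observed in at least `proportion` of samples.
  keep <- rowMeans(!is.na(values)) >= proportion

  # Align group labels with the sample column order.
  # Assumes the group column is called "Group" (an assumption for this sketch).
  groups <- groupDF$Group[match(sample_cols, groupDF[[fdata_id]])]

  # Step 2: Kruskal-Wallis test on group membership for the kept rows.
  # kruskal.test() drops NA observations before testing; if it cannot be
  # computed (e.g. only one group remains), the p-value is left as NA.
  p_vals <- rep(NA_real_, nrow(values))
  for (i in which(keep)) {
    p_vals[i] <- tryCatch(
      kruskal.test(values[i, ], groups)$p.value,
      error = function(e) NA_real_
    )
  }

  # Retain biomolecules whose p-value exceeds alpha (rank invariant).
  selected <- keep & !is.na(p_vals) & p_vals > alpha
  as.character(e_data[[edata_id]][selected])
}

# Toy data: 4 peptides, 6 samples in two groups of 3.
e_data <- data.frame(
  Peptide = c("pep1", "pep2", "pep3", "pep4"),
  s1 = c(1, 1, NA, 5),
  s2 = c(2, 4, NA, NA),
  s3 = c(3, 5, NA, 7),
  s4 = c(10, 2, NA, 6),
  s5 = c(11, 3, 5, 8),
  s6 = c(12, 6, NA, NA),
  stringsAsFactors = FALSE
)
groupDF <- data.frame(
  SampleID = paste0("s", 1:6),
  Group = rep(c("A", "B"), each = 3),
  stringsAsFactors = FALSE
)

result <- ppp_rip_sketch(e_data, "Peptide", "SampleID", groupDF)
print(result)
```

In this toy data, pep3 is dropped by the presence filter (observed in 1 of 6 samples), pep1 is dropped because its groups are completely separated (p-value below alpha), and pep2 and pep4 are retained.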

Author

Kelly Stratton