Identify Proportion of Peptides Present (PPP) and Rank Invariant Peptides (RIP) for Use in Normalization

Selects biomolecules for normalization using the proportion of biomolecules present and rank invariant biomolecules (ppp_rip) method.

ppp_rip(e_data, edata_id, fdata_id, groupDF, alpha = 0.2, proportion = 0.5)

Arguments

e_data: a $p \times (n + 1)$ data.frame, where $p$ is the number of peptides, proteins, lipids, or metabolites and $n$ is the number of samples. Each row corresponds to data for one peptide, protein, lipid, or metabolite, with one column giving the biomolecule identifier name.
edata_id: character string indicating the name of the peptide, protein, lipid, or metabolite identifier. Usually obtained by calling attr(omicsData, "cnames")$edata_cname.
fdata_id: character string indicating the name of the column in f_data that contains the sample identifiers.
groupDF: data.frame created by group_designation with columns for sample.id and group. If two main effects are provided the original main effect levels for each sample are returned as the third and fourth columns of the data.frame.
alpha: numeric p-value threshold, above which the biomolecules are retained as rank invariant (default value 0.2, as given in the usage above)
proportion: numeric value between 0 and 1, indicating the minimum proportion of samples in which a biomolecule must be present in order to be retained (default value 0.5)

Value

Character vector containing the biomolecules belonging to the ppp_rip subset.

Details

Biomolecules present in at least the specified proportion of samples are subjected to a Kruskal-Wallis test (a non-parametric analogue of one-way ANOVA; NAs are ignored) on group membership. Biomolecules with a p-value greater than the threshold alpha (common values include 0.1 and 0.25) are retained as rank-invariant biomolecules.
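
The two-step selection described above can be sketched in base R. This is a minimal illustration and not the package's implementation: the helper name ppp_rip_sketch, its sample_col and group_col arguments, and the toy data (including the SampleID and Group column names) are all hypothetical, and a biomolecule whose test cannot be computed is simply dropped here, which may differ from what ppp_rip does.

```r
# Hypothetical helper illustrating the PPP + RIP idea; NOT the pmartR source.
ppp_rip_sketch <- function(e_data, edata_id, groupDF, sample_col, group_col,
                           alpha = 0.2, proportion = 0.5) {
  # Every column other than the identifier is treated as a sample column.
  sample_names <- setdiff(names(e_data), edata_id)
  values <- as.matrix(e_data[, sample_names, drop = FALSE])

  # PPP step: keep biomolecules observed in at least `proportion` of samples.
  keep_ppp <- rowMeans(!is.na(values)) >= proportion
  ids <- as.character(e_data[[edata_id]])[keep_ppp]
  values <- values[keep_ppp, , drop = FALSE]

  # Align group labels with the order of the sample columns.
  idx <- match(sample_names, groupDF[[sample_col]])
  groups <- factor(groupDF[[group_col]][idx])

  # RIP step: Kruskal-Wallis test per biomolecule, ignoring NAs.
  pvals <- vapply(seq_len(nrow(values)), function(i) {
    ok <- !is.na(values[i, ])
    g <- droplevels(groups[ok])
    if (nlevels(g) < 2) return(NA_real_)  # test undefined with one group
    kruskal.test(values[i, ok], g)$p.value
  }, numeric(1))

  # Retain biomolecules whose p-value exceeds alpha (no evidence of a group effect).
  ids[!is.na(pvals) & pvals > alpha]
}

# Toy data (made up for illustration): 3 biomolecules, 6 samples, 2 groups.
e_data <- data.frame(
  Peptide = c("pep1", "pep2", "pep3"),
  S1 = c(10, 1, NA), S2 = c(12, 2, NA), S3 = c(11, 3, NA),
  S4 = c(13, 10, 5), S5 = c(10.5, 11, NA), S6 = c(12.5, 12, NA)
)
groupDF <- data.frame(SampleID = paste0("S", 1:6),
                      Group = rep(c("A", "B"), each = 3))

result <- ppp_rip_sketch(e_data, "Peptide", groupDF,
                         sample_col = "SampleID", group_col = "Group")
print(result)  # "pep1"
```

In this toy example pep3 is removed by the presence filter (observed in only 1 of 6 samples), pep2 is removed because its abundances separate cleanly by group (small p-value), and pep1 is retained as rank invariant.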

Author

Kelly Stratton