Identify Biomolecules from the Top L Order Statistics for Use in Normalization

Selects biomolecules for normalization using the top L order statistics (LOS) method.

los(e_data, edata_id, L = 0.05)

Arguments

e_data: a $p \times (n + 1)$ data.frame, where $p$ is the number of peptides, proteins, lipids, or metabolites and $n$ is the number of samples. Each row holds the data for one peptide, protein, lipid, or metabolite, and one column gives the biomolecule identifier.
edata_id: character string indicating the name of the column giving the peptide, protein, lipid, or metabolite identifier. Usually obtained by calling attr(omicsData, "cnames")$edata_cname.
L: numeric value between 0 and 1 giving the proportion of top-ranked biomolecules to retain in each sample (default 0.05).

Value

Character vector containing the identifiers of the biomolecules in the selected subset.

Details

The biomolecules whose abundances fall in the top L order statistics are identified and returned. Specifically, for each sample, the biomolecules in the top L proportion by absolute abundance are retained, and the union of these biomolecules across all samples is taken as the identified subset.
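
The selection rule described above can be sketched in a few lines of R. This is an illustrative re-implementation, not the package's source: the name los_sketch is hypothetical, and the sketch assumes that "absolute abundance" means the absolute value of each entry, that missing values are ignored within a sample, and that the per-sample count is rounded up so at least one biomolecule is kept per sample.

```r
# Hypothetical sketch of the LOS selection rule; not the package implementation.
los_sketch <- function(e_data, edata_id, L = 0.05) {
  stopifnot(L > 0, L <= 1, edata_id %in% names(e_data))

  ids <- as.character(e_data[[edata_id]])
  abund <- e_data[, names(e_data) != edata_id, drop = FALSE]

  # For one sample, return the row indices of the top L proportion of
  # observed values, ranked by absolute abundance (assumption).
  top_rows <- function(x) {
    n_obs <- sum(!is.na(x))
    if (n_obs == 0) return(integer(0))
    k <- ceiling(L * n_obs)  # assumption: round up
    # na.last = NA drops missing values from the ranking
    order(abs(x), decreasing = TRUE, na.last = NA)[seq_len(k)]
  }

  # Union of the per-sample selections, returned in row order
  keep <- sort(unique(unlist(lapply(abund, top_rows))))
  ids[keep]
}

# Toy data: 5 peptides, 3 samples, with some missing values
e_data <- data.frame(
  Peptide = paste0("pep", 1:5),
  s1 = c(10, 2, 3, NA, 1),
  s2 = c(1, 9, 2, 3, 4),
  s3 = c(5, 4, 8, 2, NA)
)

los_sketch(e_data, "Peptide", L = 0.2)
```

With L = 0.2, each sample in this toy data contributes its single most abundant peptide (pep1 for s1, pep2 for s2, pep3 for s3), so the union contains three identifiers. Because the subset is a union over samples, it can be larger than the proportion L of all biomolecules.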

Author

Kelly Stratton, Lisa Bramer