malbacR: A Vignette Pertaining to Pre-Filtered Data
Damon Leach, Kelly Stratton, Lisa Bramer
2024-02-06
malbacR_vignette_condensed_filtered.Rmd
The R package malbacR
is a new package that deals with
batch correction methods of small molecule omics data. It works in
conjunction with pmartR
, an R package designed for
preprocessing, filtering, and analyzing multi-omics data (Stratton et al. 2019; Degnan et al. 2023).
This outline demonstrates the functionality and use of the
malbacR
package. Within this package, there are four data
sets: pmart_amide
,
pmart_amideFilt
,pmart_mix
, and
pmart_mixFilt
. The first two objects contain data
originally found in the package WaveICA2.0
(Deng and Li 2022). The second two objects
contain data originally found in the package crmn
(Redestig et al. 2009). The data sets with the
suffix “Filt” are filtered versions of the original data sets. That is,
unlike their non-filtered counterparts, these data objects require no
additional data manipulation for batch correction to successfully run.
For this example, we work with the filtered data sets to demonstrate how
the malbacR
functions work.
Within malbacR
there are 12 batch correction
methods:
- Range Scaling (Berg et al. 2006)
- Power Scaling (Berg et al. 2006)
- Pareto Scaling (Berg et al. 2006)
- ComBat (Müller et al. 2016)
- EigenMS (Karpievitch et al. 2014)
- NOMIS (Sysi-Aho et al. 2007)
- RUV-random (De Livera et al. 2015)
- QC-RLSC (Dunn et al. 2011)
- WaveICA2.0 (Deng et al. 2019)
- TIGER (Han et al. 2022)
- SERRF (Fan et al. 2019)
- QC-RFSC (Luan et al. 2018)
Data Set 1: Amide Data
Load in the Data
A pmartR
friendly version of the “Amide” data that has
already undergone filtering and log2 transformations is already
implemented within malbacR
. Therefore, we simply load in
the data.
data(pmart_amideFilt)
Run Batch Correction Methods
The “Amide” data set contains information regarding quality control samples. Therefore the following methods can be used:
- Range Scaling
- Power Scaling
- Pareto Scaling
- TIGER
- EigenMS
- WaveICA2.0
- SERRF
- ComBat
- QCRLSC
- QCRFSC
It is important to note that the data pmart_amideFilt
has been filtered, transformed, and imputed such that the data runs with
all possible methods, but it may be conservative for a given method. For
example, some methods do not require imputation to run, but because some
methods require imputation (like SERRF and WaveICA2.0) (Fan et al. 2019; Deng et al. 2019), the data
has been imputed.
The data is currently on a log2 scale, therefore, the data must be converted to abundance values for the following methods: SERRF, TIGER, QC-RFSC, and WaveICA.
# SCALING METHODS
# range scaling
amide_range <- bc_range(omicsData = pmart_amideFilt)
# power scaling
amide_power <- bc_power(omicsData = pmart_amideFilt)
# pareto scaling
amide_pareto <- bc_pareto(omicsData = pmart_amideFilt)
# QUALITY CONTROL METHODS
# TIGER
pmart_amideFilt_abundance <- edata_transform(omicsData = pmart_amideFilt, data_scale = "abundance")
amide_tiger_abundance <- bc_tiger(omicsData = pmart_amideFilt_abundance,sampletype_cname = "group",
test_val = "QC",injection_cname = "Injection_order",group_cname = "group")
amide_tiger <- edata_transform(omicsData = amide_tiger_abundance, data_scale = "log2")
# QC-RLSC
amide_qcrlsc <- bc_qcrlsc(omicsData = pmart_amideFilt,block_cname = "batch",
qc_cname = "group", qc_val = "QC", order_cname = "Injection_order",
missing_thresh = 0.5, rsd_thresh = 0.3, backtransform = FALSE,keep_qc = FALSE)
# SERRF
amide_serrf_abundance <- bc_serrf(omicsData = pmart_amideFilt_abundance,sampletype_cname = "group",test_val = "QC",group_cname = "group")
amide_serrf <- edata_transform(omicsData = amide_serrf_abundance, data_scale = "log2")
# OTHER METHODS
# ComBat
amide_combat <- bc_combat(omicsData = pmart_amideFilt, use_groups = FALSE)
# EigenMS
amide_eigen <- bc_eigenMS(omicsData = pmart_amideFilt)
# WaveICA2.0
amide_wave_abundance <- bc_waveica(omicsData = pmart_amideFilt_abundance, injection_cname = "Injection_order",
version = "WaveICA2.0", alpha = 0, cutoff_injection = 0.1, K = 10,
negative_to_na = TRUE)
amide_wave <- edata_transform(omicsData = amide_wave_abundance, data_scale = "log2")
# QC-RFSC
amide_qcrfsc_abundance <- bc_qcrfsc(omicsData = pmart_amideFilt_abundance,qc_cname = "group",qc_val = "QC",
order_cname = "Injection_order",ntree = 500, keep_qc = FALSE,group_cname = "group")
amide_qcrfsc <- edata_transform(amide_qcrfsc_abundance, data_scale = "log2")
Data Visualization
After obtaining all the different batch corrected data sets, we can
plot the PPCA to see if they are successfully returning batch corrected
data. We set the group_designation
to use our batch
information as the main_effects
so as to color the data by
batch. All of this code is run using functions from pmartR
demonstrating the utility between the two packages.
p1 <- plot(dim_reduction(omicsData = pmart_amideFilt),omicsData = pmart_amide,color_by = "batch") + labs(title = "Amide: Unadjusted")
p2 <- plot(dim_reduction(omicsData = amide_range),omicsData = amide_range,color_by = "batch") + labs(title = "Amide: Range")
p3 <- plot(dim_reduction(omicsData = amide_power),omicsData = amide_power,color_by = "batch") + labs(title = "Amide: Power")
p4 <- plot(dim_reduction(omicsData = amide_pareto),omicsData = amide_pareto,color_by = "batch") + labs(title = "Amide: Pareto")
p5 <- plot(dim_reduction(omicsData = amide_tiger),omicsData = amide_tiger,color_by = "batch") + labs(title = "Amide: TIGER")
p6 <- plot(dim_reduction(omicsData = amide_qcrlsc),omicsData = amide_qcrlsc,color_by = "batch") + labs(title = "Amide: QC-RLSC")
p7 <- plot(dim_reduction(omicsData = amide_combat),omicsData = amide_combat,color_by = "batch") + labs(title = "Amide: ComBat")
p8 <- plot(dim_reduction(omicsData = amide_wave),omicsData = amide_wave,color_by = "batch") + labs(title = "Amide: WaveICA2.0")
p9 <- plot(dim_reduction(omicsData = amide_serrf),omicsData = amide_serrf,color_by = "batch") + labs(title = "Amide: SERRF")
p10 <- plot(dim_reduction(omicsData = amide_qcrfsc),omicsData = amide_qcrfsc,color_by = "batch") + labs(title = "Amide: QC-RFSC")
p11 <- plot(dim_reduction(omicsData = amide_eigen),omicsData = amide_eigen,color_by = "batch") + labs(title = "Amide: EigenMS")
p1;p2;p3;p4;p5;p6;p7;p8;p9;p10;p11
Data Set 2: Mix Data
Load in the Data
The malbacR
package includes another pmartR
friendly data set for the “mix” data which has already undergone
filtering and log2 transformations is already implemented within
malbacR
. Therefore, we simply load in the data.
data(pmart_mixFilt)
Run Batch Correction Methods
The “mix” data set contains information regarding negative controls/internal standards. Therefore the following methods can be used:
- Range Scaling
- Power Scaling
- Pareto Scaling
- RUV-random
- NOMIS
- ComBat
As there is no information regarding QC samples or injection order, the other batch correction methods in the package or not able to be used with this data set.
As with the pmart_amideFilt
data set, is important to
note that the data in pmart_mixFilt
has been filtered and
transformed such that the data runs with all possible methods, but it
may be conservative for a given method. As there were no missing data
with the “mix” data set, no imputation was computed.
The data is currently on a log2 scale, therefore, the data must be converted to abundance values for the following methods: NOMIS and WaveICA.
# SCALING METHODS
# range scaling
mix_range <- bc_range(omicsData = pmart_mixFilt)
# pareto scaling
mix_pareto <- bc_pareto(omicsData = pmart_mixFilt)
# power scaling
mix_power <- bc_power(omicsData = pmart_mixFilt)
# INTERNAL STANDARDS/NEGATIVE CONTROLS
# RUV-random
mix_ruv <- bc_ruvRandom(omicsData = pmart_mixFilt, nc_cname = "tag",nc_val = "IS", k = 3)
# NOMIS
pmart_mixFilt_abundance <- edata_transform(omicsData = pmart_mixFilt, data_scale = "abundance")
mix_nomis_abundance <- bc_nomis(omicsData = pmart_mixFilt_abundance, is_cname = "tag", is_val = "IS", num_pc = 2)
mix_nomis <- edata_transform(omicsData = mix_nomis_abundance, data_scale = "log2")
# OTHER METHODS
mix_combat <- bc_combat(omicsData = pmart_mixFilt)
mix_waveica_abundance <- bc_waveica(omicsData = pmart_mixFilt_abundance,batch_cname = "BatchNum",
version = "WaveICA",alpha = 0, K = 20, cutoff_batch = 0.05,
cutoff_group = 0.05,negative_to_na = TRUE)
## ######Decomposition 1 ########
## ######Decomposition 2 ########
## ######Decomposition 3 ########
## ######Decomposition 4 ########
## ######Decomposition 5 ########
## ######Decomposition 6 ########
## ######Decomposition 7 ########
## ######Decomposition 8 ########
## ######Decomposition 9 ########
## ######Decomposition 10 ########
## ######Decomposition 11 ########
## ######Decomposition 12 ########
## ######Decomposition 13 ########
## ######Decomposition 14 ########
## ######Decomposition 15 ########
## ######Decomposition 16 ########
## ######Decomposition 17 ########
## ######Decomposition 18 ########
## ######Decomposition 19 ########
## ######Decomposition 20 ########
## ######Decomposition 21 ########
## ######Decomposition 22 ########
## ######Decomposition 23 ########
## ######Decomposition 24 ########
## ######Decomposition 25 ########
## ######Decomposition 26 ########
## ######Decomposition 27 ########
## ######Decomposition 28 ########
## ######Decomposition 29 ########
## ######Decomposition 30 ########
## ######Decomposition 31 ########
## ######Decomposition 32 ########
## ######Decomposition 33 ########
## ######Decomposition 34 ########
## ######Decomposition 35 ########
## ######Decomposition 36 ########
## ######Decomposition 37 ########
## ######Decomposition 38 ########
## ######Decomposition 39 ########
## ######Decomposition 40 ########
## ######Decomposition 41 ########
## ######Decomposition 42 ########
## ######Decomposition 43 ########
## ######Decomposition 44 ########
## ######Decomposition 45 ########
## ######Decomposition 46 ########
## ######### ICA 1 #############
## [1] "Removing 0 components with P value less than 0.05"
## ######### ICA 2 #############
## [1] "Removing 10 components with P value less than 0.05"
## ######### ICA 3 #############
## [1] "Removing 13 components with P value less than 0.05"
## ######### ICA 4 #############
## [1] "Removing 11 components with P value less than 0.05"
## ######### ICA 5 #############
## [1] "Removing 19 components with P value less than 0.05"
## ######### ICA 6 #############
## [1] "Removing 1 components with P value less than 0.05"
## ######Reconstruction 1 ########
## ######Reconstruction 2 ########
## ######Reconstruction 3 ########
## ######Reconstruction 4 ########
## ######Reconstruction 5 ########
## ######Reconstruction 6 ########
## ######Reconstruction 7 ########
## ######Reconstruction 8 ########
## ######Reconstruction 9 ########
## ######Reconstruction 10 ########
## ######Reconstruction 11 ########
## ######Reconstruction 12 ########
## ######Reconstruction 13 ########
## ######Reconstruction 14 ########
## ######Reconstruction 15 ########
## ######Reconstruction 16 ########
## ######Reconstruction 17 ########
## ######Reconstruction 18 ########
## ######Reconstruction 19 ########
## ######Reconstruction 20 ########
## ######Reconstruction 21 ########
## ######Reconstruction 22 ########
## ######Reconstruction 23 ########
## ######Reconstruction 24 ########
## ######Reconstruction 25 ########
## ######Reconstruction 26 ########
## ######Reconstruction 27 ########
## ######Reconstruction 28 ########
## ######Reconstruction 29 ########
## ######Reconstruction 30 ########
## ######Reconstruction 31 ########
## ######Reconstruction 32 ########
## ######Reconstruction 33 ########
## ######Reconstruction 34 ########
## ######Reconstruction 35 ########
## ######Reconstruction 36 ########
## ######Reconstruction 37 ########
## ######Reconstruction 38 ########
## ######Reconstruction 39 ########
## ######Reconstruction 40 ########
## ######Reconstruction 41 ########
## ######Reconstruction 42 ########
## ######Reconstruction 43 ########
## ######Reconstruction 44 ########
## ######Reconstruction 45 ########
## ######Reconstruction 46 ########
mix_waveica <- edata_transform(omicsData = mix_waveica_abundance, data_scale = "log2")
Mix: Data Visualization
Similar to the previous data set, we can compare the PCA plots between the unadjusted and adjusted data sets.
p1 <- plot(dim_reduction(omicsData = pmart_mixFilt),omicsData = pmart_mixFilt,color_by = "BatchNum") + labs(title = "Mix: Unadjusted")
p2 <- plot(dim_reduction(omicsData = mix_ruv),omicsData = mix_ruv,color_by = "BatchNum") + labs(title = "Mix: ruv-Random")
p3 <- plot(dim_reduction(omicsData = mix_nomis),omicsData = mix_nomis,color_by = "BatchNum") + labs(title = "Mix: NOMIS")
p4 <- plot(dim_reduction(omicsData = mix_combat),omicsData = mix_combat,color_by = "BatchNum") + labs(title = "Mix: ComBat")
p5 <- plot(dim_reduction(omicsData = mix_range),omicsData = mix_range,color_by = "BatchNum") + labs(title = "Mix: Range")
p6 <- plot(dim_reduction(omicsData = mix_power),omicsData = mix_power,color_by = "BatchNum") + labs(title = "Mix: Power")
p7 <- plot(dim_reduction(omicsData = mix_pareto),omicsData = mix_pareto,color_by = "BatchNum") + labs(title = "Mix: Pareto")
p8 <- plot(dim_reduction(omicsData = mix_waveica),omicsData = mix_waveica,color_by = "BatchNum") + labs(title = "Mix: WaveICA")
p1;p2;p3;p4;p5;p6;p7;p8