
This function provides an advanced option to select metabolite variables from external dataset(s). The selected variables (as a list) can be further passed to argument selectVar_external in function run_TIGER for a customised data correction.

Usage

select_variable(
  train_num,
  test_num = NULL,
  train_batchID = NULL,
  test_batchID = NULL,
  selectVar_corType = c("cor", "pcor"),
  selectVar_corMethod = c("spearman", "pearson"),
  selectVar_minNum = 5,
  selectVar_maxNum = 10,
  selectVar_batchWise = FALSE,
  coerce_numeric = FALSE
)
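
A minimal call might look like the sketch below. It assumes that select_variable is exported by the TIGERr package (the package name is not stated on this page) and uses synthetic data with hypothetical column names. Only arguments listed in the usage above are passed, and the call is skipped if the package is not installed.

```r
# Synthetic stand-in for QC data: 30 samples x 8 metabolite variables.
# The column names (met1..met8) are hypothetical.
set.seed(1)
shared <- rnorm(30)  # common signal so that the variables correlate
train_num <- as.data.frame(
  sapply(1:8, function(i) shared + rnorm(30, sd = 0.5))
)
names(train_num) <- paste0("met", 1:8)

# Assumption: select_variable() comes from the TIGERr package.
if (requireNamespace("TIGERr", quietly = TRUE)) {
  selected <- TIGERr::select_variable(
    train_num           = train_num,
    test_num            = NULL,        # select on training samples only
    selectVar_corType   = "cor",
    selectVar_corMethod = "spearman",
    selectVar_minNum    = 2,
    selectVar_maxNum    = 4,
    selectVar_batchWise = FALSE
  )
  str(selected, max.level = 1)         # inspect the returned list
}
```

The limits are lowered from their defaults here only because the toy dataset has just 8 variables.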

Arguments

train_num

a numeric data.frame including only the metabolite values of training samples (can be quality control samples). Information such as injection order or well position must be excluded. Row: sample. Column: metabolite variable. See Examples.
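
As an illustration of the expected shape, the following sketch separates metabolite columns from metadata in base R. The table and its column names (sampleID, sampleType, injectionOrder, met1, met2) are hypothetical and stand in for whatever layout your own data uses.

```r
# Hypothetical raw table: metadata columns followed by metabolite columns.
raw <- data.frame(
  sampleID       = paste0("S", 1:6),
  sampleType     = rep(c("QC", "subject"), times = 3),
  injectionOrder = 1:6,
  met1           = c(1.2, 1.4, 1.1, 1.6, 1.3, 1.5),
  met2           = c(0.8, 0.9, 0.7, 1.0, 0.8, 0.9)
)

meta_cols <- c("sampleID", "sampleType", "injectionOrder")
is_qc     <- raw$sampleType == "QC"

# Keep metabolite columns only; rows are samples.
train_num <- raw[is_qc,  setdiff(names(raw), meta_cols)]
test_num  <- raw[!is_qc, setdiff(names(raw), meta_cols)]

# Column names of test_num should correspond to those of train_num.
stopifnot(identical(names(train_num), names(test_num)))
```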

test_num

an optional numeric data.frame including the metabolite values of test samples (can be subject samples). If provided, the column names of test_num should correspond to the column names of train_num. Row: sample. Column: metabolite variable. If NULL, the variables will be selected based on train_num only. See Examples.

train_batchID

NULL or a vector corresponding to train_num to specify the batch of each sample. Ignored if selectVar_batchWise = FALSE. See Examples.

test_batchID

NULL or a vector corresponding to test_num to specify the batch of each sample. Ignored if selectVar_batchWise = FALSE. See Examples.
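
The batch ID vectors are expected to hold one entry per row of the matching data.frame. The sketch below, with hypothetical plate labels, checks that alignment and shows how rows group by batch. How the function groups samples internally is not described here, so this only illustrates the input shape.

```r
# Hypothetical training data with one batch label per sample (row).
train_num <- data.frame(met1 = c(1.2, 1.4, 1.1, 1.6),
                        met2 = c(0.8, 0.9, 0.7, 1.0))
train_batchID <- c("plate1", "plate1", "plate2", "plate2")

# The vector must correspond to the rows of train_num.
stopifnot(length(train_batchID) == nrow(train_num))

# Group samples by batch to see how many fall into each one.
by_batch <- split(train_num, train_batchID)
sapply(by_batch, nrow)
```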

selectVar_corType

a character string indicating whether correlation ("cor", default) or partial correlation ("pcor") is to be used. Can be abbreviated. See Details. Note: computing partial correlations for a large dataset can be very time-consuming.

selectVar_corMethod

a character string indicating which correlation coefficient is to be computed. One of "spearman" (default) or "pearson". Can be abbreviated. See Details.
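
To make the difference between the two correlation types and the two coefficients concrete, here is a base R sketch on synthetic data. The partial correlation is derived from the inverse of the correlation matrix, which is a standard textbook construction and an assumption here: the function itself may compute it differently.

```r
set.seed(42)
n <- 50
z <- rnorm(n)  # shared signal driving all three variables
x <- data.frame(a = z + rnorm(n, sd = 0.3),
                b = z + rnorm(n, sd = 0.3),
                c = z + rnorm(n, sd = 0.3))

# Ordinary correlation, with either coefficient.
cor_spearman <- cor(x, method = "spearman")
cor_pearson  <- cor(x, method = "pearson")

# Partial correlation of each pair given all other variables,
# derived from the inverse of the correlation matrix.
partial_cor <- function(df, method = "spearman") {
  prec <- solve(cor(df, method = method))
  pc <- -prec / sqrt(diag(prec) %o% diag(prec))
  diag(pc) <- 1
  pc
}
pcor_spearman <- partial_cor(x)
```

The matrix inversion is one reason partial correlations become costly as the number of variables grows.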

selectVar_minNum

an integer specifying the minimum number of selected variables. If NULL, no lower limit is set, but at least 1 variable is selected. See Details. Default: 5.

selectVar_maxNum

an integer specifying the maximum number of selected variables. If NULL, no upper limit is set, but at most ncol(train_num) - 1 variables are selected. See Details. Default: 10.
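
The NULL rules for the two limits can be sketched as a small helper. The name resolve_limits is hypothetical, and clamping non-NULL values to the range 1 to ncol(train_num) - 1 is an assumption; the function's actual selection rule is described in Details.

```r
# Hypothetical helper: effective limits for a dataset with p metabolite
# variables, following the NULL rules stated for the two arguments.
resolve_limits <- function(p, min_num = 5, max_num = 10) {
  upper <- p - 1                      # at most ncol(train_num) - 1
  lo <- if (is.null(min_num)) 1 else max(1, min_num)
  hi <- if (is.null(max_num)) upper else min(upper, max_num)
  # Assumption: the minimum can never exceed the maximum.
  c(min = min(lo, hi), max = hi)
}

resolve_limits(p = 100)                                  # defaults
resolve_limits(p = 100, min_num = NULL, max_num = NULL)  # no limits set
resolve_limits(p = 8)                                    # few variables
```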

selectVar_batchWise

(advanced) logical. Specify whether the variable selection should be performed based on each batch. Default: FALSE. Note: if TRUE, the batch ID of each sample is required. Batch-wise variable selection is supported for data requiring special processing (for example, data with strong batch effects), but in most cases it is not necessary. Setting TRUE might make the algorithm less robust. See Details.

coerce_numeric

logical. If TRUE, values in train_num and test_num will be coerced to numeric before the computation. Columns that cannot be coerced will be removed (with warnings). See Examples. Default: FALSE.
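
The behaviour described for coerce_numeric can be approximated in base R as below. The helper coerce_or_drop is hypothetical, and treating a column as not coercible when conversion introduces new NA values is an assumption about what "cannot be coerced" means.

```r
# Hypothetical input: met2 was read as text, met3 holds non-numeric text.
df <- data.frame(
  met1 = c(1.2, 1.4, 1.1),
  met2 = c("0.8", "0.9", "0.7"),
  met3 = c("low", "high", "low"),
  stringsAsFactors = FALSE
)

coerce_or_drop <- function(df) {
  converted <- lapply(df, function(col) {
    suppressWarnings(as.numeric(as.character(col)))
  })
  # Assumption: a column cannot be coerced if coercion creates new NAs.
  ok <- mapply(function(new, old) !any(is.na(new) & !is.na(old)),
               converted, df)
  if (any(!ok)) {
    warning("Removed non-numeric column(s): ",
            paste(names(df)[!ok], collapse = ", "))
  }
  as.data.frame(converted[ok])
}

# The warning is suppressed here only to keep the example output quiet.
clean <- suppressWarnings(coerce_or_drop(df))
names(clean)
```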

Value

the function returns a list of length one containing the selected variables computed on the whole dataset.