
This function provides an advanced option to select metabolite variables from external dataset(s). The selected variables (returned as a list) can then be passed to the selectVar_external argument of run_TIGER for customised data correction.


  test_num = NULL,
  train_batchID = NULL,
  test_batchID = NULL,
  selectVar_corType = c("cor", "pcor"),
  selectVar_corMethod = c("spearman", "pearson"),
  selectVar_minNum = 5,
  selectVar_maxNum = 10,
  selectVar_batchWise = FALSE,
  coerce_numeric = FALSE



train_num
a numeric data.frame including only the metabolite values of training samples (can be quality control samples). Information such as injection order or well position must be excluded. Row: sample. Column: metabolite variable. See Examples.


test_num
an optional numeric data.frame including the metabolite values of test samples (can be subject samples). If provided, the column names of test_num should correspond to the column names of train_num. Row: sample. Column: metabolite variable. If NULL, the variables will be selected based on train_num only. See Examples.


train_batchID
NULL or a vector corresponding to train_num to specify the batch of each sample. Ignored if selectVar_batchWise = FALSE. See Examples.


test_batchID
NULL or a vector corresponding to test_num to specify the batch of each sample. Ignored if selectVar_batchWise = FALSE. See Examples.


selectVar_corType
a character string indicating whether correlation ("cor", default) or partial correlation ("pcor") is to be used. Can be abbreviated. See Details. Note: computing partial correlations on a large dataset can be very time-consuming.


selectVar_corMethod
a character string indicating which correlation coefficient is to be computed. One of "spearman" (default) or "pearson". Can be abbreviated. See Details.
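The vector defaults shown in the usage, together with the note that values can be abbreviated, match the behaviour of base R's match.arg. The sketch below illustrates that pattern only; pick_method is a hypothetical helper, and whether the package resolves its arguments in exactly this way is an assumption.

```r
# Hypothetical helper (not part of TIGER): resolve a choice the way
# base R's match.arg() does.
pick_method <- function(method = c("spearman", "pearson")) {
  match.arg(method)
}

default_choice <- pick_method()        # first element of the vector is the default
short_choice   <- pick_method("pear")  # unambiguous abbreviations are expanded

print(default_choice)
print(short_choice)
```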


selectVar_minNum
an integer specifying the minimum number of selected variables. If NULL, no lower limit is imposed, but at least 1 variable is selected. See Details. Default: 5.


selectVar_maxNum
an integer specifying the maximum number of selected variables. If NULL, no upper limit is imposed, but at most ncol(train_num) - 1 variables are selected. See Details. Default: 10.
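To make the upper bound concrete, here is a minimal base R sketch that ranks candidate variables by absolute correlation with each target variable and keeps at most max_num of them. It is an illustration only, not the package's selection algorithm (see Details for that); select_top_correlated and the toy data are hypothetical. It also shows why the ceiling is ncol(train_num) - 1: a variable cannot be selected as a predictor of itself.

```r
# Hypothetical helper (not TIGER's implementation): for each target variable,
# keep the `max_num` other variables with the highest absolute correlation.
select_top_correlated <- function(train_num, max_num = 10,
                                  method = c("spearman", "pearson")) {
  method <- match.arg(method)
  # A variable cannot select itself, so at most ncol - 1 candidates remain.
  max_num <- min(max_num, ncol(train_num) - 1)
  cor_mat <- abs(cor(train_num, method = method))
  diag(cor_mat) <- NA  # exclude self-correlation
  selected <- lapply(colnames(cor_mat), function(target) {
    # na.last = NA drops the NA placed on the diagonal
    ranked <- sort(cor_mat[target, ], decreasing = TRUE, na.last = NA)
    names(ranked)[seq_len(min(max_num, length(ranked)))]
  })
  names(selected) <- colnames(cor_mat)
  selected
}

# Toy data: 20 samples (rows) by 6 metabolite variables (columns)
set.seed(1)
toy <- as.data.frame(matrix(rnorm(20 * 6), nrow = 20,
                            dimnames = list(NULL, paste0("M", 1:6))))
sel <- select_top_correlated(toy, max_num = 3)
str(sel)
```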


selectVar_batchWise
(advanced) logical. Specify whether the variable selection should be performed separately for each batch. Default: FALSE. Note: if TRUE, the batch ID of each sample is required. Batch-wise variable selection is supported for data requiring special processing (for example, data with strong batch effects), but in most cases it is not necessary. Setting this to TRUE might make the algorithm less robust. See Details.
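As a rough illustration of what batch-wise processing involves, the base R sketch below splits a toy training set by batch ID and computes a Spearman correlation matrix per batch. The data and object names are made up for the example, and this is not the package's internal code. One likely reason for the reduced robustness is visible here: each per-batch estimate rests on fewer samples than an estimate from the whole dataset.

```r
# Toy data: 30 samples in 3 batches of 10, with 4 metabolite variables
set.seed(42)
train_num <- as.data.frame(matrix(rnorm(30 * 4), nrow = 30,
                                  dimnames = list(NULL, paste0("M", 1:4))))
train_batchID <- rep(c("B1", "B2", "B3"), each = 10)

# The batch vector must have one entry per sample (row)
stopifnot(length(train_batchID) == nrow(train_num))

# Split the samples by batch, then compute one correlation matrix per batch
by_batch <- split(train_num, train_batchID)
cor_by_batch <- lapply(by_batch, cor, method = "spearman")

# Each batch contributes only 10 samples to its correlation estimates
print(vapply(by_batch, nrow, integer(1)))
```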


coerce_numeric
logical. If TRUE, values in train_num and test_num will be coerced to numeric before the computation. Columns that cannot be coerced will be removed (with warnings). See Examples. Default: FALSE.
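The coercion step can be pictured with the following base R sketch. It converts every column to numeric and drops any column in which the conversion introduces new missing values, issuing a warning. coerce_to_numeric is a hypothetical helper, and the exact rule the package uses to decide that a column "cannot be coerced" is an assumption here.

```r
# Hypothetical helper (not part of TIGER): coerce columns to numeric and drop
# those where coercion fails for at least one non-missing value.
coerce_to_numeric <- function(df) {
  converted <- lapply(df, function(col) {
    suppressWarnings(as.numeric(as.character(col)))
  })
  # A column fails if coercion produced NAs that were not NA in the input
  failed <- mapply(function(old, new) any(is.na(new) & !is.na(old)),
                   df, converted)
  if (any(failed)) {
    warning("Removed columns: ", paste(names(df)[failed], collapse = ", "))
  }
  as.data.frame(converted[!failed])
}

raw <- data.frame(M1 = c("1.2", "3.4"),  # numbers stored as text
                  M2 = c("a", "b"),      # not coercible
                  M3 = c(5, 6),
                  stringsAsFactors = FALSE)
clean <- suppressWarnings(coerce_to_numeric(raw))
str(clean)
```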


the function returns a list of length one containing the selected variables computed on the whole dataset.