Converts several data frames of RNA-seq transcript data to an object of the class 'seqData'. Objects of the class 'seqData' are lists with two obligatory components, e_data and f_data. An optional list component, e_meta, is used if analysis or visualization at other levels (e.g. gene, protein, pathway) is also desired.

as.seqData(
  e_data,
  f_data,
  e_meta = NULL,
  edata_cname,
  fdata_cname,
  emeta_cname = NULL,
  techrep_cname = NULL,
  ...
)

Arguments

e_data

a \(p \times (n + 1)\) data frame of expression data, where \(p\) is the number of RNA transcripts observed and \(n\) is the number of samples. The additional column holds a unique identifier for each transcript (row) and may appear anywhere in the data frame. Each row corresponds to data for one transcript. All counts must be raw (unnormalized) for processing.
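As an illustration of this layout, the following base R sketch builds a toy e_data with \(p = 3\) transcripts and \(n = 2\) samples and runs a few sanity checks. The column names (Transcript, Sample_A, Sample_B) and all counts are made up for the example, and the checks are not the package's own validation.

```r
# Toy e_data: 3 transcripts (rows) by 2 samples, plus one identifier column.
# All names and counts here are hypothetical.
toy_edata <- data.frame(
  Transcript = c("tx1", "tx2", "tx3"),
  Sample_A = c(10L, 0L, 250L),
  Sample_B = c(7L, 3L, 198L),
  stringsAsFactors = FALSE
)

p <- nrow(toy_edata)      # number of transcripts
n <- ncol(toy_edata) - 1  # number of samples (every column except the identifier)

# The identifier column should label each row uniquely.
ids_unique <- !anyDuplicated(toy_edata$Transcript)

# Raw counts are non-negative whole numbers.
counts <- as.matrix(toy_edata[, setdiff(names(toy_edata), "Transcript")])
counts_raw <- all(counts >= 0 & counts == round(counts), na.rm = TRUE)
```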

f_data

a data frame with \(n\) rows. Each row corresponds to a sample, with one column giving the unique sample identifiers found in the e_data column names and other columns providing qualitative and/or quantitative traits of each sample. Library size, used for normalization, can either be provided as a column of f_data or calculated from the sample columns of e_data.
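A minimal base R sketch of how e_data and f_data line up, and of deriving a per-sample library size from the e_data count columns. The data, the Treatment trait, and the LibrarySize column name are hypothetical; how the package itself consumes library size is not shown here.

```r
# Hypothetical toy data: 3 transcripts, 2 samples.
toy_edata <- data.frame(
  Transcript = c("tx1", "tx2", "tx3"),
  Sample_A = c(10L, 0L, 250L),
  Sample_B = c(7L, 3L, 198L),
  stringsAsFactors = FALSE
)
toy_fdata <- data.frame(
  SampleName = c("Sample_A", "Sample_B"),
  Treatment = c("control", "treated"),
  stringsAsFactors = FALSE
)

# Sample identifiers in f_data should match the non-identifier columns of e_data.
sample_cols <- setdiff(names(toy_edata), "Transcript")
ids_match <- setequal(sample_cols, toy_fdata$SampleName)

# Library size per sample: total raw counts in that sample's column.
lib_size <- colSums(toy_edata[, sample_cols], na.rm = TRUE)

# Store it in f_data, ordered by the f_data sample identifiers.
toy_fdata$LibrarySize <- unname(lib_size[toy_fdata$SampleName])
```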

e_meta

an optional data frame with at least \(p\) rows. Each row corresponds to a transcript, with one column giving the transcript identifiers (this column must have the same name as the identifier column in e_data) and other columns giving biomolecule meta information (e.g. mappings of transcripts to genes or proteins). The mapping column named by emeta_cname can be the same as edata_cname, if desired.
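The relationship between e_data and e_meta can be sketched in base R as follows. The Gene column and all values are hypothetical, and the checks only illustrate the requirements stated above rather than reproduce the package's validation.

```r
# Hypothetical toy data: 3 transcripts mapped to 2 genes.
toy_edata <- data.frame(
  Transcript = c("tx1", "tx2", "tx3"),
  Sample_A = c(10L, 0L, 250L),
  Sample_B = c(7L, 3L, 198L),
  stringsAsFactors = FALSE
)
toy_emeta <- data.frame(
  Transcript = c("tx1", "tx2", "tx3"),
  Gene = c("geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)

# The identifier column carries the same name in both data frames.
same_id_name <- "Transcript" %in% names(toy_emeta)

# Every transcript in e_data has a row in e_meta.
all_covered <- all(toy_edata$Transcript %in% toy_emeta$Transcript)

# Number of transcripts mapped to each gene.
tx_per_gene <- table(toy_emeta$Gene)
```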

edata_cname

character string specifying the name of the column containing the transcript identifiers in e_data and e_meta (if applicable).

fdata_cname

character string specifying the name of the column containing the sample identifiers in f_data.

emeta_cname

character string specifying the name of the column containing the gene identifiers (or other mapping variable) in e_meta (if applicable). Defaults to NULL. If e_meta is NULL, then either do not specify emeta_cname or specify it as NULL.

techrep_cname

character string specifying the name of the column in f_data that specifies which samples are technical replicates. This column is used to collapse the data when combine_techreps is called on this object. Defaults to NULL (no technical replicates).

...

further arguments

Value

Object of class seqData

Details

Objects of class 'seqData' contain some attributes that are referenced by downstream functions. These attributes can be changed from their default values by manual specification. The attributes and their default values are as follows:

data_scale

Scale of the data provided in e_data. Only 'counts' is valid for 'seqData'.

is_normalized

A logical argument, specifying whether the data has been normalized or not. Default value is FALSE.

norm_info

Default value is an empty list, which will be populated with a single named element is_normalized = is_normalized.

data_types

Character string describing the type of data, most commonly used for lipidomic data (lipidData objects) or NMR data (nmrData objects) but available for other data classes as well. Default value is NULL.

Computed values included in the data_info attribute are as follows:

num_edata

The number of unique edata_cname entries.

num_zero_obs

The number of zero-value observations.

num_emeta

The number of unique emeta_cname entries.

prop_missing

The proportion of e_data values that are NA.

num_samps

The number of samples that make up the columns of e_data.

meta_info

A logical argument, specifying whether e_meta is provided.
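To make these definitions concrete, the sketch below computes analogous quantities by hand in base R for a small toy data set. It is an approximation written from the descriptions above, not the package's implementation; the data, the Gene mapping column, and the summary_sketch list are hypothetical.

```r
# Hypothetical toy data with two zeros and one missing value.
toy_edata <- data.frame(
  Transcript = c("tx1", "tx2", "tx3"),
  Sample_A = c(10L, 0L, NA),
  Sample_B = c(7L, 0L, 198L),
  stringsAsFactors = FALSE
)
toy_emeta <- data.frame(
  Transcript = c("tx1", "tx2", "tx3"),
  Gene = c("geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Count matrix: every column except the transcript identifier.
counts <- as.matrix(toy_edata[, setdiff(names(toy_edata), "Transcript")])

summary_sketch <- list(
  num_edata    = length(unique(toy_edata$Transcript)),  # unique transcript identifiers
  num_zero_obs = sum(counts == 0, na.rm = TRUE),        # zero-value observations
  num_emeta    = length(unique(toy_emeta$Gene)),        # unique mapping entries
  prop_missing = mean(is.na(counts)),                   # proportion of NA values
  num_samps    = ncol(counts),                          # number of sample columns
  meta_info    = !is.null(toy_emeta)                    # whether e_meta is provided
)
```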

Author

Rachel Richardson, Kelly Stratton, Lisa Bramer

Examples

library(pmartRdata)
myseq <- as.seqData(
  e_data = rnaseq_edata,
  e_meta = rnaseq_emeta,
  f_data = rnaseq_fdata,
  edata_cname = "Transcript",
  fdata_cname = "SampleName",
  emeta_cname = "Transcript"
)