Gene expression cancer RNA sequence
This classification data set ships with the NeurEco installation. It is a random extraction of gene expressions from the RNA-Seq (HiSeq) PANCAN data set, taken from patients with different types of tumors. There are \(20531\) input features, each given a dummy name (gene_xx). The \(5\) output features are the cancer classes: BRCA, KIRC, COAD, LUAD and PRAD.
The test case is provided with the following files:
Training data set:
x_train_0.csv: the training inputs file - part 1, containing \(320\) samples
y_train_0.csv: the training targets file - part 1
x_train_1.csv: the training inputs file - part 2, containing \(320\) samples
y_train_1.csv: the training targets file - part 2
Testing data set:
x_test.csv: the testing inputs file, containing \(161\) samples
y_test.csv: the testing targets file
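Because the training data is split into two parts, it has to be reassembled before use. The following is a minimal sketch of one way to do that with the Python standard library only. It is not part of NeurEco: the helper names `read_csv_rows` and `load_training_set` are hypothetical, and the sketch assumes comma-separated files whose first row is a header, which should be checked against the actual files.

```python
import csv
from pathlib import Path


def read_csv_rows(path, delimiter=","):
    """Read a CSV file and return (header, rows).

    Assumption: the first row is a header and the delimiter is a comma.
    Values are returned as strings; convert them as needed.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines
    return header, rows


def load_training_set(data_dir, delimiter=","):
    """Concatenate training parts 0 and 1 into one inputs list and one targets list."""
    data_dir = Path(data_dir)
    x_rows, y_rows = [], []
    for part in (0, 1):
        _, x_part = read_csv_rows(data_dir / f"x_train_{part}.csv", delimiter)
        _, y_part = read_csv_rows(data_dir / f"y_train_{part}.csv", delimiter)
        # Inputs and targets of the same part must describe the same samples.
        if len(x_part) != len(y_part):
            raise ValueError(f"part {part}: {len(x_part)} inputs, {len(y_part)} targets")
        x_rows.extend(x_part)
        y_rows.extend(y_part)
    return x_rows, y_rows
```

If the assumptions about the file layout hold, calling `load_training_set` on the directory containing the files should return \(320 + 320 = 640\) training samples. The testing files can be read the same way with `read_csv_rows`.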