Accurate molecular classification of human cancers based on gene expression using a simple classifier with a pathological tree-based framework.


Recent studies suggest that accurate prediction of tissue of origin for human cancers can be achieved by applying sophisticated statistical learning procedures to gene expression data obtained from DNA microarrays. We have pursued the hypothesis that a more straightforward and equally accurate strategy for classifying human tumors is to use a simple algorithm that considers gene expression levels within a tree-based framework that encodes limited information about pathology and tissue ontogeny. By considering gene expression data within this framework, we found that only a small number of genes were required to achieve a relatively high accuracy level in tumor classification. Using as few as 45 genes, we were able to classify 157 of 190 human malignant tumors correctly, which is comparable to previous results obtained with sophisticated classifiers using thousands of genes. Our simple classifier accurately predicted the origin of metastatic tumors even when it was trained using only primary tumors, and it produced accurate predictions when trained and tested on expression data from different labs and from different microarray platforms. Our findings suggest that accurate and robust cancer diagnosis from gene expression profiles can be achieved by mimicking the classification strategies routinely used by surgical pathologists.
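The general idea of a simple classifier applied within a tree can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so everything here is an assumption made for illustration: the nearest-centroid rule at each node, the two-level tree, the group and tissue names, the gene indices, and the toy expression values. It is not the authors' implementation. Each internal node uses only a few genes to choose among its children, and a sample is routed from the root down to a leaf.

```python
from statistics import mean

# Hypothetical tree: internal node -> children. Leaves are tissue labels.
TREE = {
    "root": ["group_a", "group_b"],
    "group_a": ["tissue_1", "tissue_2"],
    "group_b": ["tissue_3", "tissue_4"],
}

# Hypothetical small gene subsets (column indices) consulted at each node.
GENES = {"root": [0], "group_a": [1], "group_b": [2]}


def centroid(rows):
    """Per-gene mean over a list of expression vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TreeCentroidClassifier:
    """Nearest-centroid decision at every internal node of a fixed tree."""

    def __init__(self, tree, genes):
        self.tree = tree
        self.genes = genes
        self.centroids = {}

    def _leaves(self, node):
        # All leaf labels found beneath a node (the node itself if a leaf).
        if node not in self.tree:
            return {node}
        found = set()
        for child in self.tree[node]:
            found |= self._leaves(child)
        return found

    def fit(self, samples, labels):
        for node, children in self.tree.items():
            idx = self.genes[node]
            self.centroids[node] = {}
            for child in children:
                leaves = self._leaves(child)
                rows = [[s[i] for i in idx]
                        for s, y in zip(samples, labels) if y in leaves]
                self.centroids[node][child] = centroid(rows)
        return self

    def predict(self, sample):
        node = "root"
        while node in self.tree:
            x = [sample[i] for i in self.genes[node]]
            cents = self.centroids[node]
            # Move to the child whose centroid is closest on this node's genes.
            node = min(cents, key=lambda c: sq_dist(x, cents[c]))
        return node


# Toy training data (invented values): three genes per sample.
samples = [
    [9, 8, 1], [8, 9, 2],   # tissue_1
    [9, 1, 1], [8, 2, 2],   # tissue_2
    [1, 5, 9], [2, 5, 8],   # tissue_3
    [1, 5, 1], [2, 5, 2],   # tissue_4
]
labels = ["tissue_1", "tissue_1", "tissue_2", "tissue_2",
          "tissue_3", "tissue_3", "tissue_4", "tissue_4"]

clf = TreeCentroidClassifier(TREE, GENES).fit(samples, labels)
print(clf.predict([8.5, 7.5, 1.5]))  # tissue_1
print(clf.predict([1.5, 5.0, 2.0]))  # tissue_4
```

In this sketch the tree carries the prior knowledge, so each decision only has to separate a few closely related classes, which is one plausible reason a small number of genes per node could suffice.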

  • Beer DG
  • Cho KR
  • Fearon ER
  • Giordano TJ
  • Greenson JK
  • Gruber SB
  • Hanash S
  • Kardia SL
  • Kuick R
  • Logsdon C
  • Misek DE
  • Rennert G
  • Schwartz DR
  • Shedden KA
  • Simeone D
  • Taylor JM
Appears In
Am J Pathol, 2003, 163 (5)