In silico estimates of tissue components in surgical samples based on expression profiling data.


Tissue samples from many diseases have been used for gene expression profiling studies, but these samples often vary widely in the cell types they contain. Such variation could confound efforts to correlate expression with clinical parameters. In principle, the proportion of each major tissue component can be estimated from the profiling data and used to triage samples before studying correlations with disease parameters. Four large gene expression microarray data sets from prostate cancer, whose tissue components were estimated by pathologists, were used to test the performance of multivariate linear regression models for in silico prediction of major tissue components. Ten-fold cross-validation within each data set yielded average differences between the pathologists' predictions and the in silico predictions of 8% to 14% for the tumor component and 13% to 17% for the stroma component. Across independent data sets that used similar platforms and fresh frozen samples, the average differences were 11% to 12% for tumor and 12% to 17% for stroma. When the models were applied to 219 arrays of "tumor-enriched" samples in the literature, almost one quarter were predicted to have 30% or less tumor cells. Furthermore, there was a 10.5% difference in the average predicted tumor content between 37 recurrent and 42 nonrecurrent cancer patients. As a result, genes that correlated with tissue percentage generally also correlated with recurrence. If such a correlation is not desired, then some samples might be removed to rebalance the data set or tissue percentages might be incorporated into the prediction algorithm. A web service, "CellPred," has been designed for the in silico prediction of sample tissue components based on expression data.

  • Jia Z
  • McClelland M
  • Mercola D
  • Sawyers A
  • Wang Y
  • Wang-Rodriquez J
  • Xia XQ
  • Yao H
PubMed ID
Appears In
Cancer Res, 2010, 70 (16)