Improving the quality of biomarker discovery research: the right samples and enough of them.


Biomarker discovery research has yielded few biomarkers that validate for clinical use. Poor study design may be a contributing factor.

The goal of discovery research is to identify a subset of potentially useful markers from a large set of candidates assayed on case and control samples. We recommend the PRoBE (prospective-specimen-collection, retrospective-blinded-evaluation) design for selecting samples. We propose sample size calculations that require specifying: (i) a definition of biomarker performance; (ii) the proportion of useful markers the study should identify (Discovery Power); and (iii) the tolerable number of useless markers among those identified (False Leads Expected, FLE).
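To make the three ingredients concrete, here is a minimal Python sketch. It is not the authors' calculation. It assumes, hypothetically, that performance is summarized by a marker's true positive rate (TPR) at a positivity threshold fixed in advance, that a useless marker has TPR `p0` and a useful one TPR `p1`, and that each candidate is screened with a one-sided normal-approximation test using the cases alone. The per-marker significance level `alpha` then acts as the false-lead rate (FLE is roughly `alpha` times the number of useless markers), and the test's power acts as Discovery Power. The function name `cases_needed` and all input values are illustrative.

```python
import math
from statistics import NormalDist


def cases_needed(p0: float, p1: float, alpha: float, power: float) -> int:
    """Hypothetical sketch: cases needed so that a one-sided z-test of
    H0: TPR = p0 (useless marker) vs H1: TPR = p1 (useful marker)
    keeps a useless marker with probability alpha (false-lead rate)
    and a useful marker with probability `power` (Discovery Power).

    Assumes the positivity threshold is fixed in advance, so only the
    binomial variability among cases is modeled.
    """
    if not (0 < p0 < p1 < 1):
        raise ValueError("need 0 < p0 < p1 < 1")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie in (0, 1)")
    z = NormalDist()                 # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha)   # critical value under H0
    z_power = z.inv_cdf(power)       # quantile giving the desired power under H1
    spread = (z_alpha * math.sqrt(p0 * (1 - p0))
              + z_power * math.sqrt(p1 * (1 - p1)))
    return math.ceil((spread / (p1 - p0)) ** 2)


# Hypothetical inputs: useless TPR 10%, useful TPR 40%,
# 2% of useless markers pass, 95% of useful markers pass.
n_cases = cases_needed(0.10, 0.40, alpha=0.02, power=0.95)
print(n_cases)
```

With these made-up inputs the function returns 23 cases. It is not expected to reproduce the 40-case figure reported for the colon cancer example, which rests on the paper's own performance definition and also accounts for the controls.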

We apply the methodology to a study of 9,000 candidate biomarkers for risk of colon cancer recurrence where a useful biomarker has positive predictive value ≥ 30%. We find that 40 patients with recurrence and 160 without recurrence suffice to filter out 98% of useless markers (2% FLE) while identifying 95% of useful biomarkers (95% Discovery Power). Alternative methods for sample size calculation required more assumptions.
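The rates in this example translate into expected counts by simple arithmetic. The sketch below treats 2% FLE as the fraction of useless markers that pass the screen and 95% Discovery Power as the fraction of useful markers that pass. The number of truly useful markers among the 9,000 candidates is not given here, so the value 50 used below is purely hypothetical, as is the helper name `expected_leads`.

```python
def expected_leads(n_candidates: int, n_useful: int,
                   false_lead_rate: float,
                   discovery_power: float) -> tuple[float, float]:
    """Expected numbers of useless and useful markers that pass the screen.

    n_useful is a hypothetical input; the true count is unknown in practice.
    """
    if not (0 <= n_useful <= n_candidates):
        raise ValueError("need 0 <= n_useful <= n_candidates")
    n_useless = n_candidates - n_useful
    false_leads = false_lead_rate * n_useless   # useless markers wrongly kept
    true_leads = discovery_power * n_useful     # useful markers correctly kept
    return false_leads, true_leads


# 9,000 candidates, hypothetically 50 useful; 2% FLE, 95% Discovery Power.
fl, tl = expected_leads(9000, 50, 0.02, 0.95)
print(f"false leads ~ {fl:.1f}, true leads ~ {tl:.1f}")
```

Under these assumed inputs, about 179 useless and 47.5 useful markers would be expected to pass, so false leads would still outnumber true ones when useful markers are this rare; the counts shift directly with the assumed number of useful markers.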

Biomarker discovery research should use high-quality biospecimen repositories and sample sizes large enough to identify markers that meet prespecified performance characteristics for well-defined clinical applications.

The scientific rigor of discovery research should be improved.

  • Feng Z
  • Li CI
  • Pepe MS
Appears in: Cancer Epidemiol Biomarkers Prev, 2015, 24 (6)