The potential of genes and other markers to inform about risk.
Advances in biotechnology have raised expectations that biomarkers, including genetic profiles, will yield information to accurately predict outcomes for individuals. However, results to date have been disappointing. In addition, statistical methods to quantify the predictive information in markers have not been standardized.
We discuss statistical techniques to summarize predictive information, including risk distribution curves and measures derived from them, that relate to decision making. Attributes of these measures are contrasted with alternatives such as receiver operating characteristic curves, R², percent reclassification, and net reclassification index. Data are generated from simple models of risk conferred by genetic profiles for individuals in a population. Statistical techniques are illustrated, and the risk prediction capacities of different risk models are quantified.
Risk distribution curves are the most informative and the most relevant to clinical practice. They show the proportions of subjects classified into clinically relevant risk categories. In a population in which 10% have the outcome event and subjects are categorized as high risk if their risk exceeds 20%, we identified some settings in which more than half of those destined to have an event were classified as high risk by the risk model. With minor allele frequencies of 10%, this required either 150 genes each with an odds ratio of 1.5 or 250 genes each with an odds ratio of 1.25. We show that conclusions based on receiver operating characteristic curves may not be the same as conclusions based on risk distribution curves.
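The kind of calculation behind these figures can be sketched in a few lines of Python. This is a minimal illustration, not the model used in the study: it assumes independent loci in Hardy-Weinberg equilibrium, a per-allele odds ratio that acts additively on the log-odds scale, and an intercept calibrated so that the mean risk equals the population prevalence. The function names (`risk_distribution`, `summarize`) are hypothetical. Under these assumptions the total number of risk alleles is binomial, so the risk distribution can be computed exactly rather than simulated.

```python
import math


def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k successes, computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def expit(x):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def risk_distribution(n_genes, odds_ratio, maf, prevalence):
    """Return (risks, weights) indexed by total risk-allele count.

    ASSUMPTIONS (not taken from the abstract): independent loci,
    Hardy-Weinberg genotype frequencies, per-allele odds ratio,
    additive effects on the log-odds scale.
    """
    n_alleles = 2 * n_genes
    beta = math.log(odds_ratio)
    weights = [binom_pmf(k, n_alleles, maf) for k in range(n_alleles + 1)]

    def mean_risk(intercept):
        return sum(w * expit(intercept + beta * k)
                   for k, w in enumerate(weights))

    # Bisection on the intercept so that the mean risk equals the prevalence.
    lo, hi = -50.0, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if mean_risk(mid) < prevalence:
            lo = mid
        else:
            hi = mid
    intercept = (lo + hi) / 2
    risks = [expit(intercept + beta * k) for k in range(n_alleles + 1)]
    return risks, weights


def summarize(risks, weights, threshold):
    """Proportion of the population, and of future cases, above the threshold."""
    prevalence = sum(w * r for r, w in zip(risks, weights))
    high = [(r, w) for r, w in zip(risks, weights) if r > threshold]
    prop_population_high = sum(w for _, w in high)
    prop_cases_high = sum(w * r for r, w in high) / prevalence
    return prop_population_high, prop_cases_high


if __name__ == "__main__":
    # Settings quoted in the text: 10% prevalence, 20% high-risk threshold,
    # 10% minor allele frequency.
    for n_genes, odds_ratio in [(150, 1.5), (250, 1.25)]:
        risks, weights = risk_distribution(n_genes, odds_ratio, 0.10, 0.10)
        pop_high, cases_high = summarize(risks, weights, 0.20)
        print(f"{n_genes} genes, OR {odds_ratio}: "
              f"{pop_high:.1%} of population and "
              f"{cases_high:.1%} of cases classified as high risk")
```

The second value returned by `summarize` is the quantity discussed above: the proportion of subjects destined to have an event whose risk exceeds the high-risk threshold. Because the modelling assumptions here are illustrative, the printed values need not match the published results.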
Many highly predictive genes will be required to identify substantial numbers of subjects at high risk.
- Gu JW
- Morris DE
- Pepe MS