The final algorithm will include proteins from this list:
AMPM2, Apo A-I, b-ECGF, BLC, BMP-1, BTK, C1s, C9, Cadherin E, Cadherin-6, Calpain I, Catalase, CATC, Cathepsin H, CD30 Ligand, CDK5-p35, CK-MB, CNDP1, Contactin-5, CSK, Cyclophilin A, Endostatin, ERBB1, FGF-17, FYN, GAPDH liver, HMG-1, HSP 90a, HSP 90b, IGFBP-2, IL-15 Ra, IL-17B, IMB1, Kallikrein 7, KPCI, LDH-H 1, LGMN, LRIG3, Macrophage mannose receptor, MEK1, METAP1, Midkine, MIP-5, MK13, MMP-7, NACA, NAGK, PARC, Proteinase-3, Prothrombin, PTN, RAC1, Renin, RGM-C, SCFsR, sL-Selectin, TCTP, UBE2N, Ubiquitin+1, VEGF, YES
Lung and Upper Aerodigestive Cancers Research Group
Analysis of the EDRN lung biomarker samples will enable the next step in the development of a clinical diagnostic assay for the differential diagnosis of lung cancer versus benign disease in patients with indeterminate pulmonary nodules. The objective is to develop a blood test with sufficient accuracy to identify lung cancer in these patients, which will aid the treating physician in deciding whether to recommend invasive procedures or watchful waiting.
To use the EDRN lung biomarker serum set C to:
1. Validate the biomarkers discovered in the training and verification studies.
2. Refine the classifier for detection of lung cancer in individuals with indeterminate pulmonary nodules.
This section describes SomaLogic’s statistical methods for biomarker discovery and classifier construction. These methods were applied to the data presented above, and similar approaches will be used with the EDRN samples. If our application is approved, the EDRN samples would be run on the SLaptamer array (approximately 800 analytes) and the analysis would be completed by June 2009. The samples would then be run on subsequent, larger arrays as they are developed. SomaLogic has experienced bioinformatics and statistics experts on staff, as well as clinical biostatistics consultants. SomaLogic would nonetheless welcome an opportunity to collaborate with the extensive network of EDRN biostatisticians to learn new ways to identify and characterize potential biomarkers and to develop and optimize classifier algorithms for specific clinical applications.
All sample information, including sample receipt, clinical data, and assay data, will be captured, linked, and stored in the SomaLogic laboratory information management system (LIMS). Statistical analysis will be conducted using parametric and non-parametric models constructed and implemented in Java running in a Windows environment. All project data are backed up daily, and all back-up data are stored both locally and at a redundant off-site SomaLogic location.
There are two primary statistical problems to be solved in the proposed study: (i) identification of biomarkers that discriminate among the sample classes, and (ii) construction of classifiers that allow a new sample to be evaluated. Statistically significant differential protein expression between the control samples and the disease samples will be identified using both parametric and non-parametric tests. The Kolmogorov-Smirnov (K-S) test is a non-parametric test that can be applied to marker identification. Similarly, Student’s t-test and an F-test can be used for marker discovery when the data are normally distributed. Several classification methods will be applied to the datasets, including a parametric Bayesian classification scheme, linear discriminant analysis, and a nearest-neighbor classification scheme. The sample sets will be divided into training and test sets to assess the quality of the classifiers.
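The two problems above can be illustrated with a minimal sketch. It is written in Python purely for brevity (the analysis described here is implemented in Java) and should not be read as SomaLogic's actual implementation. It ranks analytes by the two-sample K-S statistic computed on the training set only, then scores a one-nearest-neighbor classifier on the held-out test set. All function names (`ks_statistic`, `rank_markers`, `split_train_test`, `nearest_neighbor_predict`) are hypothetical, and the eight two-analyte samples are made-up numbers chosen for illustration, not study data.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def rank_markers(controls, cases):
    """Return analyte indices ordered from most to least discriminating."""
    n_markers = len(controls[0])
    scores = []
    for j in range(n_markers):
        d = ks_statistic([s[j] for s in controls], [s[j] for s in cases])
        scores.append((d, j))
    return [j for d, j in sorted(scores, reverse=True)]


def split_train_test(n_samples, test_fraction=0.25, seed=0):
    """Shuffle sample indices and split them (unstratified, for simplicity)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(n_samples * test_fraction))
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)


def nearest_neighbor_predict(train_x, train_y, x, markers):
    """Label x with the class of its closest training sample (squared Euclidean)."""
    def dist(u, v):
        return sum((u[j] - v[j]) ** 2 for j in markers)
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    return train_y[best]


# Made-up illustration data: analyte 0 separates the classes, analyte 1 does not.
controls = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.4, 6.0]]
cases = [[1.1, 4.5], [1.2, 5.5], [1.3, 3.5], [1.4, 6.5]]
samples = controls + cases
labels = [0] * len(controls) + [1] * len(cases)  # 0 = control, 1 = disease

train, test = split_train_test(len(samples))
train_x = [samples[i] for i in train]
train_y = [labels[i] for i in train]

# Select markers on the training set only, so the test set stays untouched.
ranked = rank_markers(
    [x for x, y in zip(train_x, train_y) if y == 0],
    [x for x, y in zip(train_x, train_y) if y == 1],
)
top_markers = ranked[:1]

correct = sum(
    nearest_neighbor_predict(train_x, train_y, samples[i], top_markers) == labels[i]
    for i in test
)
accuracy = correct / len(test)
print("top markers:", top_markers, "test accuracy:", accuracy)
```

Selecting markers on the training set alone keeps the test-set accuracy an honest estimate. With real data, a stratified split and a significance threshold on the K-S statistic would likely be needed. The parametric tests and the other classifiers named above are not shown.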