MS-PyCloud: A Cloud Computing-Based Pipeline for Proteomic and Glycoproteomic Data Analyses.

Abstract

Rapid development and wide adoption of mass spectrometry-based glycoproteomic technologies have empowered scientists to study proteins and protein glycosylation in complex samples on a large scale. This progress has also created unprecedented challenges for individual laboratories to store, manage, and analyze proteomic and glycoproteomic data, both in the cost for proprietary software and high-performance computing and in the long processing time that discourages on-the-fly changes of data processing settings required in explorative and discovery analysis. We developed an open-source, cloud computing-based pipeline, MS-PyCloud, with graphical user interface (GUI), for proteomic and glycoproteomic data analysis. The major components of this pipeline include data file integrity validation, MS/MS database search for spectral assignments to peptide sequences, false discovery rate estimation, protein inference, quantitation of global protein levels, and specific glycan-modified glycopeptides as well as other modification-specific peptides such as phosphorylation, acetylation, and ubiquitination. To ensure the transparency and reproducibility of data analysis, MS-PyCloud includes open-source software tools with comprehensive testing and versioning for spectrum assignments. Leveraging public cloud computing infrastructure via Amazon Web Services (AWS), MS-PyCloud scales seamlessly based on analysis demand to achieve fast and efficient performance. Application of the pipeline to the analysis of large-scale LC-MS/MS data sets demonstrated the effectiveness and high performance of MS-PyCloud. The software can be downloaded at https://github.com/huizhanglab-jhu/ms-pycloud.

EDRN PI Authors
  • (None specified)
Medline Author List
  • Chen L
  • Hoang T
  • Hu Y
  • Lih TM
  • Schnaubelt M
  • Zhang B
  • Zhang H
  • Zhang Z
PubMed ID
Appears In
Anal Chem, 2024 Jun (issue 25)