Minutes of the EDRN Data Sharing and Informatics Subcommittee 9/16/2024

Data Sharing and Informatics Subcommittee

Monday, September 16, 2024

Present (in BOLD):

Current Action Items:

  1. JPL to 1) populate the grid of Roles and Responsibilities for FAIR-based data that was presented on the call and add it to the Public Portal, 2) expand documentation, and 3) promote investigator training to ensure that NCI policies are followed
  2. JPL to follow up with the lead PI of each project listed in rows 25-35 of the LabCAS Holdings Nov 2023 spreadsheet to determine whether the projects can be considered public.

Agenda/Discussion:

Data sharing and federated learning: Dr. Eugene Koay and Dr. Michael Rosenthal joined the call. They are leading an EDRN pancreatic cancer study that uses a federated learning system, and they want to make sure the project complies with NCI guidelines. Dan Crichton pulled up the EDRN Data Policy, which states that every EDRN study must have a DSP (Data Sharing Plan) describing how investigators will manage and share data during the funding period, including data storage, access policies/procedures, preservation, metadata standards, and distribution approaches. The raw data from a study must be accessible at the time the study results are published. Dan Crichton suggested developing policy language that addresses federated learning. Michael Rosenthal said that because the data sets involved in federated learning are so large, it would be good to exempt these projects from the mandate that the data be made publicly available. A question was raised about what resource would be citable once the study results come out. Dr. Rosenthal suggested that aggregated data could be used, and that the models could also be shared, but these must be published in a way that removes sensitive information. One challenge with federated learning is that subject-level data is hard to reproduce outside of the federated learning model. NCI is considering a federated network based at the comprehensive cancer centers, and federated data could perhaps be made accessible online there. Another issue is how to handle IP around algorithms. Christos Patriotis reminded everyone that the data has to be provided in a way that is usable and shareable; adjustments to the policy may be possible, but once an investigator has signed on, they must follow the policy. The IP will need to be negotiated. Michael Rosenthal noted that the data for this study is retrospective and was not collected using NCI funds.
Christos Patriotis suggested inviting someone from NCI’s Technology Transfer Office to discuss this issue. Guillermo Marquez noted that because EDRN is funded per cycle, some sites drop out, and asked how this will affect access to the data and those sites’ ability to participate; this could be discussed with OGA. Eugene Koay expressed concern that one company’s federated learning software may not be compatible with NCI’s or another company’s. The team can follow up with OGA and the Technology Transfer Office to meet and discuss this further.

Data Collection Submission Compliance: NCI has new requirements for reporting on data collection. JPL is expected to report quarterly on EDRN data holdings, including the status of the data and its compliance with EDRN data sharing policy and standards. Non-compliant data will either be improved or returned to the Principal Investigator so that it is not made publicly available. JPL is preparing a Data Sharing Collection Submission Compliance document that will be sent to the EDRN and posted on the Public Portal. The group discussed the metadata that will be tracked.

Other Business: At future meetings, the group will address:

Next Call: Monday, November 18th at 1pm Eastern/10am Pacific