EDRN Data Sharing and Informatics Subcommittee 2025/12/15
This meeting was held online and has already taken place.
The focus of this call is a presentation from Elucidata on Federated Learning. The abstract is below:
Title: When observations don’t match patterns: A Data-Centric AI Framework for Out Of Distribution problems in Life Sciences.

Abstract

Standard AI models rely on the "IID assumption" (Independent and Identically Distributed): that the data they face in the real world matches the distribution they learned during training. In the real world, this assumption often breaks. Models inevitably encounter "Out-of-Distribution" (OOD) data: unseen signals that defy established patterns. In consumer apps, missing a movie recommendation is a trivial error. In other applications, however, missed predictions on OOD scenarios can have significant adverse consequences. Self-driving cars, healthcare, life sciences, and finance, among other verticals, all increasingly depend on OOD observations that are not well served by traditional AI approaches.

In life sciences, a failure to adequately respond to OOD datapoints can have profound consequences. These outliers, such as a patient not responding to treatment or an unexpected drug response, are rarely noise. They are often the most valuable signals driving the next generation of innovation.

We propose a Data-Centric AI approach to capture these signals. Traditional AI approaches that rely on scaling up model size fail to solve the OOD challenges found in the "long tail" of scientific discovery. The Data-Centric AI framework relies on three pillars to ensure reliable performance of the AI model beyond the training set:

● High-Quality Data Infrastructure backed by clean, linked data: Fueling models with context-rich data improves performance significantly more than larger architectures, particularly for rare, small-sample problems.
● Federated Learning: Decentralized training allows models to learn from diverse, private datasets. This ensures they are generalizable enough to handle distributional shifts.
● Physics-Based Rules: Integrating first principles allows models to reason beyond their training corpus. Similar to how AlphaFold uses geometric constraints, this approach helps AI navigate the "wild" of experimental biology.

The Data-Centric AI approach makes context-rich data the hero and demonstrates that this can deliver on the promise of AI for high-value applications.
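For readers unfamiliar with the federated learning pillar, the idea of training across private datasets can be illustrated with a minimal sketch of federated averaging (FedAvg), a standard technique chosen here for illustration. It is not taken from the presentation and is not a description of Elucidata's method. The three sites, their data, and the names `local_update`, `federated_average`, and `train` are all hypothetical. Each site fits a small linear model on its own data, and only the model parameters are sent back to be averaged.

```python
"""Toy federated averaging: sites share model parameters, never raw data."""


def local_update(w, b, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one site's private (x, y) pairs."""
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error of the model y = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates, sizes):
    """Average site parameters, weighting each site by its sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return w, b


def train(sites, rounds=300):
    """Alternate local training and central averaging for several rounds."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [local_update(w, b, data) for data in sites]
        w, b = federated_average(updates, [len(d) for d in sites])
    return w, b


# Hypothetical sites: each sees a different range of x (a distribution
# shift between sites), but all follow the same rule y = 2x + 1.
site_a = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0, 1.5)]
site_b = [(x, 2 * x + 1) for x in (2.0, 2.5, 3.0)]
site_c = [(x, 2 * x + 1) for x in (-1.0, -0.5)]

w, b = train([site_a, site_b, site_c])
print(f"w = {w:.3f}, b = {b:.3f}")
```

In this toy setting the shared model recovers the common rule even though no site sees the full range of inputs. Real deployments add concerns this sketch ignores, such as secure aggregation, communication cost, and sites whose data do not share a single underlying rule.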
When
- Los Angeles: Dec. 15, 2025, 10 a.m.
- Denver: Dec. 15, 2025, 11 a.m.
- Chicago: Dec. 15, 2025, noon
- New York: Dec. 15, 2025, 1 p.m.
- UTC: Dec. 15, 2025, 6 p.m.