FAIR Data Submission Guidance for EDRN

(Draft)
Version: 1.0.1
Date: 2025-01-01

To align with the FAIR principles outlined in the EDRN Data Sharing Policy, the EDRN has developed a set of minimal requirements for submitting data to LabCAS, EDRN's biomarker data commons repository.

A primary goal of EDRN data collection is to ensure the reusability of the data by groups beyond those who originally collected it. Per the EDRN Data Sharing Policy, data must be made available for public use. This includes providing sufficient metadata and documentation to help users understand the data's configuration and structure. Below is guidance on the critical metadata that should accompany your data submission. Additionally, supplemental documents and readme files can be included to support the enhanced use of the data.

Core Metadata to Support Definition, Accessibility, and Structure of the Data

Metadata is critical to support the discoverability, interpretability, and usability of the data. LabCAS organizes data into Collections, Datasets, and Files, each with its own set of minimal metadata requirements. Additional metadata is also defined for various assay types; research groups coordinate this metadata as Common Data Elements, and it should be added to increase the usability of the data.

Metadata for Collections, Datasets, and Files

This Metadata Check List details the required and optional metadata for Collections, Datasets, and Files for LabCAS data submission. For more comprehensive information, please refer to the EDRN Data Model.

De-Identification of Data

All data must be de-identified at your site before uploading to LabCAS. Additional guidance may be available in your study's SOP.

For Reference Sets (LTP2, PMRI, and BRSI), follow the Imaging Data Transfer SOP provided by the DMCC.

As described in the Metadata Check List, the De-identification Method (Safe Harbor or Expert Review) is required metadata for data submission. Please refer to Health and Human Services - Methods for De-identification of PHI for more information.

Data to Upload

This section details what's required and optional when uploading data.

ReadMe File, Ancillary Data, Data Dictionaries and Other Information

Methodology details should be included as part of the metadata. You can also include supplemental information explaining the algorithms and computations applied to the raw data. Additional data, such as ancillary data or clinical records, may also be uploaded to LabCAS as supporting documentation. Examples of supporting files include:

Data Files

To support reproducibility and facilitate robust analyses by external researchers, each data submission should include the applicable core data components. Please note that summary or aggregated data alone is insufficient; raw data and relevant clinical data are essential for meaningful reanalysis.

File Inclusion Guidelines

When uploading files, include only those directly relevant to the study. For example, if you are uploading DICOM files from a CD-ROM, ensure only the DICOM files needed for analysis are included. Do not upload extraneous files such as software, image viewers, or other auxiliary content that may be included on the CD-ROM. Uploading unrelated files could create licensing issues or unnecessary data clutter.

Optional Files

You may optionally include a file with checksums to verify data integrity. This file can be in .csv format, where:

When to Include Checksums:
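For the optional checksum file described in this section, the following is a minimal sketch of how one might be generated. It is an illustration under stated assumptions, not a LabCAS requirement: the hash algorithm (SHA-256), the column names (`relative_path`, `sha256`), and the function names are all hypothetical choices made for this example.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_csv(root: Path, output: Path) -> int:
    """Write one CSV row per file under root; return the number of files listed."""
    # Collect the files first and skip the checksum file itself if it lives under root.
    files = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.resolve() != output.resolve()
    )
    with output.open("w", newline="") as handle:
        writer = csv.writer(handle)
        # Assumed column names; adjust to whatever layout your submission uses.
        writer.writerow(["relative_path", "sha256"])
        for path in files:
            writer.writerow([path.relative_to(root).as_posix(), sha256_of(path)])
    return len(files)
```

Calling `write_checksum_csv(Path("collection"), Path("collection/checksums.csv"))` would list every file under the collection folder with its digest, using paths relative to that folder so the file stays valid after the data is moved.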

Organization of Files and Folders

When uploading data, you must arrange the filesystem hierarchy (the folders and files that contain your data) according to the following structure:

An example structure is shown below:

📁 collection
    📁 CollectionLevel
        📄 ReadMe.txt
        📄 SOP 1.pdf
        📄 ClinicalData.csv
        📄 DataDictionary.csv
    📁 dataset 1
        📂 participant 1
            📁 (optional nested datasets)
                📄 file 1
                📄 file 2
                📄 file 3
                . . .
        📂 participant 2
            📁 (optional nested datasets)
                📄 file 1
                📄 file 2
                📄 file 3
                . . .
    📁 dataset 2
        📂 participant 1
            📁 (optional nested datasets)
                📄 file 1
                📄 file 2
                📄 file 3
                . . .
            📄 file n
            📄 file n+1
            📄 file n+2
    . . .

As mentioned before, do not include viewer software, AUTORUN.INF files, .exe files, .app folders, .DLL files, LICENSE files, Java files, etc.
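A pre-upload scan can help catch the kinds of files listed above. The sketch below is one possible check, not an official LabCAS tool: the function name `find_extraneous` and the exact patterns are assumptions, and "Java files" is interpreted here as `.jar` and `.class` files.

```python
from pathlib import Path

# Assumed patterns derived from the examples in this guidance; extend as needed.
BLOCKED_NAMES = {"autorun.inf", "license", "license.txt"}
BLOCKED_SUFFIXES = {".exe", ".dll", ".jar", ".class"}
BLOCKED_DIR_SUFFIXES = {".app"}


def find_extraneous(root: Path) -> list[str]:
    """Return paths (relative to root) that look like software rather than data."""
    flagged = []
    for path in sorted(root.rglob("*")):
        name = path.name.lower()
        suffix = path.suffix.lower()
        if path.is_dir():
            # Application bundles such as Viewer.app are folders, not files.
            if suffix in BLOCKED_DIR_SUFFIXES:
                flagged.append(path.relative_to(root).as_posix())
        elif name in BLOCKED_NAMES or suffix in BLOCKED_SUFFIXES:
            flagged.append(path.relative_to(root).as_posix())
    return flagged
```

Matching is case-insensitive so that `AUTORUN.INF` and `viewer.DLL` are both caught. An empty result does not guarantee the folder is clean; it only means none of the assumed patterns matched.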

Your data must also be self-contained; this means that each file contains all of the data needed to describe itself (aside from the metadata you submit separately). Examples:

You may package your filesystem hierarchy into an archive. We can accept .zip, .tar, .tar.gz, .tar.bz2, and .tar.xz files.
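As an illustration of packaging, the sketch below uses Python's standard `shutil.make_archive` to produce any of the accepted archive types. The function name `package_collection` and the choice to place the archive next to the collection folder are assumptions made for this example.

```python
import shutil
from pathlib import Path

# Map each accepted extension to the format name shutil.make_archive expects.
ARCHIVE_FORMATS = {
    ".zip": "zip",
    ".tar": "tar",
    ".tar.gz": "gztar",
    ".tar.bz2": "bztar",
    ".tar.xz": "xztar",
}


def package_collection(collection_dir: Path, extension: str = ".tar.gz") -> Path:
    """Archive collection_dir next to itself and return the archive path."""
    fmt = ARCHIVE_FORMATS[extension]  # raises KeyError for unsupported extensions
    collection_dir = collection_dir.resolve()
    # base_name omits the extension; make_archive appends it for the chosen format.
    base_name = collection_dir.parent / collection_dir.name
    archive = shutil.make_archive(
        str(base_name),
        fmt,
        root_dir=collection_dir.parent,
        base_dir=collection_dir.name,
    )
    return Path(archive)
```

Because `base_dir` is the collection folder's name, the archive keeps the folder itself as its top-level entry, so the hierarchy is preserved when the archive is unpacked.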

Review and Verification

Validation of required metadata will be performed during the upload of data to LabCAS. In addition, each site and/or assigned domain expert must review and validate their data to ensure accuracy in data capture and usability by others in LabCAS. This is a critical step in ensuring that the data can be shared and used by other research groups.