Interoperability Analysis of the GHGA Metadata Model

Interoperability between metadata models is becoming increasingly important as genomic data infrastructures grow and operate in federated networks. Our latest publication in Scientific Data examines how the GHGA Metadata Model compares to four other models in the domain.

The GHGA Metadata Model allows users to structure and standardise their metadata following a bottom-up experiment workflow while ensuring that all collected data is anonymous. Although its overall structure is similar to that of the European Genome-phenome Archive (EGA), it includes additional information about the individual studied, attributes describing the experiment, and an explicit linkage between samples, research files and processed files.

To better understand how well the GHGA model aligns with existing standards, we performed semantic crosswalk analyses with four other relevant models. Rather than comparing overall structural design, we focused on the semantic meaning and content of corresponding fields. By comparing the mapped properties across models, we identified a shared omics metadata consensus that also aligns with MINSEQE (the Minimum Information About a High-Throughput Sequencing Experiment), a widely recognised reporting standard.
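
As an illustration of the idea only, a crosswalk can be thought of as a table that records, for each GHGA property, the semantically equivalent property in each compared model. The sketch below is not the method or data from the publication: the property names and the model labels (`model_a` to `model_d`) are hypothetical placeholders. It shows how a shared consensus could be derived as the set of properties that have a match in every model.

```python
from typing import Dict, List, Optional

# Hypothetical crosswalk: GHGA property -> equivalent property in each
# compared model (None means no semantic match was found).
# All names here are illustrative, not taken from the actual models.
CROSSWALK: Dict[str, Dict[str, Optional[str]]] = {
    "sample_type": {
        "model_a": "sample_type",
        "model_b": "material",
        "model_c": "specimen_type",
        "model_d": "sample_kind",
    },
    "library_layout": {
        "model_a": "layout",
        "model_b": "library_layout",
        "model_c": "read_layout",
        "model_d": "layout",
    },
    "individual_sex": {
        "model_a": "sex",
        "model_b": None,  # no equivalent field in this model
        "model_c": "gender",
        "model_d": "sex",
    },
}

MODELS: List[str] = ["model_a", "model_b", "model_c", "model_d"]


def shared_consensus(
    crosswalk: Dict[str, Dict[str, Optional[str]]], models: List[str]
) -> List[str]:
    """Return the properties that have a mapped equivalent in every model."""
    return sorted(
        prop
        for prop, targets in crosswalk.items()
        if all(targets.get(model) for model in models)
    )


print(shared_consensus(CROSSWALK, MODELS))
```

In this toy table, `individual_sex` drops out of the consensus because one model has no matching field.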

We then reversed the comparison by mapping properties from the related models onto GHGA in order to determine whether relevant information might be missing. Our results show that the GHGA Metadata Model covers all required attributes of the compared models. Some gaps remain, such as sequencing or sampling dates, but these can be captured in the non-standardised part of the GHGA model.
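
The reverse comparison can be sketched in the same spirit. The snippet below assumes a hypothetical mapping from one compared model's properties onto GHGA properties, plus a hypothetical set of properties that model marks as required. Apart from the two date fields named above, every identifier is invented for illustration. It lists the unmapped properties and checks whether any of them is required.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical reverse crosswalk: property of a compared model -> GHGA
# property (None means GHGA has no standardised equivalent).
REVERSE_MAP: Dict[str, Optional[str]] = {
    "library_strategy": "library_type",
    "instrument_model": "instrument_model",
    "sequencing_date": None,
    "sampling_date": None,
}

# Hypothetical set of properties the compared model marks as required.
REQUIRED: Set[str] = {"library_strategy", "instrument_model"}


def coverage_gaps(
    reverse_map: Dict[str, Optional[str]], required: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (all unmapped properties, unmapped properties that are required)."""
    unmapped = sorted(prop for prop, target in reverse_map.items() if target is None)
    missing_required = [prop for prop in unmapped if prop in required]
    return unmapped, missing_required


gaps, missing_required = coverage_gaps(REVERSE_MAP, REQUIRED)
print("Unmapped:", gaps)
print("Unmapped and required:", missing_required)
```

An empty second list corresponds to the situation described above: every required attribute is covered, and the remaining gaps are candidates for the non-standardised part of the model.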

Our study highlights the robustness, interoperability, and extensibility of the GHGA Metadata Model and demonstrates its compatibility with widely used models and established standards in genomic data sharing.