Frequently Asked Questions

General topics

GHGA is the German Human Genome-Phenome Archive. Started in late 2020, we are building a national omics data infrastructure that provides a secure platform for sharing and secondary use of human genome data from research and clinical sequencing.

GHGA’s mission can be found here.

While the GHGA Data Portal serves as the point of contact for uploading, downloading, and analyzing omics data, the data itself is stored in a federated manner on servers at the GHGA Data Hubs.

The GHGA Data Hubs are associated with German universities and research centers, co-located with major omics sequencing centers. Together, they operate as a federated network that shares joint standards and infrastructure under central management, coordinated by GHGA Central, represented by DKFZ.

GHGA strives to understand the expectations and concerns of patients. We have held deliberative forums to elicit their views on transparent and trustworthy governance. GHGA is in the process of setting up a patient advisory board within its governance structure for a continued exchange of ideas and perspectives on issues concerning omics data. We hope to make patients, not just their data, an integral part of GHGA's growth and development.

GHGA is organized in a federated manner: under central management, data is stored locally at the GHGA Data Hubs.

Besides storage and compute infrastructure for GHGA services, the data hubs provide significant resources, professional operations, data stewardship, technical security, and scalability to GHGA. One great benefit of this data hub network is that it will establish replication services across data hubs, which in the long term will provide geo-redundant storage and backup at each data hub.

In June 2022, GHGA, represented by DKFZ as the coordinating legal entity, signed a collaboration agreement with the European Genome-Phenome Archive (EGA) to become the German national node of the federated EGA. More here.

GDI aims to bring the mission of the 1+MG initiative to fruition, connecting national genomics data infrastructures to form a European genomic data network. GHGA is forming the German GDI node, thereby linking GHGA to the pan-European GDI infrastructure. More here.

The national initiative genomDE is tasked with developing a concept to bring genome sequencing into standard health care while ensuring the best possible use of the generated data for research, for the benefit of future patients.

One building block is the creation of a secure and reliable infrastructure for genomic data, where research data can be securely archived and shared. Together with the partner institutions in the genomDE project, GHGA is developing technical and legal solutions for archival and secondary use of data from this national genomics initiative.

Using GHGA

The GHGA Data Portal is currently being developed and will allow data upload requests via a user interface in the future. We will notify the community as soon as this functionality is fully operational. 

During the GHGA Metadata Catalog phase of the project, we are working with selected partner institutions to pilot the collection of datasets. If you are located at a GHGA partner institution and would like to upload your metadata to the GHGA Metadata Catalog, please contact the GHGA Helpdesk.

Datasets available within GHGA can be browsed in the GHGA Metadata Catalog. After identifying datasets of interest, you can contact the controllers of the datasets using the displayed contact information. 

Once an access request is approved by the controller, the data will be made available. Currently, the data sharing step relies on resources outside of GHGA, e.g. existing local omics infrastructures. In the future, data access will be managed by GHGA directly.

GHGA is an archive for human omics data only. Submissions can comprise raw sequencing data files, associated metadata files, or any other type of human omics data. As such, most file formats will be supported.

Datasets to be submitted to GHGA will need to be described using the GHGA Metadata Schema, which is available on GitHub. Further details can be found in the GHGA Metadata Whitepaper on Zenodo.

For data from non-human model systems that are not subject to controlled access, we advise users to use specialized archives such as the European Nucleotide Archive (ENA).

Please contact the GHGA Helpdesk for data submissions and specific questions.

GHGA is implementing an ethical-legal framework tailored to the German legal landscape and interpretation of the GDPR. This is something other international initiatives cannot guarantee, and it is particularly useful for German data producers.

As GHGA is part of the federated EGA, data deposited within the German node is discoverable within fEGA. Hence, the utility and benefits of regular EGA submissions will be retained when depositing data in GHGA.

While data submissions to GHGA are not limited to German researchers or institutions, European researchers are encouraged to use the federated EGA node of their own country of residence. Within fEGA, European datasets stored at the national nodes can be found in one central database.

Because GHGA is a partner of the federated European Genome-Phenome Archive (fEGA) and the European Genomics Data Infrastructure (GDI), data collected within GHGA will be findable internationally.

In addition, through the use of common technical and metadata standards, we aim to ensure the interoperability of the collected data with datasets worldwide. However, access to the data is still subject to the national implementation of the GDPR.

Data security and protection

The prevention of data misuse is a primary objective in GHGA's mission. To ensure data safety, we take a layered approach. Our newly developed infrastructure will provide high-level cybersecurity based on zero-trust networking, combined with technical and organizational measures, allowing data to be archived and shared safely.

In addition, we have developed a framework for GDPR-compliant data processing and help data producers to inform patients and manage consent. Enabling controlled, yet FAIR, data access is the last layer to ensure data is protected while fulfilling its potential to advance research.

GHGA will not act as the controller or owner of any data deposited within GHGA. Instead, GHGA acts as a data processor that processes and shares data as instructed by the data controller.

GHGA is therefore only able to share data with other researchers based on the explicit approval of the institution which has submitted the data. It is the responsibility of the controller to decide on data access requests submitted via GHGA by other researchers who want to use the data stored within GHGA.

Only non-personal metadata is publicly available within the GHGA Data Portal. 

Researchers intending to use any of the archived data or view personal metadata must apply for access with the respective data controllers. 

A data access committee or a comparable body will review the legitimacy of the request prior to granting access. This step ensures that only researchers with a valid research purpose gain access to sensitive data, adding another layer of protection.

GHGA has developed different tools to help users to submit omics data to GHGA in a legally and ethically sound manner.

We have developed modules to be included in existing or newly written consent forms and designed an app to help evaluate legacy consent forms. The app can be used to guide researchers in assessing their pre-GDPR consent forms to see whether they are sufficient to permit new data processing.