Data Analysis Tools

The GHGA bioinformatics team is making steady progress toward delivering standardised, comparable, and reproducible omics workflows for the research community. To streamline development and maximise reuse, the team focuses on improving existing workflows rather than creating new ones, while closely aligning with the nf-core community and its curated best-practice analytical pipelines. Below, we highlight workflow releases to which GHGA has contributed.

If you are new to nf-core workflows, we recommend Bytesize, a video tutorial series published by nf-core that introduces workflow implementation and nf-core workflows in short, easy-to-follow snippets.

In addition, we encourage you to explore the GHGA Webinar Series: Sequencing Techniques and Analysis, which offers a collection of beginner-friendly videos on topics such as DNA and single-cell sequencing, data visualisation, and bioinformatic analysis workflows. For more hands-on learning, we also suggest the NGS Harmonization Workshop, which provides comprehensive, self-paced training materials on standardised workflow development.

Data Quality Control Tools

Bioinformatics depends on both analytical tools and precise metadata. To assess raw data quality and verify that the data match their associated metadata, GHGA Data Hubs use the BfArM-MVH/GRZ_QC_Workflow.

GRZ_QC_Workflow

This workflow is designed to compute quality metrics as required by BfArM for genome data centers (GRZs).

Learn more

Germline Variant Calling and Annotation

To standardise processing, GHGA recommends the nf-core/sarek pipeline for germline variant calling and annotation. This enables the consistent analysis of common and rare variants, which are subsequently aggregated into a variant frequency database. This resource supports rare disease diagnostics by establishing robust frequency benchmarks for healthy and affected individuals.


sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling, and annotation) from WGS or targeted sequencing data.

Learn more

Somatic Variant Calling

As part of its commitment to advancing omics-driven cancer research and diagnostics, GHGA supports a growing suite of workflows for somatic variant calling, such as the new GHGA-developed Nextflow pipelines translated from the Roddy-based DKFZ somatic calling pipelines. While genomic sequencing remains crucial for identifying somatic mutations, a significant challenge is interpreting variants of unknown significance (VUS), particularly missense and non-coding variants.


nf-platypusindelcalling

Nextflow-based pipeline to call and prioritise somatic indels with extensive quality control and filtering steps. Redeveloped from the original Roddy pipeline used in the Pan-Cancer Analysis of Whole Genomes study.

Learn more

nf-snvcalling

Nextflow pipeline to call and prioritise somatic single nucleotide variants with filtering, annotation, and plots. Redeveloped from the original Roddy pipeline used in the Pan-Cancer Analysis of Whole Genomes study.

Learn more

nf-aceseq

Nextflow pipeline to estimate allele-specific copy numbers from human Whole Genome Sequencing data (>30X). Redeveloped from the original Roddy pipeline.

Learn more

nf-cnvkitcalling

Nextflow pipeline to determine copy numbers from human Whole Exome Sequencing data. Redeveloped from the original Roddy pipeline.

Learn more

ExpansionHunter

A bioinformatics pipeline to analyse short tandem repeats (STRs) using a combination of ExpansionHunter, samtools, and Stranger.

Learn more

Variant Benchmarking

GHGA focuses on variant benchmarking to ensure accurate, reproducible, and reliable variant detection, which is crucial for cancer diagnosis, treatment, and genomics research. To benchmark germline and somatic variants, GHGA has co-developed tools such as the nf-core variant benchmarking pipeline and NCBench. These tools provide transparent, standardised benchmarking results, which should improve the reliability of variant detection across genomic contexts and contribute to more accurate diagnostics and therapeutic developments.

variantbenchmarking

Pipeline to evaluate and validate the accuracy of variant calling methods in genomic research. The workflow provides benchmarking tools for small variants including SNVs and INDELs, Structural Variants (SVs) and Copy Number Variations (CNVs).
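
Benchmarking of this kind is commonly summarised with precision, recall, and F1 score, derived from counts of true positive, false positive, and false negative calls against a truth set. The sketch below is illustrative only: the class and function names and the example counts are hypothetical, and it does not reproduce how variantbenchmarking itself matches variants or reports metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCounts:
    """Counts from comparing a call set against a truth set (hypothetical container)."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(counts: BenchmarkCounts) -> float:
    """Fraction of called variants that are present in the truth set."""
    called = counts.true_positives + counts.false_positives
    return counts.true_positives / called if called else 0.0


def recall(counts: BenchmarkCounts) -> float:
    """Fraction of truth-set variants that were called."""
    truth = counts.true_positives + counts.false_negatives
    return counts.true_positives / truth if truth else 0.0


def f1_score(counts: BenchmarkCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(counts), recall(counts)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts for illustration; not results from any real benchmark.
snv_counts = BenchmarkCounts(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={precision(snv_counts):.3f}")
print(f"recall={recall(snv_counts):.3f}")
print(f"f1={f1_score(snv_counts):.3f}")
```

Reporting all three values together makes it easier to see whether a caller trades sensitivity for specificity, which a single accuracy figure would hide.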

Learn more

NCBench

A dedicated open-source continuous integration platform for comparing performance metrics for small variants across standard datasets such as GIAB and CHM-eval.

Learn more

RNA-Seq Analysis

RNA-seq plays a crucial role in understanding disease mechanisms and advancing personalised medicine. GHGA supports RNA-seq efforts by helping to define standardised nf-core workflows and configurations. To this end, GHGA provides tools such as the nf-core/rnaseq pipeline and is developing the nf-core/drop pipeline to integrate RNA-seq and proteomics data. These pipelines help reclassify variants of unknown significance (VUS) by detecting aberrant expression and splicing patterns. This multi-omics approach enhances the biological understanding of variants and supports more accurate cancer driver gene prediction.


rnaseq

A bioinformatics pipeline that can be used to analyse RNA sequencing data. It takes FASTQ files as input, performs quality control (QC), trimming and (pseudo-)alignment, and produces a gene expression matrix and extensive QC report.
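
As a rough illustration of how FASTQ inputs are usually described to the pipeline, the sketch below writes a CSV samplesheet. It assumes the column layout `sample,fastq_1,fastq_2,strandedness` used by recent nf-core/rnaseq releases; please check the pipeline documentation for the version you run. The helper name, sample names, and file paths are hypothetical.

```python
import csv
import io

# Assumed column layout; verify against the nf-core/rnaseq docs for your version.
COLUMNS = ["sample", "fastq_1", "fastq_2", "strandedness"]


def build_samplesheet(rows: list[dict[str, str]]) -> str:
    """Return samplesheet CSV text for the given rows (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {
                "sample": row["sample"],
                "fastq_1": row["fastq_1"],
                # Single-end libraries leave the second read column empty.
                "fastq_2": row.get("fastq_2", ""),
                # "auto" asks the pipeline to infer strandedness (assumed default here).
                "strandedness": row.get("strandedness", "auto"),
            }
        )
    return buffer.getvalue()


# Hypothetical samples and paths, for illustration only.
sheet = build_samplesheet(
    [
        {
            "sample": "patient1_tumour",
            "fastq_1": "data/p1_R1.fastq.gz",
            "fastq_2": "data/p1_R2.fastq.gz",
        },
        {
            "sample": "patient2_tumour",
            "fastq_1": "data/p2_R1.fastq.gz",
            "strandedness": "reverse",
        },
    ]
)
print(sheet)
```

Generating the samplesheet from a script rather than by hand keeps sample names and file paths consistent with the accompanying metadata.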

Learn more

drop

The Detection of RNA Outliers Pipeline (DROP) is a bioinformatics pipeline that detects aberrant expression, aberrant splicing, and mono-allelic expression from RNA sequencing data.

Learn more

Omics Analysis

Under the umbrella of omics analysis, GHGA is developing standardised workflows for the processing, quality control, and curation of advanced data types, including single-cell, spatial, and multi-omics data. This includes nf-core workflows for spatial transcriptomics (nf-core/spatialxe), single-cell variant detection, and harmonised metadata standards. These pipelines will enable deeper insights into tissue heterogeneity, improve rare disease diagnostics, and advance cancer research by revealing resistance mechanisms and cell-type-specific mutations.


scrnaseq

Single-cell RNA-Seq pipeline for barcode-based protocols such as 10x, DropSeq, or SmartSeq, offering a variety of aligners and empty-droplet detection.

Learn more

spatialxe

A bioinformatics best-practice processing and quality control pipeline for Xenium data.

Learn more

Subscribe here!

To stay up to date with the latest workflow developments, please sign up for our newsletter.