HUPO-PSI Spring Meeting · Rome · 4–8 May 2026
MetabolomicsHub · metabolomicshub.org

International data exchange and
data representation standards
for metabolomics.

Public metabolomics data are deposited across repositories with divergent metadata conventions, blocking cross-resource discovery, meta-analysis and AI-ready reuse. MetabolomicsHub is a new international consortium building a unified, FAIR-compliant framework for discovery, exchange and reuse of public metabolomics data — delivered through shared standards, a common data model and a centralised portal.

Authors
Jonathan E. Hunter1,†; Ozgur Yurekten1,†; Thomas Payne1; Naveen Raj Kookkal Polliapram1; Callum Martin1; Eoin Fahy2; Mano Maurya2; Srinivasan Ramachandran2; Brian C. DeFelice3; Carlos G. Gonzalez3; Joshua E. Elias3; Nils Hoffmann4; Yasin El Abiead5,6; Pieter C. Dorrestein5; Shankar Subramaniam2; Juan Antonio Vizcaíno1,†

1EMBL-EBI, Wellcome Genome Campus, Hinxton, UK

2Bioengineering & San Diego Supercomputer Center, UC San Diego, CA

3CZ Biohub, San Francisco, CA

4Forschungszentrum Jülich GmbH, Bielefeld, Germany

5Skaggs School of Pharmacy, UC San Diego, CA

6BOKU University, Institute of Analytical Chemistry, Vienna, Austria

Attending the HUPO-PSI Spring Meeting, Rome 2026

MetabolomicsHub
6,500+ legacy datasets: indexed across the consortium
Portal & API: unified discovery · central search & filtering · cross-repository facets
Open standards: promoting adoption of mzTab-M & Universal Spectrum Identifiers (USIs)
Established models followed: ProteomeXchange · INSDC · wwPDB

Common Data Model (MHD)
Accession space: global, persistent IDs (e.g. MHD12345)
Common data model: JSON · graph-enabled · AI-ready · extensible
Profiles: Legacy + MHD, with validation rules
Ontology terms: validated via the OLS4 API

Founding repositories · v0.1 launch June 2026
MetaboLights · EMBL-EBI
Metabolomics Workbench · UCSD
GNPS / MassIVE · UCSD
+ future partners

Coordinating repository: MetaboLights · EMBL-EBI · ebi.ac.uk/metabolights
Introduction

Standardising open data practices
in metabolomics

Public metabolomics datasets are growing rapidly, yet remain fragmented across repositories with divergent metadata conventions. MetabolomicsHub is a new international consortium building a unified, FAIR-compliant (Findable, Accessible, Interoperable, Reusable) infrastructure for the discovery, exchange and reuse of public metabolomics data.

Modelled on ProteomeXchange and the INSDC, the consortium brings together major repositories as a starting point, and partners with researchers, journals, funders and technology vendors.

Funded by the Chan Zuckerberg Initiative as a 30-month programme (Dec 2024 – May 2027), MetabolomicsHub launches with (GC-/LC-)MS data first. Later releases will extend to NMR and MS imaging, with planned support for the mzTab-M reporting format and for Universal Spectrum Identifiers (USIs), which provide per-spectrum traceability.

  • Promote data deposition and reuse.
  • Develop and promote open-source tools, software and open-standard data formats.
  • Ensure continuity of data hosting, underwritten by a formal Consortium Agreement with ProteomeXchange-style fail-over across partners.
  • Open Data — no human-sensitive data.
  • Open Repositories — no controlled access.
  • Open Framework — enable other resources to join in the future.
Central Portal

Central Search Portal

Underpinned by a Common Data Model, in turn rooted in authoritative ontologies

The Central Search Portal delivers cross-repository discovery across 6,500+ legacy public metabolomics datasets, with all post-release MS depositions captured as fully-harmonised MHD profiles. Raw data, result files and MHD files remain hosted by the source repositories.

Fig 1 MetabolomicsHub search portal. Faceted search across harmonised public datasets with treemap previews.
01

Unified search & faceting

Repository, organism, organism part, instrument, release year, disease, assay & measurement type — with an advanced composer for AND/OR/NOT multi-clause queries.

02

Visual highlights

Treemap previews summarise the top organisms, organism parts and assay types for any query — faster scoping for meta-analyses.

03

Open API & notifications

Public Search API and dataset-announcement feed enable LLM-ready integration, automation and external indexing — every record downloadable as an MHD Common Data Model File (JSON).
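The AND/OR/NOT multi-clause queries offered by the advanced composer can be pictured as small composable predicates. The following is a minimal sketch only: the helper names (`has`, `all_of`, `any_of`, `not_`, `search`), the record fields and the `EX-…` accessions are hypothetical and do not reflect the portal's actual query syntax or API.

```python
from typing import Callable

Predicate = Callable[[dict], bool]


def has(field: str, value: str) -> Predicate:
    """Clause that matches when a record's facet list contains the value."""
    return lambda record: value in record.get(field, [])


def all_of(*clauses: Predicate) -> Predicate:
    """AND: every clause must match."""
    return lambda record: all(clause(record) for clause in clauses)


def any_of(*clauses: Predicate) -> Predicate:
    """OR: at least one clause must match."""
    return lambda record: any(clause(record) for clause in clauses)


def not_(clause: Predicate) -> Predicate:
    """NOT: invert a clause."""
    return lambda record: not clause(record)


def search(records: list, query: Predicate) -> list:
    """Return the accessions of all records matching the query."""
    return [record["accession"] for record in records if query(record)]


# Hypothetical, hand-written records; not real portal data.
RECORDS = [
    {"accession": "EX-1", "organism": ["Homo sapiens"],
     "organism_part": ["blood plasma"], "assay_type": ["LC-MS"]},
    {"accession": "EX-2", "organism": ["Homo sapiens"],
     "organism_part": ["urine"], "assay_type": ["GC-MS"]},
    {"accession": "EX-3", "organism": ["Mus musculus"],
     "organism_part": ["liver"], "assay_type": ["LC-MS"]},
]

# Human AND (LC-MS OR GC-MS) AND NOT urine
query = all_of(
    has("organism", "Homo sapiens"),
    any_of(has("assay_type", "LC-MS"), has("assay_type", "GC-MS")),
    not_(has("organism_part", "urine")),
)
print(search(RECORDS, query))  # ['EX-1']
```

Keeping each clause a plain function means nested queries need no separate parser; a real service would more likely translate such a tree into an index query.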

Common Data Model

Common Data Model (MHD)

Graph-based model for interoperable metabolomics study metadata

The MHD Common Data Model is a versioned graph schema that represents a metabolomics study as typed nodes (Subject, Sample, Study, Protocol, Assay and other entities) linked by named relationships, with parameters bound to controlled-vocabulary values and types. Native study metadata in ISA-Tab (MetaboLights), mwTab (Metabolomics Workbench) and JSON (GNPS/MassIVE) are converted into two artefacts: an extensible MHD Common Data Model File (graph JSON) and a lighter-weight MHD Announcement File. Every controlled-vocabulary reference carries a source/accession/name triple; structure is validated by JSON Schema and terms are resolved via the OLS4 API.
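To make the "typed nodes linked by named relationships" idea concrete, here is a minimal in-memory sketch. The class names, field names, node identifiers and the direction of each relationship are assumptions made for illustration; they are not the actual MHD schema. The two CV terms are the ones shown in the poster's schema figure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CvTerm:
    """Controlled-vocabulary reference as a source/accession/name triple."""
    source: str
    accession: str
    name: str


@dataclass
class Node:
    node_id: str
    node_type: str                     # e.g. "study", "sample", "protocol"
    cv_term: Optional[CvTerm] = None   # only set for CV-bound nodes


@dataclass
class StudyGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (source_id, name, target_id)

    def add(self, node: Node) -> Node:
        self.nodes[node.node_id] = node
        return node

    def link(self, source_id: str, name: str, target_id: str) -> None:
        """Add a named relationship; both endpoints must already exist."""
        for node_id in (source_id, target_id):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.edges.append((source_id, name, target_id))

    def targets(self, source_id: str, name: str) -> list:
        """Nodes reached from source_id via relationships with this name."""
        return [self.nodes[t] for s, n, t in self.edges
                if s == source_id and n == name]


graph = StudyGraph()
graph.add(Node("study-1", "study"))
graph.add(Node("protocol-1", "protocol"))
graph.add(Node("param-1", "parameter-definition"))
graph.add(Node("ptype-1", "parameter-type",
               CvTerm("MSIO", "MSIO:0000171", "mass spectrometry instrument")))
graph.add(Node("pvalue-1", "parameter-value",
               CvTerm("MS", "MS:1001911", "Q Exactive")))

# Relationship directions below are assumed, not taken from the MHD spec.
graph.link("study-1", "has-protocol", "protocol-1")
graph.link("param-1", "has-type", "ptype-1")
graph.link("param-1", "has-instance", "pvalue-1")

value = graph.targets("param-1", "has-instance")[0]
print(value.cv_term.accession, value.cv_term.name)  # MS:1001911 Q Exactive
```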

Source Repositories: native formats (e.g. ISA-Tab, mwTab) converted to the common data model. MetaboLights (ISA-Tab) · Metabolomics Workbench (mwTab) · GNPS / MassIVE (JSON).
MHD Common Data Model: studies represented as a graph of typed nodes (Study, Sample, Protocol, Assay…) bound by named relationships, with parameters carrying both a value and an ontology-bound type.
Validation: two-tier. JSON Schema for structure and type validation; the OLS4 API resolves every CV term against authoritative ontologies (PSI-MS · EDAM · ChEBI · EFO).
Output Artefacts: each study yields two JSON artefacts, an MHD Common Data Model File (.mhd.json) and an MHD Announcement File (.announcement.json).
MHD simplified graph schema. Nodes: subject · sample · study · protocol · assay · parameter definition · parameter-value · parameter-type. Relationships: used-in · has-protocol · has-instance · has-type. Example: parameter-value = MS:1001911 "Q Exactive"; parameter-type = MSIO:0000171 "mass spectrometry instrument".
Fig 2 MetabolomicsHub data flow and MHD Common Data Model. Each study yields an MHD Common Data Model File carrying the full graph-structured metadata that powers cross-repository search and meta-analyses, and an MHD Announcement File that registers the dataset with the central portal.
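The two-tier validation in the data flow can be sketched for a single CV reference as a structural check followed by term resolution. This is an illustration under stated assumptions: the structural tier here is a hand-written stand-in for a JSON Schema validator, and the resolver is an injected callable backed by a small in-memory table rather than a call to the OLS4 API. The function names and the accession pattern are hypothetical.

```python
import re

# Assumed accession shape: PREFIX:digits (e.g. MS:1001911).
ACCESSION = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\d+$")
REQUIRED_KEYS = ("source", "accession", "name")


def check_structure(term: dict) -> list:
    """Tier 1: the triple must have three non-empty strings and a valid accession."""
    errors = []
    for key in REQUIRED_KEYS:
        value = term.get(key)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"'{key}' must be a non-empty string")
    accession = term.get("accession")
    if isinstance(accession, str) and accession.strip() \
            and not ACCESSION.match(accession):
        errors.append(f"'{accession}' is not a PREFIX:digits accession")
    return errors


def validate_term(term: dict, resolve_label) -> list:
    """Tier 2 runs only if tier 1 passes: the accession must resolve to the given name."""
    errors = check_structure(term)
    if errors:
        return errors
    label = resolve_label(term["accession"])
    if label is None:
        return [f"{term['accession']} not found in its ontology"]
    if label != term["name"]:
        return [f"{term['accession']} resolves to '{label}', not '{term['name']}'"]
    return []


# In-memory stand-in for an ontology lookup service (terms from the poster figure).
KNOWN_TERMS = {
    "MS:1001911": "Q Exactive",
    "MSIO:0000171": "mass spectrometry instrument",
}

good = {"source": "MS", "accession": "MS:1001911", "name": "Q Exactive"}
wrong_name = {"source": "MS", "accession": "MS:1001911", "name": "Orbitrap"}

print(validate_term(good, KNOWN_TERMS.get))  # []
print(validate_term(wrong_name, KNOWN_TERMS.get))
```

Passing the resolver in as a parameter keeps the structural tier testable offline and lets the lookup be swapped for a cached or networked implementation.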
Automated metadata harmonisation & improvement

Chemical Identifier Enrichment

As part of the MetabolomicsHub infrastructure, an automatic pipeline for the enrichment of compound identifiers, descriptors & database cross-referencing is in development. The pipeline ingests the compound metadata provided in the source repository and fills missing values based on lookups from various online resources, with ChEBI identifiers and standardised RefMet name and identifier as key outputs.

2D structure: [image: caffeine]
Trivial name: caffeine
Identifiers: ChEBI CHEBI:27732 · RefMet RM0032992 · PubChem CID 2519
Molecular formula: C₈H₁₀N₄O₂
SMILES: CN1C=NC2=C1C(=O)N(C(=O)N2C)C
InChIKey: RYYVLZVUVIJVGH-UHFFFAOYSA-N
Fig 3 Chemical identifier enrichment — caffeine worked example. Provenance of each value is traceable and the lookup order and output value selection are highly configurable and modular to support careful optimisation prior to release.
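The enrichment idea, filling missing values from an ordered list of lookups while recording where each value came from, can be sketched as follows. The function and source names are hypothetical, and the two lookup tables are in-memory stand-ins rather than the online resources the pipeline actually queries; the caffeine values are those shown in the worked example.

```python
def enrich(record: dict, sources: list) -> tuple:
    """Fill missing fields from ordered lookups; never overwrite existing values."""
    enriched = dict(record)
    # Values supplied by the source repository are tagged "submitted".
    provenance = {key: "submitted" for key, value in record.items() if value}
    for source_name, lookup in sources:
        # Each lookup sees values filled by earlier sources, so lookups can chain.
        for key, value in lookup(enriched).items():
            if value and not enriched.get(key):
                enriched[key] = value
                provenance[key] = source_name
    return enriched, provenance


# Hypothetical stand-in tables keyed by name and by InChIKey.
BY_NAME = {
    "caffeine": {"inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
                 "pubchem_cid": "2519"},
}
BY_INCHIKEY = {
    "RYYVLZVUVIJVGH-UHFFFAOYSA-N": {"chebi": "CHEBI:27732",
                                    "refmet": "RM0032992",
                                    "formula": "C8H10N4O2"},
}


def name_lookup(record: dict) -> dict:
    return BY_NAME.get((record.get("name") or "").lower(), {})


def inchikey_lookup(record: dict) -> dict:
    return BY_INCHIKEY.get(record.get("inchikey") or "", {})


# Lookup order is configuration: the list position decides which source wins.
sources = [("name-resolver", name_lookup),
           ("structure-resolver", inchikey_lookup)]

enriched, provenance = enrich({"name": "caffeine", "inchikey": None}, sources)
print(enriched["chebi"], provenance["chebi"])  # CHEBI:27732 structure-resolver
```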
Ontology contribution

Contributions to PSI-MS-CV

Cross-repository metadata reconciliation led to an expansion of the PSI-MS Controlled Vocabulary (PSI-MS-CV): chromatography hardware was not previously included (beyond a handful of placeholders), and non-US-vendor mass spectrometers were under-represented. Gaps were identified from cross-repository mapping, drafted with AI-assisted definition generation (iterative web search and summarisation), edited into the OBO file via Python, and submitted as GitHub pull requests.
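As an illustration of what editing an OBO file via Python can look like, the sketch below formats a new `[Term]` stanza and appends it to existing OBO text. It is not the consortium's script: the function names, the `EX:` identifiers, the term name and the definition are placeholders, and the `PSI:MS` definition cross-reference is an assumed default.

```python
def format_obo_term(term_id: str, name: str, definition: str,
                    parent_id: str, parent_name: str,
                    dbxref: str = "PSI:MS") -> str:
    """Return one OBO [Term] stanza with id, name, def and is_a tags."""
    # Backslashes and double quotes must be escaped inside the quoted definition.
    escaped = definition.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        "[Term]",
        f"id: {term_id}",
        f"name: {name}",
        f'def: "{escaped}" [{dbxref}]',
        f"is_a: {parent_id} ! {parent_name}",
    ]) + "\n"


def append_terms(obo_text: str, stanzas: list) -> str:
    """Append stanzas to existing OBO text, separated by blank lines."""
    return obo_text.rstrip("\n") + "\n\n" + "\n".join(stanzas)


# Placeholder identifiers and wording; not real PSI-MS-CV entries.
stanza = format_obo_term(
    "EX:0000002",
    "example LC pump",
    'A placeholder "pump" model used for illustration.',
    "EX:0000001",
    "example liquid chromatography instrument model",
)
print(append_terms("format-version: 1.2\n", [stanza]))
```

Generating stanzas from a reviewed table of terms, rather than hand-editing the file, keeps identifiers and parent links consistent across a large batch.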

We have contributed +220 new instrument terms (118 MS/GC-MS; 83 LC; 19 GC, Fig. 4) plus supporting hierarchies, with a further 10 terms added or under review at EDAM, ChEBI and CHEMINF. Enriched terms feed back upstream, so every downstream consumer — mzML, ProteoWizard, OpenMS, MZmine, XCMS, MassBank, etc. — benefits from a single shared vocabulary.

Fig 4 Contributions to PSI-MS-CV. Stacked bars show pre-existing (grey) and newly added (accent) terms across MS vendors and LC/GC system families. Y-axis: no. of terms added / edited. Cut-off Feb 2026.
Conclusions

MetabolomicsHub harmonises public metabolomics data repositories into a unified, FAIR-compliant resource for discovery, exchange and reuse.

For Researchers
Central, faceted and filter-capable search across participating repositories. Provenance-traceable metadata, harmonised compound identifiers and machine-readable MHD JSON for downstream pipelines and AI workflows.
For Repositories
Shared converters, JSON Schema + OLS4 validation and a public announcement protocol. Less duplication, more cross-repository discoverability, rooted in authoritative ontologies.
For Funders & Journals
Persistent MHD accessions, FAIR-compliant submissions and machine-readable manifests for reproducibility checks at submission, review and audit.

A central search portal for cross-repository discovery indexes 6,500+ legacy datasets — with all post-release MS depositions captured as fully-harmonised MHD profiles — underpinned by a graph Common Data Model bound to authoritative ontologies and an automated chemical-identifier enrichment pipeline. Contributing +220 terms to PSI-MS-CV and committing to open standards (mzTab-M, USIs), the consortium strengthens the shared FAIR foundation for the metabolomics community.
