Health data is one of the NHS’s most powerful assets. Hidden in millions of patient records are insights that could help spot diseases earlier, design better treatments, and make healthcare more personal. But today, the journey from data to discovery is often slow. Researchers wait months for approvals, systems are tightly locked down, and few people see how their information makes a difference.

It’s time to change that.

A Smarter Way to Share: Progressive Data Layers

We’re building a new approach that combines innovation with trust: a model that lets the NHS share useful, realistic data without putting anyone’s privacy at risk.

What Is Publicly-Sourced Synthetic Data?

It’s not real patient data. Our synthetic data is completely made up and never linked to real people. The Data Matryoshka project is a DARE UK-funded initiative to create synthetic data from electronic health data held by UCLH, using a safe, trustworthy and reproducible pipeline.

How does it work?

We start with the structure of a real dataset

Then we fill it using publicly available or approved population-level health statistics

The result: realistic-looking, computer-generated data that contains no real patient records
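
For readers who like to see the idea in code, the steps above can be sketched roughly as follows. This is a minimal illustration, not the UCLH pipeline: the column names, the `generate_synthetic_rows` helper and every proportion in `POPULATION_STATS` are hypothetical placeholders standing in for a real table structure and real published statistics. It also samples each column independently, which a production generator would likely improve on.

```python
import random

# Hypothetical schema: only the column names of a table are reused.
# No real rows are ever read.
SCHEMA = ["person_id", "sex", "age_band", "smoker"]

# Placeholder proportions (made up for illustration) standing in for
# publicly available, population-level health statistics.
POPULATION_STATS = {
    "sex": {"F": 0.51, "M": 0.49},
    "age_band": {"0-17": 0.21, "18-64": 0.60, "65+": 0.19},
    "smoker": {"no": 0.87, "yes": 0.13},
}


def generate_synthetic_rows(n_rows, seed=0):
    """Fill the schema with values sampled from the population-level stats."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    rows = []
    for i in range(n_rows):
        row = {}
        for column in SCHEMA:
            if column == "person_id":
                # Sequential identifier with no link to any real person.
                row[column] = i + 1
            else:
                categories = list(POPULATION_STATS[column].keys())
                weights = list(POPULATION_STATS[column].values())
                row[column] = rng.choices(categories, weights=weights, k=1)[0]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for synthetic_row in generate_synthetic_rows(3, seed=42):
        print(synthetic_row)
```

Because every value is drawn from aggregate proportions rather than copied from a record, no row in the output corresponds to a real patient. Fixing the seed makes each run repeatable, which is one simple way to keep a pipeline like this reproducible.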

Want to learn more?

An Overview of Synthetic Data at UCLH

UCLH SQL Synthetic Generator

HDR UK: Intro to Synthetic Data

We are passionate about sharing our policies and best practices with other institutions. Please get in touch at uclh.safehr@nhs.net for more information.

Synthetic Data

  • Structured - OMOP

    OMOP Extraction System

    By default, UCLH clinical data is provided for research in a format called the OMOP Common Data Model (CDM), also referred to as simply ‘OMOP’.

    OMOP-ES is our OMOP data extraction pipeline. It allows a diverse range of data and systems to be mapped to OMOP in a controlled, governed manner, with rules to anonymise and redact data to protect patient privacy.

    OMOP-ES makes it easy to incorporate new types of data, enforces high-quality mapping, and supports portability to other EPIC sites.

    Read more about our OMOP Policy on our Policy Page.

  • Unstructured - CogStack

    We use CogStack to anonymise free-text data from clinical notes and imaging reports. A detailed description is available on our wiki.

    Read about our CogStack anonymisation policy on our Policy page.

  • Imaging - PIXL

    PIXL Image eXtraction Laboratory (PIXL) is a system for extracting, linking and de-identifying DICOM imaging data, structured EHR data and free-text data from radiology reports at UCLH.

    Learn more about our PIXL anonymisation policy on our Policy page.

Interested in data?

Find out how to request data for research here.