Reimagining How We Use NHS Data - Safely, Responsibly, and at Speed

By Sarah Keating 17th April 2026

Turning complex health data into innovation fuel while keeping privacy at the heart of everything

Health data is one of the NHS’s most powerful assets. Hidden in millions of patient records are insights that could help spot diseases earlier, design better treatments, and make healthcare more personal. But today, the journey from data to discovery is often slow. Researchers wait months for approvals, systems are tightly locked down, and few people see how their information makes a difference.

It’s time to change that.

A Smarter Way to Share: Progressive Data Layers

We’re building a new approach that combines innovation with trust — a model that lets the NHS share useful, realistic data without putting anyone’s privacy at risk.

Think of it as progressive data layers, like a set of Russian dolls. Each layer offers more detail depending on what’s needed, with total transparency about how it is generated.

🟢 Public Synthetic Data: Derived only from publicly available statistics - great for testing tools and understanding NHS data structures.
🟡 Private Synthetic Data: Artificially generated from patterns learned from real data, but containing no identifiable patient-level records - a communication artifact for sensitive-data research.
🔵 Real Data: Genuine patient information, carefully accessed within secure environments for final‑stage discoveries.

This layered model uses the OMOP standard, which is already becoming the de facto international standard for healthcare records. That makes it compatible, consistent, and ready to scale nationally.

What Does Public Synthetic Data Actually Look Like?

Let's peek under the hood. Below are two sample tables of public synthetic data — the kind anyone can access to explore NHS data structures and test their tools.

  
    
Person ID   DOB          Sex      Ethnicity
1           2000-01-01   Male     British
2           1966-01-01   Female   British
3           1945-01-01   Male     British
4           1991-01-01   Female   British
5           1973-01-01   Male     Asian

The person table

  
Person_ID   is_pregnant   blood_pressure
1           N             98/132
2           Y             120/80
3           Y             170/52
4           N             136/92
5           N             98/132

Random Table

You'll notice a few quirks — and that's by design.

🎂 Same birthdays? Everyone shares the same day and month of birth (1 January), with only the year varying, to minimise any chance of identification.

🤰 A pregnant 80‑year‑old man? Public synthetic data is generated from average trends. Unless a relationship is explicitly built in, the generator doesn't cross‑reference other details about the same person. So yes, impossible combinations happen.

💓 Odd blood pressure readings? Values are pulled from a statistical distribution without enforcing real‑world rules (like systolic pressure always being higher than diastolic).

And in a real dataset? A uniquely identifiable patient like person #5 would be excluded entirely.
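
For readers who want to see why these quirks appear, here is a minimal Python sketch of the idea. It is not the project's actual generator. It assumes each column is drawn independently from its own distribution, and the weights, value ranges, and function names (`sample_row`, `implausible_reasons`) are made up for illustration.

```python
import random
from datetime import date

# Hypothetical marginal distributions; the numbers are illustrative only,
# not real NHS or census statistics.
SEX_WEIGHTS = {"Male": 0.5, "Female": 0.5}
PREGNANT_WEIGHTS = {"Y": 0.2, "N": 0.8}


def sample_category(rng, weights):
    """Draw one label from a {label: probability} mapping."""
    labels = list(weights)
    return rng.choices(labels, weights=[weights[k] for k in labels], k=1)[0]


def sample_row(rng, person_id):
    """Build one row, drawing every column independently of the others."""
    birth_year = rng.randint(1940, 2005)
    return {
        "person_id": person_id,
        # Day and month are fixed at 1 January; only the year varies.
        "dob": date(birth_year, 1, 1),
        "sex": sample_category(rng, SEX_WEIGHTS),
        "is_pregnant": sample_category(rng, PREGNANT_WEIGHTS),
        # Two independent draws, so the first is not forced above the second.
        "blood_pressure": f"{rng.randint(90, 180)}/{rng.randint(50, 140)}",
    }


def implausible_reasons(row):
    """List the real-world rules that a row breaks."""
    reasons = []
    if row["sex"] == "Male" and row["is_pregnant"] == "Y":
        reasons.append("pregnant male")
    systolic, diastolic = (int(v) for v in row["blood_pressure"].split("/"))
    if systolic <= diastolic:
        reasons.append("systolic not above diastolic")
    return reasons


rng = random.Random(0)  # fixed seed so reruns give the same rows
rows = [sample_row(rng, i) for i in range(1, 201)]
flagged = [r for r in rows if implausible_reasons(r)]
print(f"{len(flagged)} of {len(rows)} rows break at least one rule")
```

Because no column looks at any other, nothing stops the sampler from pairing "Male" with a pregnancy flag, or from drawing a second blood pressure number larger than the first.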

So, What's the Point?

Even with these quirks, public synthetic data is incredibly useful. It lets users get familiar with data structures, communicate research intent, test their logic, and explore how NHS databases are organised, all without touching anything sensitive.

It's the training wheels for health data research — safe, accessible, and a crucial first step toward deeper insights.
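
As a small illustration of testing logic against public synthetic data, the sketch below runs a blood pressure check over the rows of the second sample table. The function names are hypothetical, and the only plausibility rule applied is the one mentioned earlier (systolic higher than diastolic).

```python
def parse_blood_pressure(text):
    """Split a 'systolic/diastolic' string into two integers."""
    systolic_text, diastolic_text = text.split("/")
    return int(systolic_text), int(diastolic_text)


def is_plausible(systolic, diastolic):
    """Apply the real-world rule that systolic exceeds diastolic."""
    return systolic > diastolic


# Rows copied from the second sample table: (Person_ID, is_pregnant, blood_pressure).
SAMPLE_ROWS = [
    (1, "N", "98/132"),
    (2, "Y", "120/80"),
    (3, "Y", "170/52"),
    (4, "N", "136/92"),
    (5, "N", "98/132"),
]

# Collect the IDs whose reading fails the plausibility rule.
needs_review = [
    person_id
    for person_id, _, reading in SAMPLE_ROWS
    if not is_plausible(*parse_blood_pressure(reading))
]
print(needs_review)  # → [1, 5]
```

A check like this can be written, debugged, and reviewed long before anyone requests access to real records.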

What Does Private Synthetic Data Look Like?

Here's the thing — we can't actually show you.

Even though private synthetic data contains zero real patients, it's generated by learning patterns directly from genuine NHS records. That means it behaves much more realistically than public synthetic data. The pregnant 80‑year‑old men disappear. Blood pressure values make clinical sense. Patterns between conditions, demographics, and outcomes start to reflect what we'd see in the real world.
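
The real thing can't be shown, but the difference in principle can be sketched with made-up numbers. The toy Python example below is an assumption about the general idea, not a description of how the project's private synthetic data is produced. It draws the pregnancy flag conditionally on sex and age band instead of independently. All probabilities and names are hypothetical.

```python
import random

# Hypothetical conditional probabilities of pregnancy given sex and age band.
# In practice such relationships would be learned from real records inside a
# secure environment; these numbers are invented for illustration.
P_PREGNANT = {
    ("Female", "18-44"): 0.06,
    ("Female", "45+"): 0.001,
    ("Male", "18-44"): 0.0,
    ("Male", "45+"): 0.0,
}


def sample_person(rng):
    """Draw sex and age band first, then a pregnancy flag that depends on them."""
    sex = rng.choice(["Male", "Female"])
    age_band = rng.choice(["18-44", "45+"])
    # random() returns a value in [0, 1), so a probability of 0.0 never fires.
    is_pregnant = rng.random() < P_PREGNANT[(sex, age_band)]
    return {"sex": sex, "age_band": age_band, "is_pregnant": is_pregnant}


rng = random.Random(0)  # fixed seed for repeatable output
people = [sample_person(rng) for _ in range(1000)]
pregnant_males = [p for p in people if p["sex"] == "Male" and p["is_pregnant"]]
print(len(pregnant_males))  # → 0
```

Once one column is allowed to depend on another, the impossible combinations drop out, and the generated data starts to carry real-world relationships with it.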

And that's exactly why it needs some safeguards.

🔍 Realistic patterns reveal real insights. Because the data mirrors genuine health trends, it could potentially expose information about specific population groups or rare conditions — even without containing actual patient records.

🏥 Trust is everything. Keeping private synthetic data within Trusted Research Environments (TREs) ensures it's only accessed by approved researchers, with clear audit trails and governance oversight.

⚖️ A smarter middle ground. It's not as locked down as real patient data, but it's not open to the world either. This balance accelerates early work, allowing more flexible research — while maintaining the safeguards the public expects.

Think of private synthetic data as the dress rehearsal before the real performance. Researchers can develop methods, test hypotheses, and refine their approaches using data that behaves like the real thing — all before they ever need access to actual patient records.

It's powerful, realistic, and still private by design.
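
One widely used safeguard against the rare-group exposure described in this section is small-count suppression: dropping (or merging) any group with fewer members than a chosen threshold. The sketch below applies it to the ethnicity column of the sample person table, which is why person #5 would not survive in a real release. The threshold of 2 and the function name are illustrative assumptions, not the project's published rule.

```python
from collections import Counter


def suppress_rare_groups(rows, column, min_count):
    """Drop rows whose value in `column` occurs fewer than `min_count` times."""
    counts = Counter(row[column] for row in rows)
    return [row for row in rows if counts[row[column]] >= min_count]


# Ethnicity values copied from the sample person table.
people = [
    {"person_id": 1, "ethnicity": "British"},
    {"person_id": 2, "ethnicity": "British"},
    {"person_id": 3, "ethnicity": "British"},
    {"person_id": 4, "ethnicity": "British"},
    {"person_id": 5, "ethnicity": "Asian"},
]

kept = suppress_rare_groups(people, "ethnicity", min_count=2)
print([row["person_id"] for row in kept])  # → [1, 2, 3, 4]
```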

🪆 Remember the Russian dolls?

This is the middle layer in action. Public synthetic data (the outer shell) lets us explore and learn openly. Private synthetic data (the next layer in) gets us closer to reality, but within a trusted, controlled space. And at the core? Real patient data, accessed only when absolutely necessary.

Each layer reveals more detail. Each layer has the right safeguards. That's the power of progressive data sharing.

Built on Collaboration and Trust

UCL ARC is collaborating with the Alan Turing Institute and University College London Hospitals (UCLH). Together we’re:

Expanding our proven tool, SQLSynthGen/data faker, to transform complex NHS databases into high‑quality synthetic data

Creating open, transparent governance frameworks that help information governance teams move faster — safely

Building training programmes for analysts and governance leaders

Embedding patient and public voices in every decision

Because innovation only works when people understand, support, and trust the process.

Why This Matters

🚀 Speeds up research: Scientists can test ideas and methods faster, before requesting real data

🔒 Protects privacy: No identifiable patient-level data ever leaves secure systems

👩‍💻 Empowers learning: Students and NHS teams can use realistic health data without risk

🌍 Drives innovation: Lowering data access barriers helps digital health startups and researchers collaborate globally

💬 Builds trust: People see how their information makes a difference — safely and openly

🤝 Sparks better conversations: Synthetic data becomes a shared reference point during collaborative discussions — helping clinicians, researchers, and data teams speak the same language

🛠️ Strengthens engineering: Developers can test code logic against realistic data structures without waiting for access to real records — catching bugs faster and building more reliable tools

The Bigger Picture

This project is about more than data. It’s about re‑imagining what’s possible when innovation meets integrity.

By designing smarter, safer ways to use information, we’re helping turn NHS data into a force for good — driving discoveries faster, empowering transparency, and improving care for everyone.

Claire Black