MS-PINPOINT: Training Next-Generation AI Models for Multiple Sclerosis

SAFEHR infrastructure is enabling the development of large-scale AI models using multimodal UCLH data to support personalised multiple sclerosis (MS) research and care.
Through the MS-PINPOINT programme, routine brain and spinal cord MRI data from more than 7,000 patients and over 40,000 scans have been securely retrieved and anonymised from UCLH archives dating back to 2010. These imaging datasets are linked with anonymised structured and unstructured clinical data.
This SAFEHR-enabled infrastructure supports the development and training of next-generation machine learning and prognostic models designed to improve prediction of disease progression, treatment response, and long-term outcomes in MS.
Federated AI model training is now underway across the 13 participating hospitals within the international MS-PINPOINT consortium, including sites in the United Kingdom, the Netherlands, and Canada.
In collaboration with the CogStack, ARC, and SAFEHR teams, additional pipelines have also been developed to digitise and process historical scanned clinical reports into machine-readable research data. This has extended longitudinal follow-up by more than 10 years for many patients, substantially strengthening the dataset for AI model development and translational research.
For more information, please contact:
- Arman Eshaghi
  Principal Investigator
  a.eshaghi@ucl.ac.uk

Number of images exported:

33,044

Number of free text reports:

563,484

Number of patients with structured data:

10,558

View Our Synthetic Data:

The information you see here is synthetic data. It’s not real data, does not contain real patient details, and cannot be traced back to any real patients. It is only designed to look like real health records.

We created these data to show the kind of information used in a research project called MS-PINPOINT. This study aims to develop tools to aid in the treatment of patients with multiple sclerosis. The project is a multi-site study, and the Principal Investigator is Arman Eshaghi, an NIHR Advanced Fellow and UCL Principal Research Fellow (Associate Professor).

Because these data are randomly generated using a tool called datafaker, some parts may not make sense. For example, a birth date might appear after a death date. That’s because the columns are generated separately and don’t always link together in a realistic way.
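
For readers who want to see why separately generated columns can disagree, here is a minimal sketch in Python. It does not use datafaker or reproduce how our datasets are built. The column names, date range, and helper functions are assumptions made up for illustration, and only the Python standard library is used.

```python
import random
from datetime import date, timedelta


def random_date(rng, start, end):
    """Return a random date between start and end (inclusive)."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))


def make_rows(n, seed=0):
    """Build n synthetic rows whose two date columns are drawn independently."""
    rng = random.Random(seed)
    start, end = date(1940, 1, 1), date(2020, 12, 31)  # hypothetical range
    rows = []
    for _ in range(n):
        # Each column is drawn on its own; nothing ties the two dates together.
        rows.append({
            "birth_date": random_date(rng, start, end),
            "death_date": random_date(rng, start, end),
        })
    return rows


def count_inconsistent(rows):
    """Count rows where the birth date falls after the death date."""
    return sum(1 for r in rows if r["birth_date"] > r["death_date"])


rows = make_rows(1000)
print(count_inconsistent(rows), "of", len(rows),
      "rows have a birth date after the death date")
```

Because neither date depends on the other, a sizeable share of rows end up in an impossible order, which is the kind of oddity you may notice in the synthetic files.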

Please note that these data are a preliminary version of our synthetic datasets. We’re actively improving our process so that over time, more datasets will be available, and the data will look more and more like real-world data, without ever containing any real patient details.

This dataset is only for demonstration and learning purposes. Any similarity to real people is purely coincidental.

How to browse our synthetic data:

1) In the embedded table above, click the ‘view’ button next to the file you’d like to look at.

2) A new window will open on Figshare, where the file is stored. The top half of the page shows a collection of tiles for the files in the folder, and the bottom half shows the project description.

3) To view the data in your web browser, click the ‘eye’ icon on your desired file tile.

4) The tabular data will be displayed in your browser. You can expand the screen as needed using the double-headed arrow ‘full screen’ icon in the bottom right corner of the table.

5) To download the data, click the ‘download file’ icon on your desired file tile.

6) The files are in CSV format, which is like a simple version of an Excel spreadsheet.

Tip: Each row in the file is a ‘record’ (like a line in a spreadsheet), and each column is a type of information (like date, condition, or measurement).
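
If you prefer to explore a downloaded file with code rather than a spreadsheet program, the sketch below shows one possible way to read CSV data in Python using only the standard library. The sample text and its column names (patient_id, date, condition, measurement) are made up for illustration and are not the actual columns in our files.

```python
import csv
import io

# Made-up sample standing in for a downloaded file.
# The column names here are hypothetical.
SAMPLE_CSV = """patient_id,date,condition,measurement
1,2015-03-02,multiple sclerosis,4.5
2,2018-11-20,multiple sclerosis,2.0
"""


def load_records(text_stream):
    """Read CSV text into a list of dicts, one per row, keyed by column name."""
    return list(csv.DictReader(text_stream))


records = load_records(io.StringIO(SAMPLE_CSV))
print(len(records), "records")
print("columns:", list(records[0].keys()))
print("first record:", records[0])
```

To read a real file instead, open it with open("your_file.csv", newline="") and pass the file object to load_records, where "your_file.csv" stands for whatever name your download has. Each dictionary in the result is one record, and each key is one column.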