Anonymisation Policy
This overarching policy aims to advise staff of the purpose and requirements of using anonymised data instead of personal identifiable data for purposes that are outside of direct care. The primary aim of anonymisation is to enable the NHS and private sector organisations to use patient data for secondary (non-direct care) purposes in a legal, safe and secure manner to promote the ICO’s principle of storage limitation.
It is the duty of all staff to understand the different uses of information, to use anonymised data whenever possible without impacting project utility, and to seek guidance if they are unsure about what type of data they should be using.
Definitions
The definitions adhered to at UCLH are as follows, taken from the UK GDPR and used by the ICO:
Personal data
means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.
Pseudonymisation
is the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.
Anonymisation
is information which does not relate to an identified or identifiable natural person, or personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.
Data is “effectively anonymised” when the recipient cannot infer the identity of individuals from the data using the means reasonably likely to be used. If there are reasonably available means that could be used to re-identify individuals, then the data in question is not effectively anonymised. The key is what is reasonably likely relative to the circumstances, not what is purely hypothetical, theoretical, or conceivably likely in absolute terms.
Anonymisation methods that may be applied at UCLH
The classification of information as ‘anonymised’ is highly context-specific, so whilst we apply a general rule of thumb that allows us to provide data for research in a consistent way, advice may sometimes vary. Opinions on anonymisation will differ between organisations and even within teams of professionals. Anonymisation methods that may be applied are as follows:
Removal of direct identifiers: The following data items are considered to be direct identifiers and should be subject to removal/anonymisation processes. They can relate to a staff member or patient:
Name
Address
Date of birth
Postcode
NHS Number
Local patient identifier (MRN)
Telephone number
Most dates of appointments/treatments/hospital visits
Initials
This policy allows for the inclusion of age, gender, ethnicity, deprivation index, and study participant ID as standard in all anonymised datasets.
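As an illustration only, the removal of direct identifiers could be sketched as below. The field names and the sample record are hypothetical and would need mapping to the real column names in a given dataset; dates of appointments are not handled here, since they would normally be dealt with by date/time shifting.

```python
# Hypothetical field names for the direct identifiers listed in this policy.
# Real datasets will use different column names.
DIRECT_IDENTIFIERS = {
    "name",
    "address",
    "date_of_birth",
    "postcode",
    "nhs_number",
    "mrn",
    "telephone_number",
    "initials",
}


def strip_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record without any direct-identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


# Made-up example record, for illustration only.
record = {
    "name": "Example Patient",
    "nhs_number": "0000000000",
    "postcode": "ZZ99 9ZZ",
    "age": 42,
    "gender": "F",
    "study_participant_id": "P-0001",
}
cleaned = strip_direct_identifiers(record)
```

In this sketch, `cleaned` retains only the age, gender and study participant ID, which this policy allows as standard.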
Jittering: Adding a small amount of random noise to the position of each point.
Date/time shifting: Dates and times can be shifted to one particular point in time and all events can be shifted to a point relative to that.
Removal of personal data items: Data items are simply removed and not replaced, after a consideration of project utility without the items included.
Using derivations: We can map postcodes to electoral wards and display the ward instead of the actual postcode. We can also display the first three characters of a postcode, or the LSOA instead. We can display age instead of date of birth, or map values to bands, such as age bands (e.g. 5-10) instead of date or year of birth.
Consideration of sample size: The smaller the cohort of patients, the more identifiable those patients are. Generally, any cohort which contains fewer than 150 patients will not be classified as anonymous.
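The methods above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of the tooling used at UCLH: the function names, the anchor date of 1 January 2000, the uniform noise distribution and the five-year band width are all hypothetical choices. Only the cohort threshold of 150 comes from this policy.

```python
import random
from datetime import date

MIN_COHORT_SIZE = 150  # threshold stated in this policy


def shift_dates(event_dates, anchor=date(2000, 1, 1)):
    """Shift events so the earliest falls on `anchor`, preserving intervals.

    The anchor date is an arbitrary assumption for illustration.
    """
    offset = anchor - min(event_dates)
    return [d + offset for d in event_dates]


def jitter(value, max_noise, rng):
    """Add uniform random noise in [-max_noise, +max_noise] to a numeric value."""
    return value + rng.uniform(-max_noise, max_noise)


def age_band(age, width=5):
    """Map an age to a band label such as '40-44' (band width is an assumption)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def meets_cohort_threshold(n_patients):
    """Cohorts with fewer than MIN_COHORT_SIZE patients fail the check."""
    return n_patients >= MIN_COHORT_SIZE


# Made-up values, for illustration only.
shifted = shift_dates([date(2023, 3, 10), date(2023, 3, 17)])
noisy = jitter(70.0, 0.5, random.Random(0))
band = age_band(42)
```

Here the two example events stay seven days apart after shifting, which is the property that keeps shifted data useful for analysis.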
SAFEHR defaults
We apply time shifting
We do not release high-risk items
Inclusion of specific data items
From time to time, the Research Data Access Committee may be asked to include specific data items as part of a dataset. Data items that have been approved as part of an anonymous data set can be found below:
Height and weight
BMI
Gestational age
Hospital site
Questionnaire responses, specifically from the Migrant Health Screening Form
Publication of results derived from anonymised datasets
When publishing results derived from anonymised datasets, the Chief Investigator of a project must consider:
Table redesign
Grouping or collapsing categories within a table of data, aggregating to a higher level geography or for a larger population sub-group, and/or aggregating tables across a number of years, quarters and/or months
Cell suppression
Replacing patient counts of 5 and under with an asterisk
Rounding of values
Use of direct quotes
It is generally not permissible to use direct quotes of patients or staff without their consent
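Cell suppression and rounding could be sketched as follows. This is a hedged example rather than a prescribed implementation: the function names, the rounding base of 5 and the sample counts are assumptions, and treating a count of zero like any other count of "5 and under" is a literal reading of the rule above.

```python
SUPPRESSION_THRESHOLD = 5  # counts of 5 and under are suppressed, per this policy


def suppress_small_counts(counts):
    """Replace patient counts of 5 and under with an asterisk."""
    return {
        key: ("*" if n <= SUPPRESSION_THRESHOLD else n)
        for key, n in counts.items()
    }


def round_to_base(n, base=5):
    """Round a non-negative integer count to the nearest multiple of `base`.

    Ties round up. The base of 5 is an assumption; the policy does not
    specify one.
    """
    return base * ((n + base // 2) // base)


# Made-up counts, for illustration only.
counts = {"Site A": 3, "Site B": 12, "Site C": 5}
suppressed = suppress_small_counts(counts)
rounded = round_to_base(12)
```

In this sketch, Site A and Site C are replaced with an asterisk, while the count of 12 for Site B is kept and would round to 10.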
Some organisations seek to remove records that are unique because of their very circumstance, and the more unique a record is the more identifiable it is. We’ve chosen not to implement this at UCLH because we have confidence in our researchers, and in our technical and organisational methods to protect patient privacy.
Implementation of technical and organisational measures
Anonymisation is a difficult concept and sets a high standard to meet. Whilst we have faith in our policy and procedures, we recognise that they may not be perfect. We have implemented some technical and organisational measures to ensure that patient privacy and researcher integrity are protected:
Technical measures:
All anonymisation is undertaken by authorised UCLH staff, who work closely with the Information Governance Team. The team build experience in what we expect of an anonymised dataset, and we build confidence in the process that has been implemented.
All datasets must follow the same process, whether they are being curated by the clinical research data team or the clinical team already has the dataset it wants to use (this is commonplace where a service evaluation turns into research).
All physical IG and information security standards are implemented as recommended by the DSPT, e.g. role-based physical access to office spaces.
Organisational measures:
Each CI must sign a Code of Conduct and agree to protect the data they’re using, comply with all IG policies, and take responsibility for other researchers working on their behalf.
Data can only be stored in approved, secure storage spaces so that if the data does contain an identifiable record, the chance of it getting into the public domain is reduced.
Assessing successful anonymisation
When assessing whether a dataset is anonymous, we need to take account of the “means reasonably likely to be used”. However, we do not need to take into account any purely hypothetical or theoretical chance of identifiability. The key is what is reasonably likely relative to the circumstances, not what is conceivably likely in absolute terms.
We can apply a motivated intruder test to ascertain whether an intruder would be able to achieve identification if they were motivated to attempt it. This test is used by both the ICO and the Information Tribunal, which hears DPA 2018 and FOIA appeals.
We are passionate about sharing our policies and best practices with other institutions. Please get in touch at uclh.safehr@nhs.net for more information.