Big Data, Health Data

How to unlock the potential of health data within a compliant information governance framework.

The NHS holds what are arguably the largest and most detailed digital, linkable (and thus potentially interrogatable) databases of healthcare data in the world. The potential for using this NHS data for medical research and the development of better and more efficient treatments has only just started to be tapped. The development of AI and machine learning techniques to analyse massive datasets means technology is becoming equal to the task of exploring the opportunities provided by such rich data sources. Multiple sources of funding are being made available to promote such innovation.

However, as recent headlines demonstrate, it is vital that the technical possibilities offered by such data do not dazzle researchers and developers into ignoring the legal and ethical rules that govern its use, including those set out in the Caldicott Principles, the Confidentiality: NHS Code of Practice and the GMC guidance “Confidentiality: good practice in handling patient information”.

Fundamental principle

Patient-identifiable data (also known as patient confidential data or personal confidential data) should only be used for purposes connected with the direct delivery of care to the patient, and should only be shared with those within the direct care team who need to know. Use of patient-identifiable data outside this context normally requires either patient consent or one of the recognised gateways under which patient confidentiality can be set aside. This existing safeguard for patient data anticipated a number of the requirements of the General Data Protection Regulation (GDPR), including the principle of data minimisation and the requirement for data protection by design and by default.

The problem

“Big Data” projects will generally require individual-level data in order to be useful. While the patient’s identity can be stripped out of individual-level datasets, they are usually so data-rich that it is impossible to truly anonymise them (under the GDPR, anonymisation means that re-identification must be “reasonably impossible” for anyone). In addition, “Big Data” projects will typically involve medical research or similar activities, and so will fall outside the direct delivery of care to patients (even if the project might ultimately result in a product that influences patient care). Obtaining express consent from each patient will not usually be feasible for a “Big Data” project, due to the scale of the dataset.

It might therefore be thought that the rules protecting health data are incompatible with any ambitions to use these datasets outside direct care.

The solution

With appropriate safeguards, it is perfectly possible to undertake these projects using health data. The GDPR criteria for anonymisation are so stringent that most individual-level data is likely to remain personal data even once the direct identifiers have been removed. However, the GDPR is reasonably permissive regarding the use of health information for medical research and for the management and improvement of health care systems or services, provided appropriate safeguards are in place. Similarly, the common law rules on the use of health care data do not require the data to be anonymised to GDPR standards; instead, what is required is that the patient-derived information is anonymised in the hands of those carrying out the non-direct care activities.

A number of data extraction tools allow patient-level data to be extracted automatically without the patient identifiers. This makes it possible to create a de-identified, patient-level database in which individual cases can still be linked through the use of pseudonyms. These “de-identified” databases can then be interrogated for “Big Data” projects, including AI-driven machine learning tasks, which have the potential to spot associations and interactions that human researchers could not realistically identify.
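The article does not describe how any particular extraction tool generates its pseudonyms. One common technique is keyed hashing, and the minimal Python sketch below assumes that approach (HMAC-SHA256) purely for illustration. The key, the field names and the helpers `pseudonym` and `de_identify` are all hypothetical. The point it shows is that the same identifier always maps to the same pseudonym, so records from different sources can still be linked, while anyone without the key cannot feasibly reverse the pseudonym or recompute it from a guessed NHS number.

```python
import hashlib
import hmac

# Hypothetical secret key: assumed to be held by the data controller (or a
# trusted third party) and never released into the research environment.
SECRET_KEY = b"replace-with-a-securely-generated-key"

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"nhs_number", "name", "address", "date_of_birth"}


def pseudonym(nhs_number: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256.

    Spaces are removed first so differently formatted copies of the same
    number produce the same pseudonym.
    """
    normalised = nhs_number.replace(" ", "")
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of the record with direct identifiers replaced by a pseudonym."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["pseudonym"] = pseudonym(record["nhs_number"], key)
    return out


# Fictitious records from two sources that share an invented NHS number.
gp_record = {"nhs_number": "123 456 7890", "name": "A Patient", "diagnosis": "asthma"}
hospital_record = {"nhs_number": "1234567890", "date_of_birth": "1970-01-01", "procedure": "spirometry"}

linked = de_identify(gp_record)["pseudonym"] == de_identify(hospital_record)["pseudonym"]
print(linked)  # True: both de-identified records carry the same pseudonym
```

In this design the security of the pseudonyms rests entirely on the key, which is why, under these assumptions, it would sit outside the environment where the de-identified data is analysed.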

A de-identified but still patient-level database is vulnerable to accidental or intentional re-identification if combined with other data sources. Accordingly, it is essential that an appropriate controlled environment is created to house this de-identified dataset and protect it against the potential risk of re-identification. This controlled environment will involve a combination of organisational, physical and digital barriers to ensure that those using the de-identified dataset cannot access any other dataset and cannot use that de-identified dataset for purposes beyond those authorised by the data controller supplying the source data.
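To illustrate the kind of re-identification risk described above, the short Python sketch below performs a simple linkage: it matches a de-identified dataset against an external list on a few quasi-identifiers (postcode district, birth year, sex). All data, field names and the helper `linkage_matches` are invented for this example, and real re-identification risk assessments are considerably more involved. Where a combination of quasi-identifiers is unique in the research data, a named person in the other source can be linked to a diagnosis.

```python
from collections import Counter

# Entirely fictitious de-identified research records: a pseudonym plus
# quasi-identifiers and clinical data, but no names or NHS numbers.
deidentified = [
    {"pseudonym": "p1", "postcode_district": "XX1", "birth_year": 1958, "sex": "F", "diagnosis": "COPD"},
    {"pseudonym": "p2", "postcode_district": "XX1", "birth_year": 1958, "sex": "M", "diagnosis": "asthma"},
    {"pseudonym": "p3", "postcode_district": "XX1", "birth_year": 1958, "sex": "M", "diagnosis": "diabetes"},
]

# Fictitious "other data source" that does hold names.
external = [
    {"name": "Jane Example", "postcode_district": "XX1", "birth_year": 1958, "sex": "F"},
    {"name": "John Example", "postcode_district": "XX1", "birth_year": 1958, "sex": "M"},
]

QUASI_IDENTIFIERS = ("postcode_district", "birth_year", "sex")


def qi_key(row: dict) -> tuple:
    """Combination of quasi-identifier values for one row."""
    return tuple(row[q] for q in QUASI_IDENTIFIERS)


def linkage_matches(research: list, other: list) -> dict:
    """Map a name to a diagnosis wherever exactly one research record shares that person's quasi-identifiers."""
    counts = Counter(qi_key(r) for r in research)
    matches = {}
    for person in other:
        k = qi_key(person)
        if counts[k] == 1:  # unique combination, so the link is unambiguous
            record = next(r for r in research if qi_key(r) == k)
            matches[person["name"]] = record["diagnosis"]
    return matches


print(linkage_matches(deidentified, external))  # {'Jane Example': 'COPD'}
```

John Example is not matched because two research records share his combination, whereas Jane Example’s combination is unique. A controlled environment that stops users from bringing other datasets alongside the de-identified data is intended to prevent exactly this kind of join.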

The importance of these controls is reflected in the UK Government’s acceptance of the Caldicott 3 recommendation that re-identification of de-identified data without the permission of the data controller be a criminal offence, now enacted in section 171 of the Data Protection Act 2018.

As de-identified data is still personal data within the meaning of the GDPR, data controllers must still comply with their controller obligations in relation to the “Big Data” project. In particular, this means there should be transparency about the project, including its purpose, who the data is shared with, what activities will be undertaken and what protection for the data is in place. The fact of de-identification and the safeguards against re-identification will assist the data controller in showing that such uses of the information are fair and do not conflict with patients’ interests, rights and freedoms.

Provided appropriate thought and planning are given to information governance at the project design stage, and the task to be undertaken is one that would objectively be regarded as a “fair” use of data, it should be possible to establish a framework that enables a “Big Data” project to proceed lawfully. In some cases, Research Ethics Committee approval will also be required, depending on the anticipated outcomes.

The opportunities

The scope of “Big Data” projects in healthcare is only just beginning to be explored. The potential for establishing a better understanding of conditions, treatments and medication means there is a real opportunity to achieve the holy grail of delivering better, more effective care at a lower cost than existing treatments. There is a strong moral case for pursuing these projects, which is reflected in the increasing funding streams being made available to support innovation in this field.

Hempsons have advised on a number of “Big Data” projects, including a major Phase 1 Test Bed project.
