Managing and exploiting routinely collected NHS data for research

Vasa Curcin, Michael Soljak, Azeem Majeed


Introduction Health research using routinely collected National Health Service (NHS) data derived from electronic health records (EHRs) and health service information systems has been growing in both importance and quantity. Wide population coverage and detailed patient-level information allow this data to be applied to a variety of research questions. However, the sensitivity, complexity and scale of such data also prevent researchers from fully exploiting this potential.

Objective Here, we establish the current challenges preventing researchers from making optimal use of the data sets at their disposal, on both the legislative and practical levels, and give recommendations as to how these challenges can be overcome.

Method A number of projects have recently been launched in the UK to address poor research data management practices. Rapid Organisation of Healthcare Research Data (ROHRD) at Imperial College London produced a useful prototype that provides local researchers with a one-stop index of available data sets together with relevant metadata.

Findings Increased transparency of data sets’ availability and their provenance leads to better utilisation and facilitates compliance with regulatory requirements.

Discussion Research data resulting from NHS data is often not utilised fully, or is handled in a haphazard manner that prevents full auditability of the research. Furthermore, lack of informatics and data management skills in research teams acts as a barrier to implementing more advanced practices, such as provenance capture and detailed, regularly updated data management strategies. Only by a concerted effort at the levels of research organisations, funding bodies and publishers can we achieve full transparency and reproducibility of the research.


This is an open access journal, which means that all content is freely available without charge to the user or their institution. Users are allowed to read, download, copy, distribute, print, search, or link to the full texts of the articles in this journal starting from Volume 21 without asking prior permission from the publisher or the author. This is in accordance with the BOAI definition of open access. For permission regarding papers published in previous volumes, please contact us.


Online ISSN 2058-4563 - Print ISSN 2058-4555. Published by BCS, The Chartered Institute for IT