A systematic scoping review of the domains and innovations in secondary uses of digitised health-related data

Background: Substantial investments are being made in health information technology (HIT) based on assumptions that these systems will save costs through increased quality, safety and efficiency of care provision. Whilst short-term benefits have often proven difficult to demonstrate, there is increasing interest in achieving benefits in the medium and long term through secondary uses of HIT-derived data.
Aims: We aimed to describe the range of secondary uses of HIT-derived data in the international literature and identify innovative developments of particular relevance to UK policymakers and managers.
Methods: We searched nine electronic databases to conduct a systematic scoping review of the international literature and augmented this by consulting a range of experts in the field.
Results: Reviewers independently screened 16,806 titles, resulting in 583 eligible studies for inclusion. Thematic organisation of reported secondary uses was validated during expert consultation (n = 23). A primary division was made between patient-identifiable data and datasets in which individuals were not identified. Secondary uses were then categorised under four domain headings of: i) research; ii) quality and safety of care provision; iii) financial management; and iv) healthcare professional education. We found that innovative developments were most evident in research where, in particular, dataset linkage studies offered important opportunities for exploitation.
Conclusions: Distinguishing patient-identifiable data from aggregated, de-identified datasets gives greater conceptual clarity in secondary uses of HIT-derived data. Secondary uses research has substantial potential for realising future benefits through generating new medical knowledge from dataset linkage studies, developing precision medicine and enabling cross-sectoral, evidence-based policymaking to benefit population-level well-being.


BACKGROUND
Healthcare still trails behind other major industries in fully exploiting information and communication technologies (ICTs) to maximise quality, safety and efficiency in service delivery. It can be argued that this is partly due to the complexity of organising and delivering healthcare services and to the challenges of introducing standardised ICT systems across healthcare settings where these are diverse and largely autonomous organisations. Yet considerable effort and capital investments are being made in the United Kingdom (UK) national health services and in healthcare organisations internationally to procure and implement ICT systems, also known as eHealth or health information technology (HIT). 1,2 Such investments have been justified by assumptions that routine use of HIT should lead to improved patient outcomes and to cost-saving efficiencies in service delivery, for example by streamlining care processes. 3, 4 Recent work, however, suggests that in practice benefits from HIT can be hard to identify, at least in the short term. It has, for instance, been found that some processes can become more time-consuming for some staff during the early years of using a new HIT system. 3, 4 Such disappointing evidence for the anticipated quick gains and returns on investments in HIT could potentially jeopardise continued spending on HIT initiatives. Unrealistic assumptions about the timelines for delivering benefits, for example, from core systems such as electronic health records and ePrescribing systems, place an emphasis on early measurable gains, whereas more significant advantages to healthcare and society might accrue in the medium to long term and then particularly through innovative uses of the wealth of health-related digital data that become available. 5-9
Secondary uses of data (the use and re-use of clinical and administrative data other than for the direct clinical care of specific patients) may present the greater opportunity for realising benefits from HIT investments, with such benefits emerging more slowly.
In 2007, the American Medical Informatics Association (AMIA) identified the then current areas of secondary uses of health-related digital data in USA settings. 10 This systematic scoping review, part of a larger, mixed-methods investigation into maximising the safe and secure exploitation of HIT-derived data in the UK context, aimed to build on that earlier USA work in order to provide an updated, international framework of secondary uses. Our focus was on current and potential future developments of particular relevance for UK policymakers and health service managers.

METHODS
We conducted a systematic scoping review. According to Arksey and O'Malley, 11 a scoping study is a type of literature review that can serve to 'map' a field of interest; unlike a systematic literature review, it is unlikely to address a narrowly defined research question or to assess the quality of included studies. This approach is well suited to exploring under-researched or emerging fields of study, where empirical evidence is limited. Our systematic scoping review was guided by the six-stage methodological framework developed by Levac [...] Terms in the second search set related to secondary uses of healthcare data, drawing on the 2007 taxonomy of secondary uses developed by AMIA 10 (Appendix 1). Terms within groups were combined using the Boolean operator "OR" and the groups combined with the Boolean operator "AND" (Appendix 2). We applied no language or publication status restrictions.
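The way search terms were combined (OR within a group, AND between groups) can be illustrated with a minimal sketch. The function name `build_query` and the term lists below are hypothetical placeholders for illustration; the actual search terms are those given in Appendices 1 and 2, and real database interfaces each have their own query syntax.

```python
def build_query(term_groups):
    """Join terms within each group with OR, then join the groups with AND."""
    clauses = []
    for group in term_groups:
        # Quote multi-word terms so that they are treated as phrases.
        quoted = [f'"{term}"' if " " in term else term for term in group]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Hypothetical term groups, for illustration only (not the review's search sets).
hit_terms = ["electronic health record", "ePrescribing"]
secondary_use_terms = ["secondary use", "audit", "data linkage"]

query = build_query([hit_terms, secondary_use_terms])
print(query)
```

Under these assumptions, a record is retrieved only if it matches at least one term from every group, which is what keeps a two-set search focused on HIT systems and secondary uses together.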

Keywords: Medical informatics, health services research, systematic scoping review

Ethical approval
Ethical approval was not required for the systematic scoping review. The consultation phase was conducted within a related interview study that formed part of the larger, mixed-methods project from which we are reporting the scoping review component here. We obtained ethical approval for the interview study from The University of Edinburgh, and each participant gave informed consent prior to taking part.

RESULTS
Our search strategy identified 20,551 potentially relevant papers. After deduplication, 16,806 papers were included for initial screening, of which 15,089 were excluded because they did not meet the inclusion criteria. The remaining 1,717 abstracts were reviewed, of which 1,134 papers were classed as background papers (for example papers describing HIT infrastructure), resulting in 583 studies being included in the review. The results are presented as a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) study flow diagram, shown in Figure 1.
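The screening counts reported above can be checked with simple arithmetic. The sketch below uses only the figures stated in the text; the variable names are ours, and the number of records removed before screening is derived from the text rather than reported in it.

```python
# Counts reported in the text.
identified = 20_551             # papers found by the search strategy
screened = 16_806               # papers remaining after deduplication
excluded_at_screening = 15_089  # did not meet the inclusion criteria
background_papers = 1_134       # e.g. papers describing HIT infrastructure

# Derived values.
removed_before_screening = identified - screened       # derived, not reported
abstracts_reviewed = screened - excluded_at_screening  # reported as 1,717
included = abstracts_reviewed - background_papers      # reported as 583

print(removed_before_screening, abstracts_reviewed, included)
```

Each stage of the flow is consistent with the next: 16,806 minus 15,089 gives the 1,717 abstracts reviewed, and 1,717 minus 1,134 gives the 583 included studies.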

Source of studies
The included studies represented a spread of developed countries, with the most prominent being the USA. They also included Canada, Australia, the UK, other European countries, Scandinavian nations and The Netherlands (Figure 2).

Secondary uses
The publications referenced a range of areas in which health-related digital data were being used beyond supporting individuals' clinical care. Reported secondary uses included conducting epidemiological and pharmacovigilance research studies, facilitating recruitment to randomised controlled trials 14-16 and carrying out audits and benchmarking studies. 17, 18 We also found data being used for financial and service planning, incident tracking, the teaching of clinical staff and billing. 19, 20 The examples of secondary uses came from within a single healthcare organisation (for example local audits), across healthcare settings (such as in service planning) and from dataset linkage studies.

Dataset linkage research
Innovative developments were most evident in the research domain, with ongoing efforts in several developed countries to establish the research infrastructure for dataset linkage studies. 21-26 For instance, researchers were able to use Scotland-wide routinely collected hospital admission data combined with death certificate data to show that legislation to ban smoking in public places in Scotland was followed by a reduction in hospital admissions for childhood asthma. 27 In Denmark, researchers also used health dataset linkage to conduct a nationwide seven-year study of everyone aged 18-36, using national registries, death certificates and primary care data to investigate the relative and absolute risks of sudden cardiac death in young Danes with a prior myocardial infarction. 28
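To make the idea of dataset linkage concrete, the following is a minimal sketch of deterministic linkage on a shared person identifier. Everything in it is an assumption for illustration: the records are invented, the field names (`person_id`, `admitted`, `died`) are hypothetical, and exact matching on one identifier is a simplification. It does not represent the linkage methods used in the studies cited above, which may rely on probabilistic matching and dedicated linkage infrastructure.

```python
from collections import defaultdict

# Invented example records; no real data.
admissions = [
    {"person_id": "p1", "admitted": "2009-03-02"},
    {"person_id": "p2", "admitted": "2009-05-17"},
    {"person_id": "p1", "admitted": "2010-01-09"},
]
deaths = [
    {"person_id": "p2", "died": "2011-08-30"},
]


def link_records(admissions, deaths):
    """Group admissions by person and attach a death date where one exists."""
    death_by_person = {d["person_id"]: d["died"] for d in deaths}
    linked = defaultdict(lambda: {"admissions": [], "died": None})
    for admission in admissions:
        person = linked[admission["person_id"]]
        person["admissions"].append(admission["admitted"])
        person["died"] = death_by_person.get(admission["person_id"])
    return dict(linked)


linked = link_records(admissions, deaths)
for person_id, record in linked.items():
    print(person_id, record)
```

Once records from separate sources are joined per person in this way, questions that neither dataset could answer alone (for example, outcomes following admission) become tractable.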

Study selection
After initial screening by the team and removing duplicate publications, three researchers (AR, UN and KC) independently checked titles and abstracts against the inclusion/exclusion criteria. We included empirical studies that reported information about secondary uses of data held in core HIT systems in developed countries. Publications were excluded if they fell outside the scope of interest, for example, those reporting on technologies not associated with core HIT functionalities (for instance reports of speech recognition functionality). We excluded studies reporting on HIT implementations in developing countries because of the contextual differences between healthcare and its delivery in developing and developed countries. We then retrieved and reviewed the full texts of potentially eligible publications.

Charting the data
We used customised Excel forms to extract data from each of the full-text papers eligible to be included in our review. The variables recorded by three researchers (AR, HS and UN) were author and year, title of the study, country of origin, keywords, the area of secondary uses reported, and whether a study was deemed by the reviewer to offer an example of a new development in secondary uses.

Collating and summarising the extracted data
We used a thematic, qualitative content analysis approach 12,13 to organise the various areas of secondary uses identified from the review into broad domains of secondary uses, resolving any uncertainties by discussion among the researchers who had charted the data and within the wider research team.

Consultation with experts
We discussed our preliminary findings with a range of national and international experts to seek validation of our thematic organisation into domains of secondary uses and any additional insights into innovative developments in secondary uses. These individuals were selected based on their involvement in activities related to using data held in HIT systems in the UK, with additional experts beyond the UK being invited from regions with an international reputation for current work in this field. Consultees included policymakers, health professionals, academics (including researchers) and representatives of the pharmaceutical industry, the legal profession and the third sector. We approached 28 potential consultees (declined = 1; no response or subsequently could not be contacted = 4), leading to 23 interviews with participants throughout the UK and in Australia, Canada and the USA. One participant subsequently withdrew consent, reporting new workplace restrictions on giving interviews, and consequently that participant's audio file and transcript were deleted from our dataset. During the consultation stage of the scoping review, consultees were asked to highlight areas illustrative of developments in secondary uses research. 20
In addition to data linkage studies using health-related datasets, existing examples of studies using cross-sectoral data linkage included the linking of population-wide health and justice datasets in Western Australia to study hospitalisations among ex-prisoners during the first year after their release 29 and seeking evidence to help plan healthy neighbourhoods across the lifespan by investigating measures of the built environment linked to health outcomes and to self-reported health behaviours. 30

Domains of secondary uses
Overall, the majority of examples of secondary uses identified from scoping the literature could be categorised under the broad heading of research. Research was followed by a second large domain of quality and safety of care provision, which included audit. We grouped all of the studies in the review thematically into a total of four broad domains of current secondary uses: 1) research (n = 307); 2) quality and safety of care (n = 249); 3) financial management (n = 20); and 4) education (n = 7) (Figure 3). An initial long list of secondary uses generated from the scoping review, from which the four domains were derived, is shown in Appendix 3.
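The domain counts above can be converted to shares of the 583 included studies. The sketch below uses only the counts stated in the text; the percentages are our own derived values, rounded to whole numbers.

```python
# Counts per domain as reported in the text.
domain_counts = {
    "research": 307,
    "quality and safety of care": 249,
    "financial management": 20,
    "education": 7,
}

total = sum(domain_counts.values())

# Share of all included studies, rounded to the nearest whole percent.
shares = {domain: round(100 * n / total) for domain, n in domain_counts.items()}

for domain, n in domain_counts.items():
    print(f"{domain}: n = {n} ({shares[domain]}%)")
```

The four counts sum to 583, matching the number of included studies, with research and quality and safety of care together accounting for roughly 95% of them.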

Consultation phase of scoping review
We approached 28 people (declined = 1; no response or subsequently could not be contacted = 4), leading to 23 expert participants throughout the UK and in Australia, Canada and the USA. During this stage of the scoping review, consultees were asked to highlight areas illustrative of new developments in secondary uses research. They drew attention to investigations into risk factors, treatments and disease outcomes (notably, in Scotland, diabetes-related studies), drug safety, and policy evaluation, service delivery and public health. 31-34
Consultees were asked to comment on the thematic organisation of secondary uses identified from scoping the literature into four domains. In addition to listing the four current domains, it was suggested that it would be helpful first to distinguish between secondary uses involving data containing identifiers for patients (essential for providing direct clinical care and also for some secondary uses, for instance for tracing individuals affected by contaminated surgical instruments in crisis management) and aggregated, deidentified datasets (where deidentified data were also variously known as anonymised and pseudonymised data). Keeping that distinction to the fore was considered important for policymakers and managers aiming to maximise HIT-derived benefits because of the potential for significant new research findings that were dependent on exploiting large quantities of deidentified aggregated data. Confusing data with and without patient identifiers could negatively impact on public support for secondary uses. Patients' privacy, confidentiality and consent for the use and reuse of data, where those data identified individual patients, were recognised as important concerns to many people.
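The distinction drawn by consultees can be illustrated with a minimal sketch that contrasts three forms of the same invented records: patient-identifiable, pseudonymised and aggregated. This is an illustration under stated assumptions only. The records, field names and key are hypothetical, and keyed hashing of an identifier is one simple pseudonymisation technique chosen for the example; it is not a method described in the reviewed studies, and on its own it does not guarantee that individuals cannot be re-identified.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret held by a data custodian (assumption for illustration).
SECRET_KEY = b"hypothetical-key-held-by-data-custodian"


def pseudonymise(patient_id: str) -> str:
    """Replace an identifier with a keyed hash that is stable across datasets."""
    return hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Invented patient-identifiable records, as needed for direct clinical care.
identifiable = [
    {"patient_id": "A001", "condition": "diabetes"},
    {"patient_id": "A002", "condition": "asthma"},
    {"patient_id": "A003", "condition": "diabetes"},
]

# Pseudonymised: the identifier is replaced, row-level detail is kept.
pseudonymised = [
    {"pseudo_id": pseudonymise(r["patient_id"]), "condition": r["condition"]}
    for r in identifiable
]

# Aggregated: only counts per condition remain, with no row per person.
aggregated = Counter(r["condition"] for r in identifiable)

print(pseudonymised[0]["pseudo_id"][:12], dict(aggregated))
```

The pseudonymised form still allows the same person to be followed across datasets, which is what linkage research needs, whereas the aggregated form supports planning and benchmarking without any row corresponding to an individual.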
Looking towards the future of HIT-derived data and secondary uses, consultees spoke of expanding the range of health-related datasets that were available to researchers in the UK to include general practice, imaging, genomic and biotech data, and datasets from non-health sectors such as education, housing and justice. In the UK, the potential for the Farr Institute 35 to be working in close collaboration with the Administrative Data Research Network (ADRN) 36 was highlighted as a positive step for developing cross-sectoral research work. It was believed that population well-being should benefit from cross-sectoral dataset linkage research with such studies generating an evidence base to underpin UK policy decisions and policy evaluations beyond specifically health policy. However, this could not be achieved without national workforce planning and training in order to have sufficient staff with the necessary range of technical and methodological skills to work in data linkage.
In addition to envisaged developments in (and longer-term benefits from) dataset linkage studies, progress in natural language processing software should increasingly allow researchers to take advantage of uncoded text in electronic health records. Both those data and patient-reported measures would add to the digital data likely to become more widely available for secondary use research in the future.

DISCUSSION
We searched nine international electronic databases, screened 16,806 titles and found 583 eligible studies. The systematic scoping review identified secondary uses of digitised health-related data in the domains of research (the largest category), quality and safety of service provision, financial management, and education. Innovations in secondary uses were most evident in the research domain with the development of dataset linkage studies. Consultation with experts confirmed that research linking datasets (both linking health datasets with each other and linking health datasets with those from other sectors) would in their opinion continue to expand and to deliver health-related and wider societal benefits from investments in HIT systems.

This is the first UK-focused systematic scoping review of secondary uses, updating the previous work in this area undertaken elsewhere. 10 The publicly funded NHS in the UK and the availability of national and regional datasets contribute to a UK-specific context for secondary uses of health-related digital data and likely offer particularly strong potential for innovative research that exploits dataset linkages.
While the UK context and a growing emphasis on dataset linkage studies are quite distinctive, the range of areas of secondary uses identified in our literature review is similar to the areas of secondary uses previously identified despite the passing of time since that earlier work from the USA. 10 Domains for secondary uses of health-related digital data may have reached a level of stability, at least for the foreseeable future. The more dynamic aspects appear likely to be contextual factors, for example national and international legislation controlling personal data, and further developments within a given secondary use domain, such as within the research domain. Understanding where the most potential for developing secondary uses currently lies and appreciating the importance of distinguishing clearly between data that identify patients and data that are aggregated and deidentified are a resource for UK policymakers who are developing plans and policies related to secondary uses of health-related digital data and for all those aiming to maximise returns from investing in HIT systems.

Strengths and limitations
The main strengths of our scoping review are the systematic database searching, the broad inclusion criteria and the inclusion of an expert consultation stage in a thorough methodological approach to scoping our topic. The work is also timely in view of substantial funding in the UK to support the Farr Institute and the ADRN and collaborative working between the two, which is likely to enhance the potential for developing cross-sectoral linkage studies. 35,36 A limitation of this literature review is that it may have missed routine secondary uses of healthcare data for management, planning, finance and audit purposes, which are taking place within healthcare settings but which would not necessarily be published in the literature.

CONCLUSIONS
Distinguishing between patient-identifiable data and deidentified datasets can help improve conceptual clarity with respect to secondary use policy and planning deliberations in the UK. Innovative secondary uses of data for research purposes hold the promise of new medical knowledge derived from health dataset linkage studies, advances in personalised precision medicine and the advent of cross-sectoral evidence-based policymaking and policy evaluations. In developed nations, domain headings for the various secondary uses of health-related digital data may have attained a level of stability for the foreseeable future and hence may require updating through further scoping reviews only at longer intervals.