
Journal of Informatics in Primary Care 1999 (June):8-12


Papers


A Researcher's Experiences of MIQUEST

Andrew Meal, BMedSci (Hons), BM BS MPhil
Lecturer, School of Nursing, Faculty of Medicine and Health Sciences, University of Nottingham

Address for Correspondence:
School of Nursing, County Hospital, Greetwell Road, Lincoln, LN2 5QY


Abstract

MIQUEST (Morbidity Information QUery and Export SynTax) is an approach to the collection of computerised data from general practice databases. It is being used increasingly in local and national data collection schemes because it offers the opportunity to access anonymised data in a common format from different general practice computer systems. For individual research projects, however, MIQUEST has yet to find widespread use. This paper describes the experiences from a MIQUEST-based study in five general practices that form part of the Trent Focus Collaborative Research Network.


Introduction

It has long been recognised that general practice computer databases represent a large, mainly untapped source of data1. The data are potentially of great value to researchers working within general practice, and to researchers working for external organisations. The data will be particularly valuable for observational and descriptive epidemiological studies in primary care populations. Hitherto, data collection for such studies has involved sampling of the general population or a painstaking trawl through manually kept records in general practice, necessarily limiting sample sizes. The advent of computerised records in general practice offered a glimpse of a potentially unified data recording and access format. However, many different, mutually incompatible computer systems have evolved in general practice, and so gaining access to the data of several practices has been difficult and time-consuming even with computerised records. Furthermore, the need to maintain patient and practice confidentiality remains of the utmost importance. Any study that draws upon patient data must demonstrate that appropriate steps have been taken to protect patient and practice confidentiality, and research ethical committees require such evidence at the time of application for approval of the study.

The MIQUEST approach to the collection of data promises to address these difficulties. It should make collection of data from several practices a viable proposition for researchers, whilst at the same time protecting patient and practice confidentiality. A detailed guide to using MIQUEST is beyond the scope of this paper, but the Clinical Information Consultancy website (http://www.clinical-info.co.uk) is an excellent source of that information. Pringle et al 2 have published an evaluation of MIQUEST Version 3 in a single practice, and their main findings are contextualised in a recent paper3 in which they point out that MIQUEST is starting to find widespread use as a data collection tool in national and local data collection schemes such as the Collection of Health Data from General Practice (CHDGP) project (http://www.nottingham.ac.uk/chdgp) and the Trent Focus Collaborative Research Network. MIQUEST has yet to find widespread use as a tool for individual researchers, however: only one published paper4 could be found in a Medline database search.

The aim of this paper is to present an account of a small-scale study that used MIQUEST Version 4 as the principal data collection method. The paper will focus on methodological issues specific to the MIQUEST approach rather than on the general methodology or data analysis and interpretation. Version 4.1 of the MIQUEST Specification5 identifies sixteen steps to the extraction of data from general practice databases using the MIQUEST approach, and the table below is adapted from that document (page 7).

Step  What happens?                      Who is involved?
1     Set up Data Collection Agreements  Researcher and Practices
2     Record patient data                Practices
3     Formulate questions                Researcher
4     Write queries                      Researcher
5     Address queries                    Researcher
6     Send queries                       Researcher
7     Receive queries                    Practices
8     Authorise queries                  Practices
9     Interpret queries                  Practices
10    Generate responses                 Practices
11    Authorise responses                Practices
12    Send responses                     Practices
13    Receive responses                  Researcher
14    Aggregate responses                Researcher
15    Analysis of responses              Researcher
16    Provide feedback to Practices      Researcher and Practices

This paper will be presented according to the steps identified above.


Outline of the Study

Title

The Natural History of Ischaemic Heart Disease in General Practice.

Aims

  1. To estimate the number of cases of IHD in a general practice population.
  2. To determine the recorded diagnosis at first presentation of IHD.
  3. To determine the rate of development of further complications of IHD.
  4. To assess mortality following diagnosis of IHD.

Design

A retrospective survey of computer databases in five general practices. These general practices are part of the Trent Focus Collaborative Research Network. One of the roles of this network is to enable contact to be made between researchers and practices within the network so that the data held by the practices can be made available for research. Research proposals are subject to review by a Trent Focus Committee before being sent to practices for their consideration.


Setting up the Data Collection Agreements (Step 1)

The protocol for this study was submitted to a local Research Ethical Committee (LREC) and to the Trent Focus Collaborative Research Network. It was stressed on the LREC application forms that MIQUEST would not allow the researcher to extract data that could uniquely identify a patient. At the time of the study, MIQUEST interpreter software was only available on EMIS and Meditel System 5 computer systems, so following approval of the protocol, a summary (Practice Briefing) was sent to all EMIS and Meditel System 5 practices within the Trent Focus. The practices were asked to indicate whether they wished to take part in the study. Five practices were included in the study. The Data Collection Agreements (and associated Query Agreement Codes, see Step 4) were set up between the Trent Focus and the practices, rather than between the researcher and the practices. Thus the process of recruiting practices and setting up agreements was much easier and quicker than if the researcher had been working independently of the Trent Focus.


Recording of Patient Data Required for the Study (Step 2)

This study was a retrospective survey of practice databases, so the quality of the data was beyond the control of the researcher. An assessment of the validity of the databases was therefore required. The validity of a database can be assessed by comparing MIQUEST-extracted data with similar data derived from other sources, and methods for doing so have been discussed elsewhere4,6,7,8. This study adopted a two-stage approach to validation. First, all patients with angina pectoris recorded on the practice computer were identified: this took place in the practice, and no patient-identifiable data were recorded or reported outside the practice. A ten percent sample of these patients was then randomly selected, and their manual records were examined for a diagnosis of angina. Every sampled computer record had a corresponding manual record of angina, so it was concluded that the computerised records were sufficiently internally consistent to be included in the study. Second, external validity was assessed by comparing the percentage of patients with angina in each practice with similar figures quoted by the national Collection of Health Data from General Practice (CHDGP) project.
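The ten percent sample described above could be drawn inside the practice along the following lines. This is only a minimal Python sketch and not the procedure actually used in the study: the function names, the seed and the format of the patient references are hypothetical, and the code assumes that the list of computer-recorded angina patients is already available within the practice.

```python
import random


def sample_for_validation(patient_refs, percent=10, seed=None):
    """Return a random sample of patient references for manual record checks."""
    refs = sorted(set(patient_refs))                 # de-duplicate; sort for repeatability
    if not refs:
        return []
    size = (len(refs) * percent + 99) // 100         # round up using integer arithmetic
    size = min(len(refs), max(1, size))              # check at least one record
    rng = random.Random(seed)                        # seeded so the draw can be repeated
    return sorted(rng.sample(refs, size))


def confirmed_fraction(sample, manually_confirmed):
    """Fraction of sampled computer records that have a matching manual record."""
    if not sample:
        raise ValueError("sample is empty")
    confirmed = sum(1 for ref in sample if ref in manually_confirmed)
    return confirmed / len(sample)
```

A confirmed fraction of 1.0 corresponds to the finding reported above, in which every sampled computer record had a matching manual record.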


Formulate the Study Questions (Step 3)

The formulation of appropriate questions is an essential prerequisite to any study, and for a MIQUEST-based study it is particularly important. Central to the MIQUEST approach is a structured Health Query Language (HQL), and all the study questions need to be expressed as HQL queries before data can be extracted from the practice databases. A study question that is ambiguous may, when translated to HQL, give rise to queries that yield inappropriate data. For example, the question "How common is ischaemic heart disease in this practice?" is too vague. A more specific version of this question is "On <today’s date>, how many people within the current practice population had at least one recorded diagnosis within the diagnostic code range for ischaemic heart disease?" Another interpretation of the same question that would produce different results might be "Within the last year, how many new recorded diagnoses of ischaemic heart disease have been made?" In this study a "who, what and when" strategy was adopted when posing the questions. In other words, what group of patients do I need data from, exactly what do I want to know about them, and over what time period do I want to collect the data? By framing the questions in this way, it becomes easier to select the correct parameters for the queries within the HQL Editor.

In this study there were two questions:

  1. How many patients in today’s practice population have a recorded diagnosis of angina pectoris? This question was part of the process of validating the databases.
  2. For all patients who have a first recorded diagnosis of ischaemic heart disease within the last five years, what subsequent recorded episodes of IHD were there in the same five-year period, and on what dates did they occur? What was the date and cause of death of any of these patients during the same five-year period?

Writing the Queries (Step 4)

The text of the queries used in this study is shown in the Appendix.

Queries are written in HQL using a Windows-based HQL Editor. The HQL Editor is a menu-driven program that takes the author through the structure of a query step-by-step, and translates the selected parameters into HQL. The process is made considerably easier if the study questions are clear and specific. In addition to the questions, the author of the queries needs to know the clinical coding scheme (*QRY_CODES) used by the practice that will receive the query (for example, 4- or 5-character Read Codes), and the query agreement (*QRY_AGREE) set up in Step 1.

The questions shown in Step 3 yielded three queries. Question 1 was expressed as a single ANALYSE style query (Query 1), drawing on the current practice population and counting the number of patients with at least one recorded diagnosis of angina pectoris (G33%).

Question 2 was expressed by first creating a SUBSET (Query 2) of all patients that had ever been registered with the practice, including those that had died or left. The rest of the question was expressed as a REPORT style query (Query 3) based on the subset. All recordings of ischaemic heart disease (G3%) were reported as the diagnostic code and the date of occurrence, along with the reference number, sex and date of birth of the patient concerned. In order to exclude those patients whose initial recorded episode of ischaemic heart disease had occurred before the five-year period, only records WITHOUT a recorded diagnosis before the five-year period were selected. Finally, the date and cause of death (94B%) for any of the patients was requested. The three queries were saved as a set (*QRY_SETID).
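The selection logic of Query 3 can be illustrated outside HQL. The following Python sketch is not how a MIQUEST interpreter works internally; it assumes that "%" in a code pattern acts as a trailing wildcard and that journal entries are available as (reference, code, date) tuples. The function names are hypothetical.

```python
from datetime import date


def matches(code, pattern):
    """True if a Read code matches a pattern such as 'G3%' (assumed trailing wildcard)."""
    if pattern.endswith("%"):
        return code.startswith(pattern[:-1])
    return code == pattern


def incident_episodes(journals, pattern, start, end):
    """Episodes within [start, end] for patients with no matching entry before start."""
    hits = [j for j in journals if matches(j[1], pattern)]
    # Patients with an earlier matching entry are excluded (the role of WITHOUT).
    prior = {ref for ref, _code, when in hits if when < start}
    selected = [
        (ref, code, when)
        for ref, code, when in hits
        if ref not in prior and start <= when <= end
    ]
    return sorted(selected, key=lambda row: (row[0], row[2]))  # by patient, then date
```

Here a patient whose first matching code predates the start of the period contributes no rows at all, even if later episodes fall inside it.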


Addressing the Queries to the Practices (Step 5)

In this step, the generic query set created above was prepared for sending to the five practices within the study. Here, the Query Manager software inserts practice and researcher identifier information (*ENQ_RSPID, *ENQ_IDENT) into copies of the generic query set such that a specific query set is created for each practice. In a small study of five practices where there is only one data collection agreement per practice it is quite straightforward to keep track of the practice identifier information, but in a much larger study this may not be possible. The Query Manager allows for a database of practice identifier information to be created, along with data collection agreements set up between the researcher and the practices, so query addressing for any study becomes a much more streamlined process.
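The addressing step can be pictured as filling in the header lines that are left blank in the generic queries (see the Appendix, where *ENQ_RSPID and *QRY_AGREE carry no value). The Python sketch below only illustrates the idea and does not reproduce the Query Manager. It assumes each header line has the form "*NAME,value", and the practice labels and values passed in are hypothetical placeholders.

```python
def address_query(generic_text, header_values):
    """Return a copy of a query with its blank header lines filled in.

    header_values maps a header name such as '*ENQ_RSPID' to the text that
    should follow the comma. Lines that already carry a value are left alone.
    """
    addressed = []
    for line in generic_text.splitlines():
        name, sep, value = line.partition(",")
        if sep and name in header_values and not value.strip():
            line = f"{name},{header_values[name]}"
        addressed.append(line)
    return "\n".join(addressed)


def address_for_practices(generic_text, practices):
    """Build one addressed copy per practice from {practice_label: header_values}."""
    return {
        label: address_query(generic_text, values)
        for label, values in practices.items()
    }
```

Keeping the per-practice values in a single mapping mirrors the database of practice identifier information described above, so that the same generic query set can be addressed to any number of practices in one pass.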


Sending the queries to the practices (Step 6)

The Query Manager allows for each query set to be downloaded to floppy disk (*QRY_MEDIA,D,Disk). A blank disk must be used for each practice. For this study the disks were posted to the practices, but software is currently being developed by EMIS and Meditel to allow query sets to be sent over network links.


The Queries at the Practices (Steps 7–12)

Members of staff at each of the five practices had received instruction from the Trent Focus on how to load MIQUEST queries onto their practice system and how to deal with the access and return of data. From the researcher’s perspective these steps are entirely under the control of the practice. Were it not for the database validation (Step 2) outlined above, the researcher would not have needed to visit any of the five practices. In the evaluation of Pringle et al 2 it was noted that Steps 7 – 12 are important because they allow the practices final control over which queries to admit and what data to release. In the present study no problems were reported by the practices in terms of the nature of the queries, errors in the queries, or the data that were requested.


Receiving the Responses (Step 13)

The disks containing the queries that had been sent to the practices were returned by post. Files containing data can be recognised by the extension ".csv". These CSV files are in plain text format, with comma-separated values. In Step 4 it was stated that the date of birth was requested for each patient. This could uniquely identify a patient and breach confidentiality, but the full date of birth was automatically truncated by the MIQUEST software so that the only information returned to the researcher was the year of the patient’s birth. In this study, however, some means of referencing a particular patient’s data was required so that episodes of ischaemic heart disease could be plotted against time for individual patients. MIQUEST generates a code for each patient to allow for analyses such as this, but the code is meaningless to anyone outside the practice: researchers cannot tell who the patient is simply by knowing this code number.

No cause of death information was returned from any of the five practices. The reason for this is that none of the five practices records cause of death information under the Read Code 94B%. Any future studies that require cause of death information will need to account for this in terms of the extra time and effort required to extract such data from either manual records, or directly from the computer of each practice.


Aggregating the data files (Step 14)

For the present study it was necessary to aggregate the data from each of the five practices. The Response Manager is able to take the CSV files, strip the header text from the data, and combine the separate files into a single file. In this study it was not possible to use the Response Manager because the dates on which the query files (.hql) were processed by the Query Manager (Step 5) were not the same for each practice. This problem was not anticipated, and was the researcher’s fault, but was easy to overcome. Each CSV file was imported directly into Microsoft Excel™ and the header text removed manually to create a single Microsoft Excel™ worksheet from the five CSV files. For a larger study with, perhaps, twenty CSV files this method could have proved inconvenient. It is therefore important to ensure that the addressing of the queries with the Query Manager takes place on the same day for all practices in the study.
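For a larger number of files, the manual removal of header text could be scripted. The Python sketch below is one possible approach and not a description of the Response Manager. It assumes that header text lines in a response file begin with an asterisk, as the query header lines in the Appendix do; the real response layout may differ. The function names and practice labels are hypothetical.

```python
import csv


def data_rows(csv_text, header_prefix="*"):
    """Yield parsed data rows, skipping blank lines and assumed header lines."""
    lines = [
        line for line in csv_text.splitlines()
        if line.strip() and not line.startswith(header_prefix)
    ]
    yield from csv.reader(lines)


def aggregate(responses):
    """Combine {practice_label: csv_text} into a single list of rows.

    Each output row is prefixed with the practice label so that patient
    reference numbers from different practices cannot be confused.
    """
    combined = []
    for label in sorted(responses):
        for row in data_rows(responses[label]):
            combined.append([label] + row)
    return combined
```

Prefixing each row with a practice label is a design choice of this sketch: patient reference codes are generated within each practice, so the same code could in principle appear in more than one file.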


Analysis of the Data (Step 15)

As described above, data are easily imported into Microsoft Excel™. The researcher also successfully imported data into SPSS for Windows and into Minitab for Windows. This study required that the single aggregated data file be re-ordered and re-formatted for various statistical tests. This re-ordering was carried out in Excel™ and the data then "copied and pasted" into Minitab or SPSS. The Response Manager is designed to allow re-formatting of amalgamated data files, but in this study the Response Manager could not be used because of the reasons outlined in Step 14.
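The kind of re-ordering required, grouping each patient's episodes and arranging them in date order, can also be expressed programmatically. The Python sketch below is an illustration only; in the study itself this work was done in a spreadsheet. It assumes rows of the form (practice, reference, code, date), with dates already converted to date objects, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def episodes_by_patient(rows):
    """Group (practice, reference, code, date) rows into per-patient timelines."""
    timelines = defaultdict(list)
    for practice, reference, code, when in rows:
        timelines[(practice, reference)].append((when, code))
    # Sort each patient's events so the earliest recorded episode comes first.
    return {key: sorted(events) for key, events in timelines.items()}


def days_since_first(timeline):
    """Days from a patient's first recorded episode to each recorded episode."""
    first = timeline[0][0]
    return [(when - first).days for when, _code in timeline]
```

The intervals returned by days_since_first are the quantities needed to plot episodes against time for individual patients.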


Conclusions

In this small pilot study, MIQUEST was found to be an effective tool for extracting data from EMIS and Meditel System 5 general practice computer systems*. It should be possible to use the methods described in this paper in a much larger study. At the moment it is still necessary to validate practice databases because their completeness and accuracy cannot be assumed; the size of future studies will therefore be limited by the number of practices it is possible to visit for the purposes of database validation. Initiatives such as the CHDGP and Trent Focus Collaborative Research Network are encouraging practices to record high quality computerised data, so it may soon be possible to assume minimum standards of data in practices that belong to such schemes. In this study the recruitment of the practices and the subsequent extraction of the data were made considerably easier by submitting the study to the Trent Focus. Whilst individual researchers are not precluded from using MIQUEST, it is the author’s opinion that networks such as the Trent Focus that use MIQUEST as their data extraction tool represent the best way for researchers to collaborate with practices on research projects such as that described in this paper.


Appendix – text of the queries

Query 1

*QRY_WDATE,19980112,12/01/1998
*QRY_SDATE,19980112,12/01/1998
*QRY_ORDER,001
*QRY_TITLE,AGMIHD0,Validation - angina
*ENQ_RSPID,
*QRY_MEDIA,D,Disk
*QRY_AGREE,
*QRY_SETID,AGMRead5,Read 5 query set
*ENQ_IDENT,AGM,Dr A G Meal
*QRY_CODES,0,9999R2,Read version 2
DEFINE AGE AS @YEARS("13/01/1998",DATE_OF_BIRTH)
ANALYSE
GROUPED_BY SEX ("M";"F")
AND AGE ("0"-"4";"5"-"9";"10"-"14";"15"-"19";"20"-"24";"25"-"29";"30"-"34"\
;"35"-"39";"40"-"44";"45"-"49";"50"-"54";"55"-"59";"60"-"64";"65"-"69";"70"\
-"74";"75"-"79";"80"-"84";"85"-"89";"90"-"94";"95"-"99";"100"-"104";"105"\
-"109")
FROM JOURNALS (ONE FOR PATIENT)
WHERE CODE IN ("G33%")

Query 2

*QRY_WDATE,19980112,12/01/1998
*QRY_SDATE,19980112,12/01/1998
*QRY_ORDER,002
*QRY_TITLE,AGMIHD1,Subset of all patients ever
*ENQ_RSPID,
*QRY_MEDIA,D,Disk
*QRY_AGREE,
*QRY_SETID,AGMRead5,Read 5 query set
*ENQ_IDENT,AGM,Dr A G Meal
SUBSET ANDYSUB KEEP
FROM PATIENTS
WHERE ACTIVE IN ("R","D","L")

Query 3

*QRY_WDATE,19980112,12/01/1998
*QRY_SDATE,19980112,12/01/1998
*QRY_ORDER,003
*QRY_TITLE,AGMIHD2,Patients with IHD diagnosis code
*ENQ_RSPID,
*QRY_MEDIA,D,Disk
*QRY_AGREE,
*QRY_SETID,AGMRead5,Read 5 query set
*ENQ_IDENT,AGM,Dr A G Meal
*QRY_CODES,0,9999R2,Read version 2
FOR ANDYSUB
REPORT
PRINT REFERENCE,SEX,DATE_OF_BIRTH,CODE,DATE
FROM JOURNALS (ALL FOR PATIENT)
WHERE CODE IN ("G3%")
AND DATE IN ("01/01/1992"-"31/12/1997")
WITHOUT JOURNALS (FOR_SAME_PATIENT_AS THE ABOVE)
WHERE CODE IN ("G3%")
AND DATE IN ("01/01/1850"-"31/12/1992")
PRINT CODE,DATE
FROM JOURNALS (ONE FOR PATIENT)
WHERE CODE IN ("94B%")
AND DATE IN ("01/01/1998"-"31/12/1998")


References

1 Pringle M, Hobbs R. Large computer databases in general practice. Br Med J 1991; 302:741–742
2 Pringle M, Meal A, Hammersley V, Wright L. An Evaluation of MIQUEST in a Nottinghamshire Practice. Nottingham Primary Care Research Unit Monograph Series No. 4. Department of General Practice, Medical School, Queen’s Medical Centre, Nottingham. 1997. ISBN 1 901131 02 5
3 Hammersley V, Meal A, Wright L, Pringle M. Using MIQUEST in General Practice. Journal of Informatics in Primary Care 1998 (Nov):3–7
4 Neal RD, Heywood PL, Morley S. Real world data – retrieval and validation of consultation data from four general practices. Family Practice 1996; 13(5):455–461
5 Markwell D. MIQUEST specification Version 4.1. Clinical Information Consultancy. February 1997.
6 Pringle M, Ward P, Chilvers C. Assessment of the completeness and accuracy of computer medical records in four practices committed to recording data on computer. Br J Gen Pract 1995; 45:537–541
7 Jick H, Jick SS, Derby LE. Validation of information recorded on general practitioner based computerised data resource in the United Kingdom. Br Med J 1991; 302:766–768
8 Whitelaw FG, Nevin SL, Milne RM, Taylor MW, Watt AH. Completeness and accuracy of morbidity and repeat prescribing records held on general practice computers in Scotland. Br J Gen Pract 1996; 46:181–186


Online ISSN 2058-4563 - Print ISSN 2058-4555. Published by BCS, The Chartered Institute for IT