Call for consistent coding in diabetes mellitus using the Royal College of General Practitioners and NHS pragmatic classification of diabetes

Background: The prevalence of diabetes is increasing with growing levels of obesity and an ageing population. New practical guidance provides a pragmatic classification of diabetes. Inconsistent coding of diabetes hampers the use of computerised disease registers for quality improvement and limits the monitoring of disease trends.
Objective: To develop a consensus set of codes that should be used when recording diabetes diagnostic data.
Methods: The consensus approach was hierarchical, with a preference for diagnostic/disorder codes, to define each type of diabetes and non-diabetic hyperglycaemia; each was listed as being completely, partially or not readily mapped to available codes. The practical classification divides diabetes into type 1 (T1DM), type 2 (T2DM), genetic, other, unclassified and non-diabetic fasting hyperglycaemia. We mapped the classification to Read version 2, Clinical Terms version 3 and SNOMED CT.
Results: T1DM and T2DM were completely mapped to appropriate codes. In other areas, however, only partial mapping was possible. Genetics is a fast-moving field, and there were considerable gaps in the available labels for genetic conditions. What the classification calls ‘other’ diabetes, the coding systems label ‘secondary’ diabetes. The biggest gap was the lack of a code for diabetes where the type of diabetes was uncertain. Notwithstanding these limitations, we were able to develop a consensus list.
Conclusions: It is a challenge to develop codes that readily map to contemporary clinical concepts. However, clinicians should adopt the standard recommended codes and audit the quality of their existing records.


Introduction
The estimated number of adults in England with diabetes mellitus was 3.1 million in 2010, and this is predicted to rise to 4.6 million by 2030. 1 The current cost of treating diabetic complications is £7.7 billion, almost 80% of NHS diabetes spending, and is predicted to increase to £13.5 billion by 2035/6. 2 Although estimates of the prevalence of diabetes and its complications vary, the large and growing population with diabetes carries a high burden of morbidity from complications that are potentially preventable by the consistent application of evidence-based guidelines.
England is ahead of many countries in developing systems that support physicians in improving the quality of care they provide by applying evidence-based guidelines. This is known as regulatory compliance and has been underpinned by better use of information systems. 3 The components of regulatory compliance, which also include the use of incentives and inspection, can be represented formulaically (Figure 1). Clinical standards have been made explicit through a series of National Service Frameworks and a Cancer Plan. 4 A wider range of guidance is provided by the National Institute for Health and Clinical Excellence (NICE). 5 Furthermore, pay for performance (P4P) has been introduced into primary care to encourage better conformance with, and to accelerate the adoption of, evidence-based practice. 6 The current UK regulator ensuring compliance is the Care Quality Commission (CQC). 7 The measurement of performance is underpinned by information systems. 8 Within diabetes, several components for achieving regulatory compliance are in place:
• The National Service Framework sets out up-to-date, explicit clinical standards. 9
• Incentives for practices to run primary-care-based diabetes clinics have been in place for several years; larger practices may have been better able to respond to them. 10 P4P initially did not differentiate types of diabetes; however, from 2006 separate P4P indicators were created for patients with type 1 (T1DM) and type 2 (T2DM) diabetes. The impact of these incentives remains unclear. 11
• Regulation and inspection have had little effect on the delivery of diabetes care. The National Diabetes Audit (NDA), although the largest and most comprehensive diabetes audit in the world, provides only a light-touch overview of the quality of care. This audit work is, however, due to be extended into gestational diabetes and foot care in the coming years. 12
• UK general practice is comprehensively computerised.
Electronic clinical management systems, in which the clinician selects codes to record consultations directly into computerised medical record (CMR) systems, are widely used for case management. 13,14 Access to data for audit purposes will improve with the implementation of a new comprehensive General Practice Extraction Service (GPES), whose goal is to enable more national quality improvement initiatives. 15 All practices are required to compile a register of patients with diabetes, and electronic case management with clinician-led coding allows this to be automated. These registers are primarily used to support the application of guidelines, for example by prompting appropriate routine screening for diabetic complications. They can also be used for clinical audit and quality improvement. Electronic data from primary care databases are also used to compile routine data for monitoring disease trends and for epidemiological studies. 16 These functions all rely on clear and consistent classification and coding of diabetes at the point of care. To provide clarity, the Royal College of General Practitioners (RCGP) and NHS Diabetes set up a Classification of Diabetes working group, which included a subgroup looking at the informatics issues around implementing a new classification. The working group created a pragmatic classification of diabetes (Figure 2) but made no specific recommendations for coding. This paper sets out to fill that gap by recommending codes for the coding systems most commonly used in UK primary care.

Method
Terminologies
Codes were identified for the two coding systems most commonly used in the UK, Read version 2 (also described as '5-byte' because its codes have five characters) and Read Clinical Terms version 3 (CTv3), and also for the Systematised Nomenclature of Medicine Clinical Terms (SNOMED CT). 17 Read codes are universally used in UK primary care and also internationally: a four-character version of Read codes is still used in New Zealand 18 and elsewhere. SNOMED CT 19 is increasingly used internationally and is progressively being introduced across the UK. 20,21 SNOMED CT is much more sophisticated than Read. Each concept identifier (ID) is assigned a category, the equivalent of the chapters in Read. Categories relevant to this paper are 'disorder', 'situation', 'qualifier' and 'finding'. In SNOMED CT diabetes is a disorder; the concept ID stating that diabetes has been excluded is a situation; and qualifiers are concept IDs that modify other concepts, for example to indicate that a condition is suspected. Impaired glucose tolerance in a laboratory test report is a finding; however, there is also a separate concept ID for it as a disorder. In addition, the Unified Medical Language System (UMLS) facilitates mapping of SNOMED CT terms into other coding systems. 22

Consensus coding list
We made these recommendations through a consensus process, selecting codes from logical places within each coding system. For a hierarchical system such as Read version 2, we included codes as high up the hierarchy as possible, using a percentage sign (%) to mark a code whose child codes are also valid. Where there are exceptions among the child codes, we flag those that we recommend are not used. For a polyhierarchical system, in which codes are linked in a matrix, the percentage sign cannot be used in the same way, because linkage follows a different set of rules and relationships.
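Because a Read version 2 code's position in the hierarchy is carried by its leading characters, the percentage-sign convention can be expressed as a prefix test with a list of flagged exceptions. The sketch below is illustrative only and is not the software used in this work. The function name in_read2_group is hypothetical, codes are assumed to be five characters padded with trailing dots, and 'C10F0' is an illustrative child code that is not drawn from this paper. The example uses C10F% together with the two child codes that should not be used.

```python
"""Sketch: expand a Read version 2 '%' pattern, honouring flagged exceptions."""


def in_read2_group(code: str, pattern: str, not_used: frozenset = frozenset()) -> bool:
    """Return True if `code` is covered by `pattern`.

    'C10F%' means the code C10F and all of its child codes; a pattern
    without '%' means that single code. Codes in `not_used` are flagged
    exceptions and never match. Trailing '.' padding is ignored, and the
    comparison is case-sensitive.
    """
    if code in not_used:
        return False
    stem = code.rstrip(".")
    if pattern.endswith("%"):
        # Hierarchical prefix match: the parent and every descendant.
        return stem.startswith(pattern[:-1].rstrip("."))
    # Exact match for a single code.
    return stem == pattern.rstrip(".")


# The T2DM group, excluding C10F8 and C10FS.
T2DM_PATTERN = "C10F%"
T2DM_NOT_USED = frozenset({"C10F8", "C10FS"})

# 'C10F0' is a hypothetical child code used only for illustration.
record_codes = ["C10F.", "C10F0", "C10F8", "C10FS", "C10E.", "C1A0."]
t2dm_codes = [c for c in record_codes if in_read2_group(c, T2DM_PATTERN, T2DM_NOT_USED)]
print(t2dm_codes)  # ['C10F.', 'C10F0']
```

In a polyhierarchical terminology such as CTv3 or SNOMED CT, a prefix test like this does not work. There, group membership has to come from the terminology's own parent-child relationships.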

Choice of codes
We generally suggest using diagnostic codes, i.e. disorder concept IDs in SNOMED CT. It is possible to infer that someone has diabetes from disease monitoring, administration or other codes. 23 However, our view is that it is better to use disease codes wherever possible, from Chapter 'C' of the Read version 2 terminology. Chapter 'C' covers endocrine, nutritional, metabolic and immunity disorders. These chapter labels are arbitrary; for example, the International Classification of Diseases uses 'Chapter E' for endocrine, nutritional and metabolic diseases. The polyhierarchical CTv3 does not follow the same pattern of codes, although many CTv3 codes used for diabetes and related conditions (e.g. impaired glucose tolerance) start with X40. Within SNOMED CT we always chose a 'disorder' code where available, in preference to concept IDs classified as situations or findings.
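The preference for SNOMED CT disorder concepts can be written as a simple ranking over candidate concept IDs. The following is a minimal sketch, not the authors' procedure. The function preferred_concept and the category strings are hypothetical. Situations and findings are deliberately given the same rank, because the text only states that disorders take precedence over both. The example uses the two impaired glucose tolerance concept IDs cited in this paper.

```python
"""Sketch: prefer a SNOMED CT 'disorder' concept over 'situation' or 'finding'."""
from typing import List, Optional, Tuple

# Lower rank = preferred. Situations and findings share a rank because the
# text only says disorders come first; ties keep the order given.
RANK = {"disorder": 0, "situation": 1, "finding": 1}


def preferred_concept(candidates: List[Tuple[str, str]]) -> Optional[str]:
    """Return the concept ID to record from (concept_id, category) pairs."""
    usable = [c for c in candidates if c[1] in RANK]
    if not usable:
        return None
    # min() returns the first of several equally ranked candidates.
    return min(usable, key=lambda c: RANK[c[1]])[0]


# Impaired glucose tolerance can be recorded as a finding or as a disorder.
igt_candidates = [("166927002", "finding"), ("9414007", "disorder")]
print(preferred_concept(igt_candidates))  # 9414007
```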

Code browsers
The Read codes were identified using the NHS Clinical Terminology Browser 24 and the SNOMED CT codes using the Snoflake TM Browser. 25 The Terminology Browser is a hierarchical display of codes and their child codes; the Snoflake Browser combines a hierarchical list with a visual display of linked parent and child concepts. Within the Snoflake browser, all SNOMED CT terms are mapped to SNOMED RT (Reference Terminology) and to CTv3. However, we selected preferred SNOMED CT terms independently.

Mapping the codes
By consensus we agreed whether a code was a 'direct mapping', a 'possible mapping', or 'no clear mapping'. 26

Recommended code list for diabetes
Coding diabetes
Type 1 and type 2 diabetes coding recommendations
Coding T1DM in Read 2 is straightforward: we suggest using the C10E% hierarchy only. For T2DM we recommend C10F% and all its child codes except two, C10F8 and C10FS, which we recommend are NOT used. C10F8 is the code for 'Reaven's syndrome', an eponym for metabolic syndrome. If practices use C10F8 for metabolic syndrome, searches on C10F% will include people who may not meet the diagnostic criteria for T2DM. The 'correct' code for metabolic syndrome belongs to the insulin resistance (C1A) hierarchy, and we advise using C1A0 for this. In CTv3 the codes for T1DM and T2DM are X40J4 and X40J5, and in SNOMED CT the concept IDs are 46635009 and 44054006, respectively.

Genetic diabetes
To some extent all diabetes has a genetic component; this category is reserved for diabetes generally associated with a single-gene defect. There is no single code that can be used for 'genetic' diabetes in Read 2, but there is one in CTv3 and in SNOMED CT: the CTv3 generic term is 'Genetic syndromes of diabetes mellitus' (X40JG) and the SNOMED CT concept ID is 'Diabetes associated with a genetic disorder' (5969009).
One of the best-known clinically described variants is maturity-onset diabetes of the young (MODY). However, even this rare but recognised condition has a number of different underlying genetic variants. 27 One of these can be an autosomal-dominant variant of type 2 diabetes, which has its own codes (C10D in Read 2, X40JJ in CTv3 and 237604008 in SNOMED CT). 28 The Read coding system currently assigns the C10C% and C10D% sections to genetic causes of diabetes, whereas in CTv3 several of the codes start with X40, although, reflecting its polyhierarchical nature, there are many other code roots. SNOMED CT concept IDs do not share common stems, so there is no alphanumeric indication of where any particular code lies in the hierarchy. The concept map displayed in the Snoflake browser could allow navigation around the coding system; however, SNOMED CT does not contain enough concept IDs in this area to support it. Unsurprisingly, the terms have not kept pace with developments in this fast-moving field.

Other (or secondary) diabetes
This area of diabetes is relatively easy to code. There is a generic code (C10N in Read 2, X40JA in CTv3 and 8801005 in SNOMED CT) and a range of other codes for known causes of diabetes. The concept map in the Snoflake browser is particularly useful here, displaying the 10 related concepts (Figure 3).

Unknown/unclassified
Where the diagnosis is uncertain, we suggest using the suspected diabetes mellitus code (1JL... in Read 2 and XaXPB in CTv3). SNOMED CT has no specific concept ID for suspected diabetes, although it has 'suspected' concept IDs for 29 other conditions, including heart disease and hypertension. Instead, this situation can be coded initially using the SNOMED CT qualifier 'Known possibly present' (410590009) or 'Suspected' (415684004).
Once the diagnosis is known, this code can be superseded by the correct code. If diabetes is ruled out, we suggest using the diabetes mellitus excluded code (1I0... in Read 2, XaFvt in CTv3 or 315216001 in SNOMED CT). Because computer systems date each coded entry, it is straightforward to identify the latest code, which defines the diagnosis.
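Because the latest-dated code defines the diagnosis, a register query reduces to taking the most recent relevant entry. The following is a hypothetical sketch, not a feature of any particular CMR system. STATUS_BY_STEM, status_of and current_status are invented names. The code stems are those quoted in the text, and the five-character padded forms in the example (such as '1JL..') are an assumption. C10F8 and C10FS are ignored, in line with the T2DM recommendation.

```python
"""Sketch: the most recent diabetes-related code defines the current diagnosis."""
from datetime import date
from typing import Iterable, Optional, Tuple

# Hypothetical lookup from Read 2 code stems (quoted in the text) to a status.
STATUS_BY_STEM = {
    "1JL": "suspected",
    "1I0": "excluded",
    "C10E": "T1DM",
    "C10F": "T2DM",
}
NOT_USED = {"C10F8", "C10FS"}  # excluded from the T2DM group


def status_of(code: str) -> Optional[str]:
    """Map one Read 2 code to a diagnostic status, or None if not relevant."""
    if code in NOT_USED:
        return None
    for stem, status in STATUS_BY_STEM.items():
        if code.startswith(stem):
            return status
    return None


def current_status(entries: Iterable[Tuple[date, str]]) -> Optional[str]:
    """Return the status implied by the latest-dated relevant code, if any."""
    relevant = [(d, c) for d, c in entries if status_of(c) is not None]
    if not relevant:
        return None
    # max() keeps the first of several entries sharing the latest date.
    _, latest_code = max(relevant, key=lambda e: e[0])
    return status_of(latest_code)


history = [
    (date(2011, 3, 1), "1JL.."),   # suspected diabetes
    (date(2011, 6, 10), "1I0.."),  # later: diabetes excluded
]
print(current_status(history))  # excluded
```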

Non-diabetic hyperglycaemia
Impaired fasting glucose and impaired glucose tolerance
These are the commonest forms of non-diabetic hyperglycaemia, with established biochemical definitions based on glucose levels in blood samples, notwithstanding that many CMR systems do not properly label whether a test was taken fasting. 29 However, we are entering a period of transition in which glycated haemoglobin (HbA1c) may be used instead, which would remove this problem.
• Impaired fasting glucose (IFG) is defined as a fasting glucose between 6.1 and 6.9 mmol/L. This may change with the move towards using HbA1c to define IFG, 30 although this remains open to debate. 31 There are two sets of units for HbA1c: Diabetes Control and Complications Trial (DCCT) units, expressed as a percentage, are gradually being superseded by International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) units, expressed in mmol/mol.
• Impaired glucose tolerance (IGT) is defined as a blood glucose between 7.8 and 11.1 mmol/L measured 2 hours after a 75-g glucose load. There have been similar moves towards using HbA1c for diagnosis, 32 but as yet there is no consensus.
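Converting between the two HbA1c unit systems is a linear transformation. This sketch uses the widely published IFCC-DCCT master equation, IFCC (mmol/mol) = 10.929 × (DCCT % − 2.15). That equation is not given in this paper, and laboratories may round or report differently, so the sketch should be treated as illustrative. The function names are hypothetical.

```python
"""Sketch: convert HbA1c between DCCT (%) and IFCC (mmol/mol) units."""


def dcct_to_ifcc(percent: float) -> int:
    """DCCT-aligned HbA1c (%) -> IFCC HbA1c (mmol/mol), to the nearest whole unit."""
    return round((percent - 2.15) * 10.929)


def ifcc_to_dcct(mmol_per_mol: float) -> float:
    """IFCC HbA1c (mmol/mol) -> DCCT-aligned HbA1c (%), to one decimal place."""
    return round(mmol_per_mol / 10.929 + 2.15, 1)


# Prints 42, 48 and 53 mmol/mol for 6.0, 6.5 and 7.0 %.
for pct in (6.0, 6.5, 7.0):
    print(pct, "% ->", dcct_to_ifcc(pct), "mmol/mol")
```

Recording the IFCC value alongside (or instead of) the percentage means that results remain comparable once the DCCT units are withdrawn.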
There is no precise rubric for IFG in the Read codes. The nearest is impaired fasting glycaemia (C11y3 in Read 2, XaIRY in CTv3 and 390951007 in SNOMED CT); there is a precise match for IGT (C11y2 in Read 2 and X40Jh in CTv3). SNOMED CT offers alternatives for IGT: it can be coded as a disorder (9414007) or as a finding (166927002). In keeping with our general recommendations, we suggest using the disorder code. Usefully, CTv3 and SNOMED CT contain a specific code for IGT in pregnancy (X40JI and 237625008, respectively). As so often in large coding systems, there are alternative codes for the same concepts. In Read 2, these alternatives are in the 'R' chapter ('Symptoms, signs and ill-defined conditions'). We recommend using the 'C' chapter codes in Read 2; again, there is greater complexity in CTv3. During this period of transition, we recommend that IFG and IGT codes be applied on the basis of glucose results, and that all HbA1c results be coded, using IFCC units where possible because these are likely to endure.
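When IFG and IGT are coded from glucose results, the rule is easier to apply consistently if it is made explicit. This hypothetical sketch suggests the Read 2 code named in the text for a single labelled result. The sample labels 'fasting' and 'ogtt_2h' are assumptions. The upper bounds are treated as exclusive (below 7.0 mmol/L for IFG, below 11.1 mmol/L for IGT), which is our reading of the ranges quoted above. An unlabelled sample raises an error rather than being guessed, reflecting the fasting-status problem noted earlier.

```python
"""Sketch: suggest a Read 2 non-diabetic hyperglycaemia code from a glucose result."""
from typing import Optional

IFG_CODE = "C11y3"  # impaired fasting glycaemia
IGT_CODE = "C11y2"  # impaired glucose tolerance


def hyperglycaemia_code(glucose_mmol_l: float, sample: str) -> Optional[str]:
    """Return C11y3 or C11y2 when the result meets the IFG or IGT range, else None.

    `sample` must be 'fasting' or 'ogtt_2h' (2 hours after a 75-g glucose load);
    an unlabelled or random sample cannot be classified.
    """
    if sample == "fasting":
        # IFG: 6.1 to 6.9 mmol/L; values from 7.0 upwards are not IFG.
        return IFG_CODE if 6.1 <= glucose_mmol_l < 7.0 else None
    if sample == "ogtt_2h":
        # IGT: from 7.8 mmol/L up to (but not including) 11.1 mmol/L.
        return IGT_CODE if 7.8 <= glucose_mmol_l < 11.1 else None
    raise ValueError(f"cannot classify a {sample!r} sample as IFG or IGT")


print(hyperglycaemia_code(6.4, "fasting"))  # C11y3
print(hyperglycaemia_code(9.2, "ogtt_2h"))  # C11y2
```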

Gestational diabetes
The new practical classification is clear about who has gestational diabetes. To have a diagnosis of gestational diabetes you must have diabetes only during the pregnancy, and not before. Unfortunately, most of the codes for gestational diabetes are contained in the L180% hierarchy of the Read 2 codes, which is described as 'diabetes mellitus during pregnancy, childbirth and the puerperium' and does not exclude previous diabetes. To avoid ambiguity, we suggest a single code, L1808 'Diabetes mellitus arising in pregnancy', which is also available in CTv3. Its wording emphasises that diabetes must have 'arisen' during pregnancy. The SNOMED CT equivalent is 11687002.
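Because L1808 only means something if diabetes was not present before the pregnancy, records can be audited for contradictions. This is a hypothetical audit sketch, not one of the published audit tools. It assumes that pre-existing diabetes would be recorded somewhere in the Read 2 C10 hierarchy (which contains the C10E, C10F, C10C, C10D and C10N codes discussed above), that C10F8 and C10FS should be ignored, and that the pregnancy start date is known. The function name needs_review is invented.

```python
"""Sketch: flag L1808 records where diabetes was coded before the pregnancy."""
from datetime import date
from typing import Iterable, Tuple

GESTATIONAL_CODE = "L1808"     # Diabetes mellitus arising in pregnancy
DIABETES_STEM = "C10"          # assumed: pre-existing diabetes lies in the C10 hierarchy
NOT_USED = {"C10F8", "C10FS"}  # not counted as diabetes


def needs_review(entries: Iterable[Tuple[date, str]], pregnancy_start: date) -> bool:
    """True if L1808 is recorded but a C10 diabetes code predates the pregnancy."""
    entries = list(entries)
    has_gestational = any(code.rstrip(".") == GESTATIONAL_CODE for _, code in entries)
    prior_diabetes = any(
        code.startswith(DIABETES_STEM) and code not in NOT_USED and coded_on < pregnancy_start
        for coded_on, code in entries
    )
    return has_gestational and prior_diabetes


record = [(date(2009, 5, 2), "C10E."), (date(2012, 2, 14), "L1808")]
print(needs_review(record, pregnancy_start=date(2011, 9, 1)))  # True
```

A flagged record is most likely to be a patient with pre-existing diabetes who should have kept their original type code during pregnancy.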

Summary tables for each terminology
Summary tables (Tables 1-3) set out the recommended codes for each terminology. They are also available online at http://www.clininf.eu/diabetes_codes
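For use in search tools, the codes cited in this section can be held as a single lookup keyed by category and terminology. This sketch is not a substitute for Tables 1-3, which are not reproduced here. It contains only the codes quoted in the text, copied as written, and uses None where the text reports that no suitable code exists. The category labels, the dictionary name RECOMMENDED and the function coding_gaps are hypothetical.

```python
"""Sketch: recommended codes quoted in the text, by category and terminology."""
from typing import Dict, List, Optional, Tuple

TERMINOLOGIES = ("read2", "ctv3", "snomed_ct")

# None marks a gap reported in the text: no single Read 2 code for genetic diabetes,
# and no SNOMED CT concept ID for suspected diabetes (qualifiers are used instead).
RECOMMENDED: Dict[str, Dict[str, Optional[str]]] = {
    "T1DM": {"read2": "C10E%", "ctv3": "X40J4", "snomed_ct": "46635009"},
    "T2DM": {"read2": "C10F%", "ctv3": "X40J5", "snomed_ct": "44054006"},  # not C10F8/C10FS
    "genetic": {"read2": None, "ctv3": "X40JG", "snomed_ct": "5969009"},
    "other/secondary": {"read2": "C10N", "ctv3": "X40JA", "snomed_ct": "8801005"},
    "suspected": {"read2": "1JL...", "ctv3": "XaXPB", "snomed_ct": None},
    "excluded": {"read2": "1I0...", "ctv3": "XaFvt", "snomed_ct": "315216001"},
    "IFG": {"read2": "C11y3", "ctv3": "XaIRY", "snomed_ct": "390951007"},
    "IGT": {"read2": "C11y2", "ctv3": "X40Jh", "snomed_ct": "9414007"},
    "gestational": {"read2": "L1808", "ctv3": "L1808", "snomed_ct": "11687002"},
}


def coding_gaps() -> List[Tuple[str, str]]:
    """List (category, terminology) pairs with no recommended code."""
    return [
        (category, term)
        for category, codes in RECOMMENDED.items()
        for term in TERMINOLOGIES
        if codes[term] is None
    ]


print(coding_gaps())  # [('genetic', 'read2'), ('suspected', 'snomed_ct')]
```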

Principal findings
Although it is not possible to map all the clinical concepts in the practical classification of diabetes precisely to concepts in any of the three terminologies, it is possible to produce a workable list of clinical codes. T1DM and T2DM are mapped precisely and readily. Gaps in other areas exist for different reasons. The gap in coding genetic types of diabetes appears to be related to rapid advances in that area. There is an important semantic difference: the practical classification refers to 'other' types of diabetes, whereas the terminologies use 'secondary'. The one concept that cannot be mapped to a code of similar meaning is the 'unknown/unclassified' type of diabetes, for which the best match is 'diabetes suspected', followed by 'diabetes excluded' if the diagnosis is not confirmed. For non-diabetic hyperglycaemia, we recommend coding the HbA1c or, where appropriate, glucose result together with its value, to avoid confusion. Manual mapping by searching the SNOMED CT and CTv3 browsers separately revealed no differences from the browser's own mapping between the two.

Implication of the findings
It is possible to create a limited working list of codes that will facilitate the identification and follow-up of people with diabetes. However, the data model in a coding system does not match the disease classification developed by clinicians, and compromises are needed. Revision of the coding system could close this gap. The downside of constant revision, however, is that coding systems have to carry forward previous diagnostic category labels, resulting in an ever more complex range of coding alternatives. Different use of words in the clinical classification and the terminology ('other' and 'secondary', respectively, in this case) creates barriers to searching for codes using the natural language that clinicians might use. Clinicians and informaticians need to be sensitive to this and try to minimise these gaps. We could simply have searched for CTv3 terms in the Snoflake browser and used its mapping of CTv3 codes to SNOMED CT; our independent manual mapping found no differences from it. Despite the difficulties of working with coded data, it is probably a better alternative than working with free text. 33 Gaps between coding systems and clinical concepts have been recognised for some time. Possibly the longest in use is the International Classification of Diseases (ICD), in which it is recognised that more than one code is often needed to describe a clinical condition. The convention that has developed there is the 'dagger and asterisk' system: a dagger is applied to the primary diagnosis, with asterisks applied to secondary, but also necessary, labels. 34 Coding systems carry forward legacy codes, and a lack of coding results in a failure to identify cases of diabetes. There is considerable misclassification, miscoding and misdiagnosis of diabetes. [35][36][37] An expert reported disappointment that legacy codes such as 'Insulin-dependent diabetes mellitus, IDDM' are still used, although this classification has been superseded by the division into T1DM and T2DM. 38
A cross-sectional survey of 3.6 million patients' electronic records found that around 1% of the UK population may have diabetes (based on blood glucose results) that is not recorded on practice registers, 1 and diabetes prevalence estimates based on Health Survey for England HbA1c measurements in 2006 suggested that up to 27% of people with diabetes were undiagnosed or missing from practice registers. 3 Variable use of complex coding systems may account for some of these missing patients, who may be less likely to receive well-organised care as a result. Mapping between systems has its challenges, 39 though the automated linkage created within the Snoflake browser appeared reliable.
Coding is not just a primary care issue. People with diabetes have longer hospital stays and are more likely to be readmitted, 40 and there are also problems with poor coding of diabetes in hospital. 41

Limitations
We could not create a completely clear and unambiguous mapping between the practical classification and any of the coding systems. The limitations of the study include the relatively small group that agreed the clinical codes. In addition, although we know that people who are not coded correctly receive inappropriate care or a lower standard of care, 42 we lack direct evidence that improving coding quality improves patient outcomes. Whilst we know that there is a problem with coding, we lack evidence that putting it right will substantially improve care.

Conclusions
The higher the quality of clinical coding the easier it will be to ensure that individual patients receive the best care and that we can audit the quality of care and use primary care data in monitoring disease trends. The proposed mapping has limitations but is feasible to apply in clinical practice. The suggested codes should facilitate the consistent coding of diabetes in clinical records.

Recommendations
• Prospectively use the appropriate coding list:
   1. In the UK, code lists for Read version 2 (for EMIS, In Practice Systems (INPS) and iSoft brands of CMR systems) and for Read Clinical Terms version 3 (for TPP SystmOne) are available at http://www.clininf.eu/diabetes_codes
   2. The SNOMED CT table of codes could be applied in countries using that nomenclature and be mapped to other coding systems.
• Critically appraise how people with diabetes are classified and coded in your CMR each time they attend practice diabetes clinics.
• In the UK, run the audit tools available at www.clininf.eu/diabetes.html to identify miscoded, misclassified and misdiagnosed people with diabetes. These can be replicated for use in other health systems.