
Journal of Informatics in Primary Care 2000 (February):12-15


A Method of Assessment of Reliability of Coding Clinical terms to ICD-10 and ICPC Using ENCODE-FM©, a Primary Care Controlled Clinical Terminology

Robert M Bernstein, PhD, MDCM, CCFP, FCFP (1,2); Gary R Hollingworth, MD, CCFP, FCFP (1); Gary Viner, MD, CCFP (1); Paul Miller, MD

(1) Medical Informatics Research Group, Department of Family Medicine, University of Ottawa
(2) WONCA Classification Committee Member, and Member of the Canadian Institute for Health Information Partnership on Health Informatics and Telematics


Background: Data entry into electronic medical records remains a barrier to their use in primary care. One difficulty has been the use of terminologies unsuited to clinical data entry by physicians. Canada has chosen ICD-10 as its standard for classifying medical diagnoses, and the World Organization of Family Doctors created and uses ICPC-2. However, ICD-10 is not intended to be used by clinicians as care is given, and ICPC is too small to be useful for following patients in a clinical record. ENCODE-FM© is a clinical terminology specifically designed to overcome these limitations and to provide both clinical specificity of health problems for patient care and data aggregation for statistics and research. This study was intended both to test the reliability of classification when data are entered using ENCODE-FM© and to serve as a model methodology for testing vocabularies in general.

Method: Terms for "reason for encounter" taken from a random selection of encounter forms in family practice were coded by five different physician coders using a computerised search engine for ENCODE-FM©. Intraclass correlations were calculated to see how well clinical data grouped to ICD-10 and ICPC.

Results: Use of the ENCODE-FM© clinical terminology resulted in highly reliable data aggregation to the standard international classifications ICD-10 and ICPC. Intraclass correlations were .87 (p<.001) for ICD-10 and .85 (p<.001) for ICPC.

Interpretation: The study shows that the method of assessment is both simple and acceptable. ENCODE-FM© can be used reliably for data entry into an electronic medical record, and analysis of coding errors suggests that direct data entry by care providers would be more reliable than third party coding. Physician coders prefer simple partial word searches.

Keywords: Classifications; terminologies; ICD-10; ICPC; ENCODE-FM©; clinical vocabularies


The electronic medical record (EMR) has been claimed to have the potential to improve information flow between different sectors in health care systems, and to provide a structure in which better patient outcomes and improved efficiency of clinical care can be accomplished. In order for these potential gains to be realised, the EMR has to have a backbone of a controlled terminology that allows the computer to "recognise" certain concepts, such as symptoms and diagnoses. This allows the EMR to be used for decision support (for example, to give an alert before a betablocker is prescribed to an asthmatic) and for standardised data collection, in order to recognise patterns and trends both within the physician practice, and more globally for rational allocation of health resources.

Until now, most EMRs in Canada and the United States of America have used billing codes for diagnoses and problem lists and for collection of statistics about diagnoses. These have been notoriously inaccurate, partly because the standardised list of codes has had little clinical relevance in primary care, and partly because data collection has been an imposed task on physicians independent of the clinical care record. However, some form of structured data entry is necessary.

Typically, the terminologies used have been statistical classifications of disease, such as ICD-9, which bear little resemblance to clinical terms used in the field. The problem is compounded in primary care because these classifications do a poor job of representing symptoms and complaints. The World Organization of Family Doctors (WONCA) has developed a terse clinical classification, the International Classification of Primary Care (ICPC-2) [1], which has been used successfully for structured data input, mainly in Europe. ICPC has been criticised because its 660 terms for symptoms and diagnoses do not provide enough specificity for a clinical record to follow patients. The WONCA classification committee has endorsed the combination of ICPC and ICD-10 as an acceptable classification and nomenclature for primary care records, with ICPC supplying the classification and the specificity of symptoms and complaints, and ICD-10 the additional specificity of diagnoses. ICD-10 alone is difficult for primary care providers to use, lacks many common symptoms and complaints, and requires extensive training and expertise; even then, there is evidence that reliability is lacking [2].

Other terminological systems exist, but have been found wanting in the primary care field [3]. They do not provide good content coverage for primary care in spite of containing up to 280,000 terms [4], and no reliability data exist for any terminological system, with the notable exceptions of ICPC alone [5] and ENCODE-FM© [6].

Approaches to structured data entry using ICPC as an index to ICD-10 have been tried in the Netherlands, but forcing family physicians to use a new mechanism of data recording is cumbersome, and alternatives need to be found for the North American practice environment. One approach is to use a controlled clinical terminology (a set of clinical terms germane to primary care, written in clinical language) as the mechanism of data entry, and to map the clinical terms to the international standards ICPC and ICD-10. Given that no such clinical terminology existed for primary care, and that the Ontario Family and General Practice Data Standards Project [7] also recommended ICPC and ICD-10 for primary care records, we wrote ENCODE-FM© (ENCODE) expressly for use in primary care electronic medical records [8].

ENCODE was designed as a controlled clinical terminology of symptoms, complaints, diagnoses, disorders and reasons for encounter for use at the point of service. It is intended as a data entry tool for the primary care professional, and to allow clinical data to be classified by ICPC, ICD-10 and ICD-9CM. Other areas of the clinical record (services and procedures, medications, and outcome measures) are best represented using terminologies specifically designed for those purposes [7], although pharmaceuticals are problematic in all systems. Descriptions of ENCODE have been published previously [6,8].
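The design idea described here, specific clinical terms at the point of entry that each carry a mapping to the coarser classifications, can be sketched as a small data structure. This is only an illustration under stated assumptions: the `ClinicalTerm` class, the `aggregate` helper and the term labels are hypothetical, and the codes shown are commonly published ICPC-2 rubrics and 3-character ICD-10 categories, not content taken from ENCODE-FM©.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalTerm:
    """One entry in a controlled clinical terminology (illustrative only)."""
    label: str   # wording the clinician sees and selects
    icpc: str    # ICPC rubric the term aggregates to
    icd10: str   # ICD-10 category the term aggregates to


# Hypothetical entries: labels are invented for this sketch and are NOT
# taken from ENCODE-FM.
TERMS = [
    ClinicalTerm("Asthma, exercise induced", "R96", "J45"),
    ClinicalTerm("Asthma, nocturnal", "R96", "J45"),
    ClinicalTerm("Essential hypertension", "K86", "I10"),
    ClinicalTerm("Cough", "R05", "R05"),
]


def aggregate(selected_labels, scheme):
    """Count recorded problems per classification code ('icpc' or 'icd10')."""
    by_label = {term.label: term for term in TERMS}
    return Counter(getattr(by_label[label], scheme) for label in selected_labels)


# Two different clinical terms roll up to the same rubric for statistics.
recorded = ["Asthma, nocturnal", "Cough", "Asthma, exercise induced"]
print(aggregate(recorded, "icpc"))
print(aggregate(recorded, "icd10"))
```

The clinical record keeps the specific wording, while counts for statistics and research are taken over the mapped codes.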

In a previous study, we tested data entry of clinical primary care "encounter form" terms for reliability of classification to ICPC. Using ENCODE as the clinical interface, we obtained an 83.9% substantial agreement in ICPC classification, and most of the variability of coding could be attributed to the imprecision of the encounter form terms, and the fact that coding was done by third parties [6]. These reliability data compare very favourably to data for ICPC alone [5], with the additional advantage of much more clinical specificity.

This study looks at the statistical reliability of classification to ICD-10 using ENCODE for data entry, and adds statistical analysis to the previously published data on ICPC. Since the Canadian Institute for Health Information has standardised on ICD-10 as the classification for reporting morbidity and mortality [9] in Canada, and ICD-10 is the recognised international classification standard, it is essential to determine whether the reliability of classification to ICD-10 is acceptable.



The methods for this study were described previously [6] and are summarised here. One hundred and six clinical terms for reasons for encounter were randomly selected from encounter forms in our family medicine centre. Each of five coders, all unfamiliar with classifications and with ENCODE, was given a computer program that allowed them to locate the best matches they could in the ENCODE controlled clinical terminology. The coders were asked to rate the quality of the match between the clinical term and the ENCODE term, and to indicate the search strategy finally used to find the ENCODE term. Data were analysed with the SPSS™ Intraclass Correlation macros using model 3 [10], in which all raters rate each item and the raters constitute the entire population studied. The correlation was estimated for single raters (that is, the reliability of any single rater was estimated).
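For readers unfamiliar with the statistic, the single-rater, model 3 intraclass correlation of Shrout and Fleiss can be computed from a two-way analysis of variance as ICC(3,1) = (BMS - EMS) / (BMS + (k - 1) EMS), where BMS is the between-items mean square, EMS the residual mean square and k the number of raters. The sketch below is a plain reimplementation of that formula, not the SPSS macro used in the study. The function name `icc_3_1` and the ratings matrix are illustrative assumptions, and how the assigned codes were represented numerically for the analysis is not described here.

```python
def icc_3_1(ratings):
    """ICC(3,1) for a complete matrix: one row per item, one column per rater."""
    n = len(ratings)       # number of items rated
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (items x raters, one rating per cell).
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_items = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_items - ss_raters

    bms = ss_items / (n - 1)                # between-items mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Illustrative numeric ratings (six items, four raters); NOT data from this study.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_3_1(example), 2))  # 0.71
```

Because the raters are treated as fixed in model 3, systematic differences between raters (the column effect) are removed before the residual is estimated, so the coefficient reflects consistency rather than absolute agreement.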



Substantial concordance (four or five of the five coders agreeing) was 83.9% for ICD-10 at the 3-digit level (ICD-10 has 2,600 terms at the 3-digit level) and 83.0% at the 4-digit level (14,000 terms). The intraclass correlation was .87 at each level, and both are highly statistically significant (Tables 1 and 2). As previously reported, every clinical term could be matched to an ENCODE term, and the accuracy of the match was rated as good or excellent in 91.7% of the cases. The intraclass correlation coefficient for ICPC classification was .85 (.80–.89).

Reliability of coding                 Freq   Percent
5/5 ICD-10 codes identical              69     65.1%
4/5 ICD-10 codes identical              20     18.9%
3/5 ICD-10 codes identical              11     10.4%
<3 ICD-10 codes identical                6      5.7%
Total                                  106      100%
Substantial concordance (4 or 5)        89     83.9%

Intraclass correlation    p        Lower-upper bound
.87                       <.001    .83-.90

Table 1: ICD-10 3-digit

Reliability of coding                 Freq   Percent
5/5 ICD-10 codes identical              62     58.5%
4/5 ICD-10 codes identical              26     24.5%
3/5 ICD-10 codes identical              12     11.3%
<3 ICD-10 codes identical                6      5.7%
Total                                  106      100%
Substantial concordance (4 or 5)        88     83.0%

Intraclass correlation    p        Lower-upper bound
.87                       <.001    .83-.90

Table 2: ICD-10 4-digit
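The concordance rows in Tables 1 and 2 come from a simple tally: for each clinical term, find the largest number of coders who chose the same code, then count the terms where that number is four or five. A minimal sketch follows, assuming five codes per term. The function names and the sample codings are hypothetical, and only the frequencies in the final check are taken from Table 1.

```python
from collections import Counter


def agreement_level(codes):
    """Largest number of coders who assigned the same code to one term."""
    return Counter(codes).most_common(1)[0][1]


def substantial_concordance(codings, threshold=4):
    """Fraction of terms on which at least `threshold` coders agree."""
    hits = sum(1 for codes in codings if agreement_level(codes) >= threshold)
    return hits / len(codings)


# Hypothetical codings, five coders per term (not data from this study).
sample = [
    ["J45", "J45", "J45", "J45", "J45"],  # 5/5 identical
    ["R05", "R05", "R05", "R05", "J20"],  # 4/5 identical
    ["R51", "R51", "R51", "G44", "G43"],  # 3/5 identical
    ["K30", "R10", "K29", "K21", "R12"],  # fewer than 3 identical
]
print(substantial_concordance(sample))  # 0.5

# Arithmetic check using the Table 1 frequencies.
table1 = {"5/5": 69, "4/5": 20, "3/5": 11, "<3": 6}
total = sum(table1.values())
share = 100 * (table1["5/5"] + table1["4/5"]) / total
print(total, round(share, 2))  # 106 83.96
```

The 3-digit figure of 89 of 106 terms is 83.96%, which the table reports as 83.9%.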

Of the terms chosen, 89.4% were found by a simple string search of the whole terminology, although two other strategies were provided. The hierarchical structure and display were used only 5% of the time.



In our methodology, a random sample of terms from the field was used. Common diagnoses were more likely to be duplicated, rarer ones not. This is not a bias, but a reflection of the frequency distribution of the health problems seen in practice. A terminology which does a good job of representing rare problems and a poor one of representing symptoms and common health problems may appear better if the clinical terms to be coded are selected systematically to "cover" the field. We believe strongly that in testing terminologies the test must reflect the usual content of primary care.

This study also suggests that simple search strategies are preferred by physicians. Searching for a term by walking through the hierarchical structure of ENCODE was used only 5% of the time, whereas a partial word search was used almost 90% of the time.
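A partial word search of the kind the coders preferred can be very simple. The sketch below is an assumption about what such a search might look like, not a description of the ENCODE search engine: the function name, the matching rule (every typed fragment must occur somewhere in the term, ignoring case) and the term list are all hypothetical.

```python
def partial_word_search(query, terms):
    """Return the terms that contain every fragment of the query, ignoring case."""
    fragments = query.lower().split()
    return [term for term in terms if all(f in term.lower() for f in fragments)]


# Hypothetical term labels, not taken from ENCODE-FM.
TERMS = [
    "Otitis media, acute",
    "Otitis media, serous",
    "Otitis externa",
    "Headache, tension type",
    "Cough",
]

print(partial_word_search("otit med", TERMS))
# ['Otitis media, acute', 'Otitis media, serous']
print(partial_word_search("head", TERMS))
# ['Headache, tension type']
```

A linear scan like this needs no knowledge of the hierarchy, which may be part of why it suits users who are unfamiliar with the classification.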

Why, since ICD-10 at the 3-digit level is substantially smaller than at the 4-digit level, and ICPC is even smaller, are the reliabilities so similar? The answer lies in the consideration of which terms were coded unreliably. Clearly, the problem is not with ENCODE, nor for that matter with ICD-10 or ICPC. The problem is that the clinical terms taken from the records were themselves imprecise and therefore difficult to code by third party coders.

"Inspection of the seventeen terms with coding variability suggests that vagueness of the clinical encounter form terms, confusion by the coder between symptoms and diagnoses, and plain error accounted for all but three." [6]

Those terms for which the meaning was not clear were coded less reliably by all three of the classifications. This is evidence in favour of direct data entry by care providers, and suggests that although third party coding is very reliable, direct data entry is likely to be significantly more so.

Reliability data for the process of going from a nomenclature of terms to a classification are simply lacking for all terminologies except ENCODE. Research with SNOMED, UMLS, and Read Codes [3] suggests that even content coverage is a problem in primary care. In our study, no terms were unmatched and the satisfaction of the raters with the match was very high. In a study performed for the Canadian Institute for Health Information using Read Codes and SNOMED, the average time to find an appropriate code was 3 minutes, and some terms were still unmatched [11]. Although a search for a partial term in the complete ENCODE file takes less than 2 seconds on a Pentium 233 machine, the time taken to find an appropriate clinical term was not measured in this study.

In North American primary care, the existence of the full electronic medical record with all data coded is a myth. Data entry problems, specifically the time required to code data, are prohibitive. Simpler systems, in which database functions provide added value to the physician, will require only that specific elements be coded. Different areas of the primary care health record require different terminological systems. There is no evidence that a general "terminology of everything" is either comprehensive, accurate or reliable. Current trends suggest that the following standards be used in primary care in Canada:

  • ICD-10 and ICPC for symptoms, complaints, diseases, risk factors, health problems and episodes (accessed through ENCODE-FM©)
  • CCI (Canadian Classification of Interventions) [12] for processes of care and procedures
  • ATC (Anatomical Therapeutic Chemical classification) [13] for drugs
  • LOINC [14] for laboratory tests and physical measurements
  • HL-7 [15] for data transmission



The evidence from this study indicates that classification and aggregation of morbidity data by the international standards ICD-10 and ICPC is highly reliable using ENCODE-FM© as a data entry clinical terminology, and that health care providers should be encouraged to do their own primary data entry in order to ensure maximum reliability. Simple search strategies are preferred. The methodology of terminology evaluation was simple, unbiased, and acceptable to users.


John Shearman, MBBS, CCFP; Claude Labelle, MD, CCFP; and Roger Thomas, PhD, MD, CCFP all contributed substantially to the data collection, and to discussion and feedback regarding vocabulary issues and electronic medical records in primary care. Professors Charles Bridges-Webb (who has chaired the WONCA International Classification Committee), Henk Lamberts and Maurice Wood have contributed through multiple provocative discussions over many years.



1 ICPC-2: International Classification of Primary Care. Second Edition. Oxford University Press, Oxford, 1998
2 Smith MW. Hospital Discharge Diagnoses: How accurate are they and their International Classification of Diseases (ICD) Codes? New Zealand Med Jour 1989; 102:507–508
3 Mullins HC, Scanland PM et al. The Efficacy of SNOMED, Read Codes and UMLS in Coding Ambulatory Family Practice Clinical Records. J Amer Medical Informatics Assoc Symposium Supplement 1995; 20:135–139
4 Chute CG et al. The Content Coverage of Clinical Classifications. J Amer Medical Informatics Assoc. 1996; 3:224–233
5 van der Horst F, Metsemakers J, Vissers F, Saenger G, de Geus C. The reason for Encounter Mode of the ICPC: reliable, adequate and feasible. Scand Jour Primary Health Care 1989; 7:99–103
6 Bernstein RM, Hollingworth GR, Viner G, Shearman J, Labelle C, Thomas R. Reliability issues in coding encounters in primary care using an ICPC/ICD-10-based controlled clinical terminology. J Amer Medical Informatics Assoc Symposium Supplement 1997; 21:843 and D004493
7 Family/General Practice Data Standards Project: Process and Data Modeling Project Report. Publications Ontario 70 Grosvenor St, Toronto, November 1992
8 Bernstein RM, Hollingworth GR, Viner GS. ENCODE-FM (Electronic Nomenclature and Classification Of Disorders and Encounters for Family Medicine) / CODE-MF (Codification Electronique pour Medicine Familiale). ISBN # 0-88927-029-5, 1997. INSITE-Family Medicine Inc., 1910 Wembley Ave., Ottawa, Canada K2A 1A7. email Web site:
9 ICD-10 Implementation Study Advisory Group. Achieving Standardization In Diagnosis And Intervention Classification: Future Directions For Canada. Report to the Canadian Institute for Health Information Board. November 1995. Available from CIHI, 377 Dalhousie St., Ottawa, Canada, K1N 9N8
10 Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin 1979; 86:420–428
11 Anderson, John. Controlled Language Study, Pilot Phase. Report presented to the Canadian Institute of Health Information Partnership for Health Informatics/Telematics, Working Group 2. Oct. 16, 1997
12 Lalonde AN, Taylor E. Classification Systems in Canada: Moving Towards the Year 2000. Can Med Assoc J 1997; 157:1561–1565
13 Pahor M, Chrischilles EA, Guralnik JM, Brown SL, Wallace RN, Carbonin P. Drug data coding and analysis in epidemiologic studies. Eur J Epid 1994; 10:405–411
14 LOINC: Available from
15 Health Level 7:




Online ISSN 2058-4563 - Print ISSN 2058-4555. Published by BCS, The Chartered Institute for IT