How do clinical information systems affect the cognitive demands of general practitioners? Usability study with a focus on cognitive workload

Background: Clinical information systems in the National Health Service do not need to conform to any explicit usability requirements. Poor usability can increase the mental workload experienced by clinicians and cause fatigue, increase error rates and affect overall patient safety. Mental workload can be used as a measure of usability.
Objective: To assess the subjective cognitive workload experienced by general practitioners (GPs) with their systems, and to raise awareness of the importance of usability in system design among users, designers, developers and policymakers.
Methods: We used a modified version of the NASA Task Load Index, adapted for the web. We developed a set of common clinical scenarios and computer tasks and presented them in an online survey. We emailed the study link to 199 clinical commissioning groups and 1,646 GP practices in England.
Results: Sixty-seven respondents completed the survey. The respondents had spent an average of 17 years in general practice, had experience of using a mean of 1.5 GP computer systems and had used their current system for a mean time of 6.7 years. The mental workload score was not different among systems. There were significant differences among the task scores, but these differences were not specific to particular systems. The overall score and task scores were related to the length of experience with their present system.
Conclusion: Four tasks imposed a higher mental workload on GPs: 'repeat prescribing', 'find episode', 'drug management' and 'overview records'. Further usability studies on GP systems should focus on these tasks. Users, policymakers, designers and developers should remain aware of the importance of usability in system design.

What does this study add?
• Current GP systems in England do not need to conform to explicit usability requirements. Poor usability can increase the mental workload of clinicians and lead to errors.
• Some clinical computer tasks incur more cognitive workload than others and should be considered carefully during the design of a system.
• GPs did not report overall very high levels of subjective cognitive workload when undertaking common clinical tasks with their systems.
• Further usability studies on GP systems should focus on the tasks incurring higher cognitive workload.
• Users, policymakers, and designers and developers should remain aware of the importance of usability in system design.

INTRODUCTION
Modern computer information systems have a proven record in supporting information management tasks in healthcare.1,2 However, clinical information systems that lack usable designs may increase the mental workload imposed on their users and negatively affect patient safety.3-5 Today, with the nearly universal computerization of general practice in the UK,6 general practitioners (GPs) are faced with the challenge of working with computer systems that could potentially be adding to their cognitive workload.7
GP Systems of Choice (GPSoC) in England (Table 1) do not have to conform to any specified usability requirements, although there have been initiatives to promote usability supported by the National Health Service (NHS), including the Microsoft Health Common User Interface.8 Usability has been defined as 'the capacity of a system to allow users to carry out their tasks safely, effectively, efficiently and enjoyably'.9 It largely relates to the quality of the information design and system navigability, and depends heavily on the user interface.10,11 Usability is an important aspect of system design 12 and may affect user satisfaction, user fatigue, clinical productivity, error rates and overall patient safety.

Table 1. GP systems and suppliers in the NHS
GPSoC framework supplier: GPSoC compliant system/s
CSC Computer Sciences Limited: SystmOne
EMIS: LV, PCS and WEB
Advanced Health and Care: Crosscare
InPractice: Vision 3
iSOFT: Premiere and Synergy
Microtest: Evolution and Practice Manager II

Clinical care is a complex and mentally demanding activity. It usually involves multitasking, where communication skills 13 are combined with pattern recognition, clinical reasoning and problem-solving skills.14 In primary care, the high levels of uncertainty,15 the variety of clinical presentations seen 16 and the need to deal with vast amounts of information routinely received from other care settings 17 add to this complexity. All of these contribute to the mental workload experienced in general practice.18
Mental workload has been defined as the mental effort involved in performing any given task.19 Excessive simultaneous tasks or very demanding individual tasks may cause cognitive overload.20 In healthcare, multitasking may lead to errors.21 Information systems should help users complete tasks with ease. Their user interfaces should be carefully designed to ensure that the way information is displayed does not negatively affect the cognitive workload experienced by healthcare professionals.22 Improved user interfaces should reduce medical errors and improve patient care.23
Scottish GPs have expressed reservations about the usability of their computer systems.24 Usability issues in healthcare have been noted elsewhere. Examples cited are fixed sequences of steps and tasks, poorly supported documentation and retrieval of information,25 problems relating to the efficiency of use, intuitiveness and poor fit for multiprofessional needs,26 and poor integration into workflow.27 In healthcare information technology, the main challenge today appears to lie in the design of convenient, efficient and acceptable interfaces.28-33 In the UK, poor usability has also been identified as a safety issue.34,35
Guidelines for the evaluation of the usability of electronic record systems in primary care exist.36 Among several existing validated subjective mental workload tools,37 NASA-TLX 38 has been commonly used in health care. This is a multidimensional scale consisting of two parts: 1) six 20-point visual analogue sub-scales that measure contributing factors to the overall cognitive load (measures of workload): mental demand, physical demand, temporal demand, performance, effort and frustration; and 2) a set of pairings combining the different factors to be rated according to relevance, in order to weight the subjective importance given to the different factors (sources of workload). This has been used in a variety of health information and communication technology evaluations such as nursing intensive care,39 anesthetic work,19 surgical training and surgery,40,41 and electronic white boards.42
This study looks to identify where potential usability issues with existing GP systems in the NHS in England may exist, by exploring the subjective cognitive workload experienced by GPs when they carry out common clinical tasks with their computer systems. The study hopes to raise awareness of the influence of usability on safety among system users, developers and policy makers.

METHODS
The study presented participants with 10 stories, each describing a clinical scenario with a related computer task. GPs rated the subjective mental workload experienced with their system for that task, using the tool described below.

Setting
The study was carried out in the primary care setting in England.

Population
Eligible participants were GPs who at the time of the study: 1) were working, or had worked within the past 6 months, for the NHS or the Ministry of Defence in England and 2) were using or had used one or more of the currently existing GP systems in a paperless or paper-light practice/setting. Participation in the study was voluntary. Data were collected anonymously through an online survey.

Materials
NASA Task Load Index (NASA-TLX) 43,44 is a tool designed to undertake subjective workload assessments and is free to use. We used a modified version (known as Raw TLX 38), consisting of six 20-point visual analogue sub-scales that measure six contributing factors to the overall cognitive load (measures of workload): mental demand, physical demand, temporal demand, performance, effort and frustration (Figure 1). For this study, we dropped the physical demand scale, which we judged irrelevant. We transformed the tool into a web-based form using Google Forms. The survey contained instructions for the study, demographic questions, the clinical scenarios and tasks requested, and rating scales with descriptions of each measure. It also included a space for free comments after each task (Figure 2).
In order to enable participants to complete the study at a time of their convenience, we presented them with a set of common clinical scenarios and tasks and asked them to recall (rather than perform) how they use their systems to complete these tasks. This approach minimized the time involved and prevented interference with work.
We collected a number of demographic variables (time in general practice, time using the system and the number of systems used) in order to identify whether any of these factors had an impact on cognitive load.

Measures
Each questionnaire returned 50 scores in total: five scores corresponding to the five sub-scale ratings for each of the 10 tasks. We refer to the average of these 50 scores as the 'survey score'. For each task, we computed a single average score, which we refer to as the 'task score'. Scores are given on a scale of 0-100. Additional measures were the number of years in general practice, the number of years the system had been used and the number of other systems used.

Selection, design and validation of clinical scenarios and tasks
We made a selection of tasks using Laerum and Faxvaag's task list,45 with some additions judged to be relevant to GP clinical information systems. The tasks selected belong to the following generic activity groups: retrieving information, entering information, prescribing and managing prescriptions, and dealing with laboratory results. Following Laerum and Faxvaag,45 we use 'task' here as a high-level description. Each task involves multiple actions, each of which contributes to the overall cognitive workload. These will involve the user interacting with the information system using keyboard and mouse (or equivalent) and making various decisions. We adapted the clinical scenarios from real cases, taking into account the available data on the most frequent presenting complaint groups and most frequently prescribed drugs.46 We ensured the final validity of the scenarios by asking participants about the representativeness of the cases during a pilot test. The final list of clinical scenarios and tasks is available in Table 2.

Example scenario as presented to participants:
Hypertension follow-up consultation for a 63-year-old male patient who attends irregularly. Additional significant problems include diabetes and coronary heart disease. RECALL how you would do the task indicated in the GP system you picked and rate that system on the five dimensions given. TASK: review his values over time for BLOOD PRESSURE, LDL and HbA1C.

Table 2 (excerpt)
Task: find previous existing entry for the depressive episode and medication prescribed. [Find episode]
52-year-old female with a 24-hour history of uncomplicated gastroenteritis. Task: record the problem using a clinical code for 'Gastroenteritis'. Enter the history, examination and management details using free text.

Survey design
An initial version of the survey was piloted by eight GPs. They confirmed the correct functioning of the online survey and appropriateness of the clinical scenarios. They made comments on the understanding of the scales and the scoring system, the clarity and level of detail of the clinical scenarios and the general design of the survey. The changes introduced resulted in a survey with 10 simplified clinical scenarios and three demographic questions for 'time in general practice', 'time using the system' and 'number of systems used'.

Sampling method and participants' recruitment
The target population included an estimated 40,000 GPs in England.47 Due to funding limitations, we used existing electronic communication channels to try to reach the largest possible number of GPs and maximize response rates.
During late May and June 2013, we submitted an electronic request to 199 clinical commissioning groups, which are responsible for organising the delivery of NHS services in England, requesting assistance with the survey distribution. We also directly emailed 1,646 GP practices with an invitation to participate in the study.

Methods of analysis
We used SPSS for the statistical analysis. We looked at the correlation between all 50 variables obtained for each questionnaire, followed by dimension reduction by factor analysis. We computed 12 additional scores: 10 task scores, an average score for all tasks (referred to as the survey score), and a score for the number of additional systems used.
We obtained means with confidence intervals for the survey scores and the task scores. We looked for statistical differences between the tasks and the systems using ANOVA for repeated measures. We also looked at the influence of the variables 'time in general practice', 'time using system' and 'number of other systems used'.
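The scoring described above can be sketched in a few lines of Python. This is an illustrative sketch only and not the SPSS procedure used in the study: the function names (`task_scores`, `survey_score`, `mean_ci`), the scale labels, the assumption that ratings are already expressed on a 0-100 scale and the normal-approximation confidence interval are all assumptions made for the example, and the ratings are made up.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical labels for the five sub-scales rated per task
# (physical demand was not part of the survey).
SCALES = ("mental", "temporal", "performance", "effort", "frustration")


def task_scores(ratings, drop=()):
    """Average the retained sub-scale ratings of each task for one respondent.

    ratings: dict mapping task name -> dict mapping scale name -> rating (0-100).
    drop: scale names to leave out of the average.
    """
    scores = {}
    for task, by_scale in ratings.items():
        kept = [value for scale, value in by_scale.items() if scale not in drop]
        scores[task] = mean(kept)
    return scores


def survey_score(ratings, drop=()):
    """Average of all retained ratings across all tasks for one respondent."""
    kept = [
        value
        for by_scale in ratings.values()
        for scale, value in by_scale.items()
        if scale not in drop
    ]
    return mean(kept)


def mean_ci(values, level=0.95):
    """Mean with a normal-approximation confidence interval (an assumption)."""
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


# Made-up ratings for one respondent and two tasks, for illustration only.
example = {
    "overview records": dict(zip(SCALES, (40, 30, 20, 50, 10))),
    "view labs": dict(zip(SCALES, (10, 20, 10, 20, 0))),
}
per_task = task_scores(example, drop=("performance",))
overall = survey_score(example, drop=("performance",))
```

The `drop` argument is included so that a sub-scale can be excluded from both the task scores and the survey score without changing the rest of the calculation.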

Table 2 (excerpt)
Add the following: Amlodipine 5 mg tablets (supply 56 tablets), take one tablet once a day; Pravastatin 80 mg tablets (supply 56 tablets), take one tablet daily. Authorize three issues. Re-authorize the following (issue limit reached): Lisinopril 10 mg tablets, take one tablet once a day, supply 28 tablets. Authorize three issues. Medication review in 6 months.
9. A 45-year-old male patient comes today to find out the results of a recent blood test. Task: look up the results on the system. [View labs]
10. You are reviewing your system labs inbox. One report shows: normal FBC, glucose and U&E; cholesterol 5.9 mmol/L and ALT 54 U/L. Task: add a comment advising the patient to come in to discuss the results. Mark the report as seen and file/archive as necessary. [Action labs]

RESULTS
Sixty-seven respondents completed the online survey. A precise estimate of the response rate was not possible because we did not know the number of GPs who received the invitation. However, a gross estimate could be between 0.5 and 1%, based on a possible number of GPs receiving the survey of 5,000-10,000.
The distribution of systems used by participants was 55.2% for all combined EMIS systems (25.4% LV, 10.4% PCS and 19.4% Web), 29.9% for SystmOne, 9.0% for INPS Vision, 3.0% for iSoft Synergy, 1.5% for Microtest Evolution and 1.5% for other systems not in the GPSoC approved list. The mean time the system had been used was 6.7 years, the average time in general practice was 17.8 years and the average number of other systems used was 1.5 (Table 3).
According to a report from 2011,48 the market share of GP systems in England was 55% for EMIS, 19% for INPS Vision, 17% for TPP SystmOne, 7% for iSoft and 2% for Microtest. More recent data suggested an EMIS market share of 54.8% and an iSoft share of 5.6%;49 also, TPP SystmOne was set to become the second biggest supplier.50 Based on these details, we projected a possible current market distribution of the systems as follows: 54.8% for EMIS, 19.6% for TPP SystmOne, 18% for INPS Vision, 5.6% for iSoft and 2% for Microtest. According to these figures, the system distribution of our sample was not significantly different from the population distribution, χ2(4) = 7.64, p > 0.10.
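The goodness-of-fit comparison can be checked approximately with a short calculation. The sketch below is a worked check rather than the authors' SPSS output: the respondent counts are back-calculated from the reported sample percentages of 67 respondents, the one respondent using a system outside the GPSoC list is assumed to be left out (leaving 66), and the p-value uses the closed-form upper-tail probability that holds for a chi-square distribution with 4 degrees of freedom.

```python
import math

# Counts back-calculated from the reported sample percentages (assumption):
# 55.2%, 29.9%, 9.0%, 3.0% and 1.5% of 67 respondents.
observed = {"EMIS": 37, "SystmOne": 20, "Vision": 6, "iSoft": 2, "Microtest": 1}

# Projected market shares quoted in the text.
expected_share = {
    "EMIS": 0.548,
    "SystmOne": 0.196,
    "Vision": 0.18,
    "iSoft": 0.056,
    "Microtest": 0.02,
}

n = sum(observed.values())  # 66: the respondent on an unlisted system is left out

# Pearson chi-square statistic: sum over systems of (O - E)^2 / E.
chi2 = sum(
    (observed[name] - n * expected_share[name]) ** 2 / (n * expected_share[name])
    for name in observed
)
df = len(observed) - 1


def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


p_value = chi2_sf_df4(chi2)
print(f"chi2({df}) = {chi2:.2f}, p = {p_value:.3f}")
```

Under these assumptions the statistic comes out at about 7.64 with a p-value a little above 0.10, which is consistent with the reported result.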
Although we did not have population data on the average time for GPs in general practice, the average time the system had been used and the average number of other systems used, a qualitative study in Scotland investigating the views of GPs on their medical records (n = 25 GPs) reported an average time in general practice of 16.5 years, 24 which is similar to the number found in our study.
We identified a potential problem with the 'performance' scale, which appeared to have been marked in the wrong direction in a number of cases. Some participants also indicated this in the free-text section. The NASA-TLX user guide document highlights a possible confusion with this scale, and this has been reported in other studies.23 We carried out a correlation analysis that also showed a problem with the performance scale. Therefore, we dropped this scale from the analysis. Otherwise, the correlation analysis showed that all the scores were highly correlated.
A dimension reduction by factor analysis showed one factor accounting for around 67% of the variance, where all the scales were highly and equally correlated within the factor. Other studies have previously reported that scales are often significantly correlated with each other.38 Since all the scales and tasks correlated well, we created a single aggregated score with the average of all 40 remaining ratings (after dropping the performance scale). We also computed a total score for each task by averaging the scores of the four scales.
A repeated measures ANOVA of the total scores for each task revealed statistically significant differences between them (F(9, 58) = 6.1, p = 0.001). The data in Table 4 and the graph in Figure 4 show that the tasks 'overview records', 'find episode', 'repeat prescribing' and 'drugs management' scored significantly higher than the tasks 'record unstructured', 'view labs' and 'action results'. The tasks 'find episode' and 'repeat prescribing' also scored significantly higher than the task 'acute prescribing'. Finally, 'the values over time' and 'the record structured' did not score significantly differently from any of the other tasks. The tasks, however, were not significantly different among the different systems in a repeated measures ANOVA; test for interaction between system and task: F(36, 189.1) = 0.7, p = 0.9. The difficulty of the tasks was not related to the time the GPs had been in general practice when that was included as a covariate in a repeated measures ANOVA (F(1, 55) = 0.3, p = 0.58). The same was true of the number of other systems used (F(1, 55) = 0.03, p = 0.87), but difficulty was related to the time the system had been used for (F(1, 55) = 5.4, p = 0.024).
In summary, the overall aggregate workload score was not different among systems. There were significant differences among the task workload scores, but these differences were not seen among the different systems. The overall aggregate score and average task scores were not related to the time the GPs had been in general practice or the number of other systems used, but they were related to the time the system had been used.

Interpretation of results
There is no consensus on what should be considered a threshold for a high or excessive workload,38 and the scores appear to be most useful when used to make comparisons. We therefore looked at the scores in other studies;22,23,51 compared to these, the workload score found in this study (28.7) appears to sit on the low side, although lower scores have been noticed in some. This probably reflects the familiarity of GPs with their systems and could lead one to conclude that no major issues exist.
The tasks evaluated were chosen to represent common tasks routinely carried out by GPs. Tasks with a higher cognitive workload score belonged to the categories 'retrieving information' ('overview records' and 'find episode') and 'prescribing' ('repeat prescribing' and 'drugs management'). The task 'overview records' is made up of a number of subtasks, but its definition makes sense as a conceptually unique, identifiable task routinely carried out. However, it could arguably be expected to incur higher scores than single, smaller tasks. Likewise, the tasks 'repeat prescribing' and 'drugs management' are also composites of smaller subtasks and their higher scores may not come as a surprise either. Prescribing has been an area of high concern in medical errors, and mistakes have been related to the use of computer systems: selecting the wrong drug or dosage instructions from pick lists, overriding drug-drug interaction alerts, unnecessary/inappropriate alerts, the need to maintain an accurate electronic health record and excessive expectations from the computer system.52 Finally, 'find episode' is a task whose difficulty may indeed be influenced, as many respondents pointed out, by the quality and consistency of the recording of the previous episode.
The study did not find any differences in cognitive workload among the different systems. However, it is likely that the study was underpowered to detect these. This is discussed further in the following section.

Study limitations
In the study, we introduced a few modifications to NASA-TLX. First, we asked the GPs to recall rather than perform the tasks, instead of using alternative methods that directly assess performance. We considered this logistically necessary to prevent the study interfering with participants' work and negatively affecting response rates. A study in a nonclinical setting showed that little information is lost when the ratings are obtained retrospectively using a visual recreation of a task,53 and we found recall worked well in a separate clinical study.54 Our results should be interpreted in the context of other research that does directly assess performance.55,56
Second, we dropped the physical demand scale. Modifications involving adding or deleting sub-scales or redefining them have been used in several studies.38 The specific irrelevance of the physical demand scale when assessing electronic health records has been observed in other studies.23,54 We propose that this modification should not have a significant impact on the validity of this study, but undertaking a validity assessment was not within the scope of this study.
Finally, we also removed the weighting part. This has been the most common modification made when using NASA-TLX, followed by averaging or adding the scores to compute an estimate of the overall workload (which has been referred to as Raw TLX) or analyzing sub-scales separately. Studies comparing TLX with Raw TLX have found the latter to be less, more or equally sensitive; both approaches, however, have demonstrated the diagnostic value of the sub-scales.38,57
As in any survey study where participation is voluntary, this study was subject to self-selection bias.58 Also, response rates in survey studies with physicians are often very low,59 and electronic means are only of limited utility.60 The sample we obtained, however, did not appear to differ from the population on a number of variables we had available for comparison. Considering the means and standard deviations observed, sample sizes of around 100 responses per system could have been enough to detect potential existing differences among systems.

Conclusions and implications of the study
Since cognitive workload is only one among many measures of a system's usability, no conclusions on the usability of current GP systems can be drawn from this study. Also, because no acceptable cognitive workload threshold exists for NASA-TLX in electronic health record evaluations, a sound statement on whether the mental workload of current systems is acceptable is not possible. Nevertheless, two things can be inferred: first, on the whole, probably no major usability issues exist and second, for some tasks there is probably room for improvement.
The study has identified four GP system tasks attracting a higher workload score: 'repeat prescribing', 'drugs management', 'overview records' and 'find episode'. Further usability studies could focus on these tasks, using existing guidance on usability evaluations in primary care.3,36 NHS policymakers should remain aware of the importance of usability in system design and might consider following steps taken in other countries to develop policies around usability requirements.36

BMJ Health Care Inform: first published as 10.14236/jhi.v22i3.85 on 1 October 2015. Downloaded from http://informatics.bmj.com/ on January 19, 2021 by guest. Protected by copyright.

Figure 2. Web version of a scenario and NASA-TLX rating scales

Table 2. Clinical scenarios and tasks
22-year-old female patient tells you that she has had depression in the past and she thinks this is coming back. She was given an antidepressant before but she doesn't think it helped much.
1. Home visit request for an 82-year-old lady who is 'confused'. Task: in preparation for the visit, please check: current/active problems, past relevant/significant problems, active prescriptions (acute and repeat), recent consultation/s, recent lab results. [Overview records]
2. Hypertension follow-up consultation for a 63-year-old, irregular attender male patient. Additional significant problems include diabetes and coronary heart disease. Task: review over time values for blood pressure, LDL and HbA1C. [Values over time]
[Record unstructured]
42-year-old patient, smoker of 20 cig/day, with acute respiratory symptoms for 3 days. Temperature 37.4 °C, BP 142/91, pulse 92 bpm. Task: record the problem using a clinical code for 'lower resp tract infection'. Enter the history, physical examination and management details using free text. Record his temperature, blood pressure, pulse and smoking status using structured data entry.
Inhale one to two puffs up to four times a day, when required to relieve breathlessness.
Hypertension follow-up consultation with a 47-year-old male patient. Four weeks ago you added amlodipine and changed his simvastatin to pravastatin. Blood pressure today 141/89. LFTs and U&Es normal. He reports no side effects to the new medications.
Salbutamol 100 micrograms/actuation inhaler CFC free (supply 1 x 200 dose inhaler). Inhale one to two puffs up to four times a day. (Prescribed due to hay fever related wheezing; no longer needed.)