Development and Implementation of a Standard Format for Clinical Laboratory Test Results
Abstract
Surprisingly, laboratory results, the principal output of clinical laboratories, are not standardized. Thus, laboratories frequently report results with identical meaning in different formats. For example, laboratories report a positive pregnancy test as “+,” “P,” or “Positive.” To assess the feasibility of a widespread implementation of a result standard, we (1) developed a standard result format for common laboratory tests and (2) implemented a feedback system for clinical laboratories to view their unstandardized results.
In the largest integrated health care system in America, 130 facilities had the opportunity to collaboratively develop the standard. For 15 weeks, clinical laboratories received a weekly report of their unstandardized results. At the study’s conclusion, laboratories were compared with themselves and their peers by metrics that reflected their unstandardized results.
We reviewed 156 million test results and observed a 51% decline in the rate of unstandardized results. The number of facilities with fewer than 23 unstandardized results per 100,000 (Six Sigma σ > 5) increased by 58% (52 to 82 facilities; β = 1.79; P < .001).
This study demonstrated significant improvement in the standardization of clinical laboratory results in a relatively short time. The laboratory community should create and promulgate a standardized result format.
- The format of laboratory results is not standardized from one laboratory to another, despite these results representing the primary output of a clinical laboratory.
- In the largest integrated health care system in America, 130 facilities collaboratively developed a standard test result format, leading to a 50% reduction in unstandardized results in 15 weeks.
- The laboratory community should create and promulgate a standardized result format to report results unambiguously for patients, clinicians, and researchers.
Introduction
The format of laboratory results is not standardized from one laboratory to another, despite these results representing the primary output of a clinical laboratory.1 For example, a positive test for pregnancy may be variously reported as “+,” “Pos,” or “Positive.” For patients, who now access their medical record through online portals, variation in the format of a laboratory result can create confusion.2 It also negatively affects public health notifications of infectious diseases, the calculation of health care performance metrics, health information exchanges, and research databases.3-6
Existing standards do not specify a result format. For example, the most frequently used standard terminology for laboratory data, Logical Observation Identifiers Names and Codes (LOINC), specifies only a test’s “scale” (ie, quantitative or ordinal); laboratories are free to choose the format. Example formats for ordinal results include “+/−,” “P/N,” and “positive/negative/indeterminate.”7 Similarly, no laboratory standard advises on how a laboratory should report a cancelled test, and various formats exist: “cancelled,” “TNP” (test not performed), “DNR” (did not report), and “QNS” (quantity not sufficient).
The absence of a result standard has propagated various strategies to cope with unstandardized results. For example, a patient who encountered the result “Occ,” laboratory jargon for “occasional,” in their online health portal sought to decipher it online.8 As another example, researchers have developed algorithms to standardize laboratory results in clinical research databases.9 As new tests are developed and novel formats adopted for existing laboratory tests, these standardization algorithms require significant ongoing maintenance. They are also unable to interpret certain ambiguous results.4 Overall, these strategies do not engage with the root cause of the unstandardized laboratory results: the clinical laboratories that produce them.
To investigate the feasibility of a prospective collaboration with clinical laboratories and to approximate the effort required for a national implementation of a clinical laboratory standard, we (1) engaged in the collaborative development of a standard result format (“the Standard”) with more than 100 laboratories and (2) implemented a feedback system for clinical laboratories to view their unstandardized results to facilitate improvement.
Materials and Methods
Setting
The Veterans Health Administration (VHA), the largest integrated health care system in the United States, serves 8.8 million veterans across 130 facilities, each with 1 or more Clinical Laboratory Improvement Amendments–certified clinical laboratories. These laboratories are located throughout the United States, 1 US territory (Puerto Rico), and 1 foreign country (Philippines). All clinical laboratories in the VHA system participated in this quality improvement project. Facilities were made aware of the study through a national call for laboratory directors, which provided the opportunity for questions and answers.
Result Standard Development and Implementation
The development of the Standard took place in an iterative process made up of an initial proposal phase and a second modification phase. In the proposal phase, the most common tests from each facility were selected for inclusion in the study, cumulatively accounting for at least 95% of a facility’s total monthly test volume between the years 2000 and 2015. For each of these tests, we specified a standard result format. For example, the standard format for a urine pregnancy test allowed “Positive” or “Negative.” In addition to test-specific results, we created result formats that generalized to many tests, such as “Canceled,” “Specimen quantity not sufficient,” and “See comment below.” Table 1 and the Standard (see Supplement; all supplemental materials can be found at American Journal of Clinical Pathology online) contain additional examples. In the subsequent modification phase, laboratories provided feedback on the acceptable result formats, making our standard a compromise among stakeholders rather than a dictated policy.
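The test selection rule used in the proposal phase is straightforward to express in code. The sketch below is a minimal illustration of the cumulative 95% threshold, assuming a simple per-test volume tally; it is not the project's software, and the test names and counts are invented.

```python
# A minimal sketch (not the study's actual code) of the selection rule described
# above: choose the most common tests until they cumulatively account for at
# least 95% of a facility's monthly test volume.

def select_common_tests(monthly_volume: dict, coverage: float = 0.95) -> list:
    """Return the highest-volume tests whose cumulative share reaches `coverage`."""
    total = sum(monthly_volume.values())
    selected, running = [], 0
    for test, count in sorted(monthly_volume.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(test)
        running += count
        if running / total >= coverage:
            break
    return selected

# Toy example: three common tests already cover >=95% of this facility's volume.
volumes = {"GLUCOSE": 50_000, "CREATININE": 30_000, "URINE PREGNANCY": 15_000, "RARE ASSAY": 1_000}
print(select_common_tests(volumes))  # ['GLUCOSE', 'CREATININE', 'URINE PREGNANCY']
```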
Table 1. Categorization of Standard Test Result Types, With Examples^a

| Category | Standard Format | Example Results |
|---|---|---|
| Numeric | Single | >1.0, ≤0.1, −0.4, 2 |
| | Range | 0-2, 9,998-9,999 |
| | Titer | <1:40, 1:256 |
| Binary | Binary | Positive, Negative, Indeterminate, Yes, No |
| Ordinal | Quantity | None, 1+, 2+, 3+, 4+ |
| | Relative to baseline | Normal, Increased, Decreased |
| Nominal | ABO blood group | A, B, AB, O |
| | ABO/Rh blood group | A positive, AB negative |
| | ANA pattern | Centromeric, Homogeneous |
| | ANCA | C-ANCA, P-ANCA |
| | Color | Colorless, Yellow, Orange, Pink |
| | Crystals | Uric acid, CCPD, Cholesterol |
| | HCV genotype | 1a, type 1, 2a/2c |
| | Immunofixation | IgG-κ, IgM-λ |
| | Opacity | Clear, Hazy, Cloudy, Turbid |
| | Opiates confirmation | Hydrocodone, Codeine, Morphine |
| General | Test Not Performed | Not ordered, Error, Hemolyzed, Icteric |
| | Test Performed | See comment, See below, Performed |
ANA, antinuclear antibody; ANCA, antineutrophil cytoplasmic antibody; C-ANCA, antineutrophil cytoplasmic antibody, cytoplasmic; CCPD, calcium pyrophosphate deposition; HCV, hepatitis C virus; Ig, immunoglobulin; P-ANCA, antineutrophil cytoplasmic antibody, perinuclear.
^a Each of the 552 tests in the study had a result format specified in the Standard. For example, a pregnancy test had a binary result format. (The Standard is included in the Supplement; all supplemental materials can be found at American Journal of Clinical Pathology online.)
The project launched throughout the VHA on July 5, 2016, with a weekly personalized email sent for 15 consecutive weeks to the manager of the laboratory information system (LIS), the laboratory manager, and the laboratory director at each VHA facility. Each email included a copy of the Standard and a report highlighting unstandardized results from the facility. The report, delivered each Monday morning, summarized 1 week of results (Monday through Sunday) from 2 weeks before the send date. The report divided the facility’s results into 3 categories: standardized, unstandardized, and “not applicable” for uncommon tests that were not reviewed (Figure 1). Standardized results met the result format specified in the Standard; unstandardized results did not. To simplify investigation of the unstandardized results, the report contained the reported result, a count of its occurrences, the test name, and the associated specimen accession numbers. Uncommon tests that had no format specified in the Standard were not reviewed and thus were excluded from the study. Throughout the intervention, facilities with high numbers of unstandardized results received additional attention in the form of individualized emails, phone calls, or both. Software written to review the laboratory results and communicate by email was derived from the author’s previous open-source work.9 The data originated from the VHA’s corporate data warehouse, and all sites used the VistA LIS.
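As an illustration only (the study's review software, derived from the author's prior open-source work, is not reproduced here), the report's three-way classification can be sketched as follows; the test codes and the `standard` lookup table are hypothetical.

```python
# Hypothetical sketch of the report's three-way classification. `standard` maps
# a test code to the set of results its format allows; tests absent from the
# map have no specified format and are "not applicable" (not reviewed).

def classify(test_code: str, result: str, standard: dict) -> str:
    if test_code not in standard:
        return "not applicable"
    return "standardized" if result in standard[test_code] else "unstandardized"

standard = {"UPT": {"Positive", "Negative"}}
print(classify("UPT", "Positive", standard))      # standardized
print(classify("UPT", "+", standard))             # unstandardized -> appears in the weekly report
print(classify("HLA-B27", "Positive", standard))  # not applicable (uncommon test, not reviewed)
```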
Figure 1. Report to facilities, with feedback on unstandardized results. Each of the 130 facilities enrolled in the study received a weekly personalized email with a similar report attached. The facility could use the test code (second column from the left) to review the desired format in the Standard. (The Standard is included in the Supplement; all supplemental materials can be found at American Journal of Clinical Pathology online.)
Implementation Evaluation
Results
The Result Standard
The completed Standard (see Supplement) contains the preferred reporting format for each result category and subcategory (eg, binary) and lists the acceptable result category or categories for 552 of the most frequently performed laboratory tests (listed by both name and LOINC code). For example, the Standard result format for a urine pregnancy test (LOINC 2106-3) is binary, and section 2.4.1 (“Binary”) contains the Standard’s binary result formats. Together, the included tests accounted for 94.2% to 94.6% of total test results each week (average, 94.4%).
The result formats can be divided into 5 major categories: numeric, binary, ordinal, nominal, and general (Table 1). Most of the major categories contain subcategories. For example, the numeric result standard contains plain numbers along with formats to report a numeric range, titer, or viral load. For each subcategory, the Standard contains a description and examples of preferred, acceptable, and discouraged results. Preferred results for the plain number format include “1” and “>2.0,” with no space between the inequality and the number. Acceptable results, a less desirable format than preferred but equally unambiguous, allow a space between the inequality and the number, as in “≤ 0.1,” or thousands separators, as in “1,234,567.” Discouraged results do not meet the Standard. They include inequalities to the right side of the number (“1.0<”), numbers expressed as words (“1 million”), and numbers with more than 12 digits to the left of the decimal. Very large numbers often represent data entry errors and can cause a computer error known as an overflow exception. Many of these examples originated from specific cases and feedback provided by a facility enrolled in the study.
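A partial sketch of these numeric rules follows, covering only the examples quoted above; the regular expressions are illustrative approximations, and the Supplement remains the authoritative definition.

```python
import re

# Illustrative patterns only. Preferred: optional inequality with no space, then
# a plain number. Acceptable: a space after the inequality or thousands
# separators. Anything else (eg, "1.0<", "1 million") is treated as discouraged.
PREFERRED = re.compile(r"[<>≤≥]?-?\d+(\.\d+)?")
ACCEPTABLE = re.compile(r"[<>≤≥]? ?-?\d{1,3}(,\d{3})*(\.\d+)?")

def grade_numeric(result: str) -> str:
    if PREFERRED.fullmatch(result):
        return "preferred"
    if ACCEPTABLE.fullmatch(result):
        return "acceptable"
    return "discouraged"

for r in [">2.0", "1", "≤ 0.1", "1,234,567", "1.0<", "1 million"]:
    print(r, "->", grade_numeric(r))
```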
Binary results include synonyms for positive, negative, and indeterminate results (Table 1; Supplement section 2.4.1). “Positive” results allow terms such as present, abnormal, detected, confirmed, immune, and reactive. “Negative” synonyms include confirmed negative, nonreactive, no growth, absent, and no. “Indeterminate” results typically occur near the cutoff value between positive and negative and include terms such as inconclusive, equivocal, borderline, and weak. The variety of binary scales led to the inclusion of a variety of acceptable formats. For example, a test for immunity (eg, hepatitis B surface antibody titer) has one binary interpretation (immune or not immune), while a urine pregnancy test has another (positive or negative). Discouraged binary results include single letters (“N,” “P”) and ambiguous abbreviations such as “NR,” which different laboratories use to mean either “not reported” or “nonreactive.”
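A minimal sketch of mapping binary synonyms to the Standard's preferred terms, using only the example terms listed above (the complete lists are in Supplement section 2.4.1); ambiguous entries such as "NR" are deliberately left unmapped.

```python
# Synonym lists are illustrative and echo the examples in the text.
BINARY_SYNONYMS = {
    "Positive": {"positive", "present", "abnormal", "detected", "confirmed", "immune", "reactive"},
    "Negative": {"negative", "confirmed negative", "nonreactive", "no growth", "absent", "no"},
    "Indeterminate": {"indeterminate", "inconclusive", "equivocal", "borderline", "weak"},
}

def normalize_binary(result: str):
    """Return the preferred term, or None for ambiguous results (eg, "N", "P", "NR")
    that should be flagged for human review."""
    text = result.strip().lower()
    for preferred, synonyms in BINARY_SYNONYMS.items():
        if text in synonyms:
            return preferred
    return None

print(normalize_binary("Reactive"))  # Positive
print(normalize_binary("NR"))        # None -- 'not reported' or 'nonreactive'? ambiguous
```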
Ordinal results consist of an ordered series of categorical responses, such as the preferred format of 1+, 2+, 3+, and 4+ (Table 1; Supplement section 2.4.2). Laboratories also used descriptive rather than numeric ordinal scales, with words such as none, few, occasional, moderate, and many. Identical results on either the numeric or descriptive ordinal scales did not necessarily carry the same meaning among clinical laboratories. For example, some laboratories indicated the maximum ordinal category as 3+, while others used 4+ for the same test (eg, urinalysis). When mapped from a numbered ordinal scale to a descriptive ordinal scale, 3+ meant either “moderate” or “many,” depending on the facility. Similarly, the word slight appeared as either the first (slight, 1+, 2+, 3+, 4+) or the second ordinal category (few, slight, moderate, many). The Standard considered “many” acceptable because it had an unambiguous interpretation as the highest position in the ordinal scale. Other potentially ambiguous terms, such as “++” for 2+ and the result “+1,” were discouraged because many statistical programs (eg, Microsoft Excel) remove the plus sign and interpret the result as a number.
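The plus-sign hazard is easy to demonstrate: numeric coercion silently drops a leading plus, as the short sketch below shows. The normalization mapping that follows is illustrative only and assumes a numbered ordinal scale.

```python
import re

# Numeric coercion drops the ordinal marker, so "+3" silently becomes 3.0.
print(float("+3"))  # 3.0

def normalize_ordinal(result: str) -> str:
    """Map discouraged variants ("+++", "+3", "3") to the preferred "N+" form.
    Descriptive scales (few/moderate/many) would need facility-specific maps."""
    text = result.strip()
    if re.fullmatch(r"\++", text):           # "+++" -> "3+"
        return f"{len(text)}+"
    m = re.fullmatch(r"\+?(\d)\+?", text)    # "+3", "3", "3+" -> "3+"
    return f"{m.group(1)}+" if m else text

for r in ["+++", "+3", "3", "2+"]:
    print(r, "->", normalize_ordinal(r))
```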
In contrast to ordinal results, nominal results are unordered (Table 1). These results include lists of colors, ABO blood groups, and hepatitis C genotypes, among other test-specific lists. Without the guidance of a standard, laboratories developed difficult-to-compare, idiosyncratic terms. The problem was most obvious with the colors used to describe urine, which included colorless, light, pale straw, pale yellow, straw, and yellow. Based on feedback from laboratories, the Standard consolidated this example list from 6 items to 3: colorless, pale straw, and yellow.
“General” results describe the testing process rather than the test result and can apply to any test regardless of its numeric or categorical result form (Table 1). These results cluster around 2 primary subcategories: “Test Performed” and “Test Not Performed.” “Test Performed” represents a valid result that may require an image or more characters than the result field allows. Examples of the Test Performed subcategory include “see comment,” “see scanned report,” and “test performed.” In contrast, “Test Not Performed” means that a valid result does not exist for the requested test. Examples include “patient refused,” “specimen not received,” “quantity not sufficient,” and “specimen hemolyzed.”
The complete Standard contains additional result types, along with more detailed result type definitions and examples. It also links each specific test to its result types (see Supplement).
Adoption of the Result Standard
Facilities commonly asked why we labeled a result as unstandardized. For example, 1 facility was surprised to learn that the phrase “DNR,” which it used to mean “did not report,” had at least 2 other plausible medical interpretations: “did not react” and “do not resuscitate.” As an educational technique, we drew an analogy to error-prone abbreviations in medication prescriptions.10 A second clinical laboratory inquired about a urine test result for morphine by gas chromatography-mass spectrometry, reported as “250 POS.” LOINC, a standard for laboratory tests, specifies different codes for numeric and binary results (19593-3 vs 19322-7).11,12 When both numeric (“250”) and binary (“POS”) formats exist, the choice of LOINC code becomes ambiguous. A third clinical laboratory inquired about urine crystals listed as unstandardized, which led us to improve the Standard with additional crystal types.
Implementation Evaluation
A trend toward standardization was observed throughout the intervention. Overall, the study reviewed more than 156 million results over 15 weeks, with a weekly average (range) of 10.4 million (9.1 million-11.1 million) results reviewed. The percentage of unstandardized results across all facilities declined over the course of the study by 51%, from 0.144% to 0.070% (β = −4.9 × 10⁻³, t(13) = −10.32; P < .001) (Figure 2). A decline in the unstandardized rate by 0.074% of 10.4 million results represents an improvement of approximately 7,700 results per week. At the same time, the number of facilities with σ greater than 5 increased 58%, from 52 to 82 facilities (β = 1.79, t(13) = 9.32; P < .001) (Figure 3).
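For reference, the σ metric can be computed from a defect rate using the conventional 1.5 σ shift, under which 23 unstandardized results per 100,000 corresponds to roughly 5 σ. The sketch below uses that convention with invented weekly facility data; it is not the study's analysis code.

```python
from statistics import NormalDist

def sigma_level(unstandardized: int, total: int) -> float:
    """Six Sigma level of a defect rate, assuming the conventional 1.5-sigma shift."""
    rate = unstandardized / total
    return NormalDist().inv_cdf(1 - rate) + 1.5

print(round(sigma_level(23, 100_000), 2))  # ~5.0

# Counting facilities at or above 5 sigma for one (invented) week of data.
weekly = {"Facility A": (10, 90_000), "Facility B": (400, 110_000)}
print(sum(1 for bad, total in weekly.values() if sigma_level(bad, total) >= 5))  # 1
```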
Figure 2. Percentage of unstandardized test results by week of intervention. Each point represents the weekly percentage of unstandardized results across all facilities. The line represents a linear regression.
Figure 3. Facilities at 5 σ or above by week of intervention. Number of facilities out of 130 participating facilities at ≥5 σ (≤23 unstandardized results per 100,000 test results). The line represents a linear regression.
Most of the unstandardized results at the conclusion of the intervention originated from relatively few facilities (Figure 4). In fact, 2 facilities accounted for 27% of the unstandardized results for the entire VHA, despite representing just 1.54% (2/130) of total facilities and 1.79% (171 × 10³/9.56 × 10⁶) of total results reviewed. These 2 facilities showed no improvement over the course of the intervention, with a σ of 3 at the study start and end. In total, 43 facilities, accounting for 33% (43/130) of the total facilities and 37% (3.54 × 10⁶/9.56 × 10⁶) of total results, created a disproportionate 95% of the unstandardized results.
Figure 4. Few facilities contributed many unstandardized results (Pareto principle). Six of 130 facilities contributed >50% of unstandardized results. The plot represents the last week of the intervention, week 15.
The unstandardized results from the first and last weeks of the intervention were classified into 10 mutually exclusive groups to better understand trends in standardization (Table 2). Results in the Units in Result group had units in the result field instead of in the designated unit field. The Nonstandard Abbreviation group included abbreviations such as DNR for “did not report.” The Nonstandard Ordinal Variants group included terms that could be confused with numeric results, such as the result 3+ written as either “3” or “+3,” and other variants, such as “+++.” The Extraneous Spaces group contained results with extra whitespace (eg, “20.0” entered with a leading or trailing space), which may cause a statistical software program to interpret a numerical result as categorical. The Unintelligible Result group included seemingly arbitrary symbols, such as “—-.” The Wrong Result Type group often included results that were standard for another test but did not make sense for the test under which they were reported. The Multiple Result Types group combined different formats, most commonly numeric and binary. For example, a result of “250 POS” should be reported as either numeric (“250”) or binary (“POS”), not both. The Format of Inequality group featured various nonstandard placements of the inequality, such as “< OR = 0.90” instead of “≤0.90.” A separate group existed for null results, where no result was present. The final group was Leading Zero, such as “01.15.”
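A rule-based sketch of this classification for a few of the groups is shown below; the regular expressions are inferred from the examples in Table 2 and are not the study's actual rules.

```python
import re

# Each rule pairs a Table 2 group with an illustrative pattern. Rules are tried
# in order, and anything unmatched falls through to "Other".
RULES = [
    ("Units in Result", re.compile(r"\d\s*(mg/dL|/HPF|F)\b", re.IGNORECASE)),
    ("Nonstandard Ordinal Variants", re.compile(r"^\++\d*$|^\d\+{2,}$")),
    ("Format of Inequality", re.compile(r"^\d+(\.\d+)?\+$|^<\s*OR\s*=", re.IGNORECASE)),
    ("Leading Zero", re.compile(r"^0\d+\.\d+$")),
    ("Extraneous Spaces", re.compile(r"^\s+\S|\S\s+$")),
]

def classify_unstandardized(result: str) -> str:
    for label, pattern in RULES:
        if pattern.search(result):
            return label
    return "Other"

for r in ["<1 mg/dL", "+++", "100+", "< OR = 0.90", "01.15", " 20.0"]:
    print(repr(r), "->", classify_unstandardized(r))
```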
Table 2. Classification of Nonstandard Results Before and After the Intervention^a
Classification | Example | Week 1 | Week 15 | Change, % |
---|---|---|---|---|
Units in Result | <1 mg/dL, 98.6F, <1/HPF | 3,911 | 1,071 | −73 |
Nonstandard Abbreviation | DNR, TND, NOPER, HVY, pale | 2,886 | 2,022 | −30 |
Nonstandard Ordinal Variants | +++, +3, 3 | 2,339 | 1,369 | −41 |
Extraneous Spaces | “20.0” | 1,614 | 936 | −42 |
Unintelligible Result | ., **,///////, <>, LOADED, sent | 989 | 717 | −28 |
Wrong Result Type | “890” for a binary result | 533 | 82 | −85 |
Multiple Result Types (ie, ordinal and numeric) | >100POS, *POS1151.3 | 468 | 393 | −16 |
Format of Inequality | 100+, <OR = 0.90 | 189 | 14 | −93 |
NULL Result | (null) | 48 | 29 | −40 |
Leading Zero | 01.15, 07.80 | 41 | 26 | −37 |
^a Unstandardized results from weeks 1 and 15 were classified into the mutually exclusive groups listed in the first column. The second column shows examples of each type. The third and fourth columns count the unstandardized results by week. The last column shows the percentage change before and after the intervention.
Discussion
Summary of Findings
We partnered with the clinical laboratories of a large health care system to iteratively develop a standard result format. While providing them with feedback, we observed an improvement in the standardization of their laboratory test results. To provide the feedback, we reviewed 156 million laboratory results from 130 facilities over 15 weeks. We observed a statistically significant decline in the percentage of unstandardized results and a statistically significant increase in the number of facilities at or above 5 σ (≤23 unstandardized results per 100,000 test results). These findings demonstrate the feasibility of voluntary laboratory engagement to improve the quality of laboratory results. Other clinical laboratories in pursuit of result standardization may consider the result standard we developed.
Implementation Challenges
The unstandardized test results observed in this study originated from an unstandardized process. Before any test results enter the system, the test setup must be configured in the LIS. Test configuration primarily determines the allowed results for a given test; configuration options include unrestricted text or selection from a constrained list. The VHA does not have test-specific standard operating procedures for test configuration. Consequently, clinical laboratories routinely differ in the permitted results for the same test. Technical expertise to change test configurations also varied markedly across facilities. Facilities with an LIS expert often made same-day fixes to unstandardized results; a facility without that expertise might be unable to correct a test configuration issue within the 15 weeks of the study.
Manual data entry and interfaces with external reference laboratories represent additional sources of unstandardized results. For tests configured to allow free text, manual data entry inevitably introduces typographical errors. Without changes to the process of data generation (eg, removal of manual data entry, choices constrained to a list, real-time validation), laboratories likely cannot achieve the highest level of quality.13 Large external reference laboratories also provided unstandardized data to the VHA for “send-out” tests, and the VHA cannot modify those result formats. This finding argues in favor of a national laboratory result format.
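As a simple illustration of constraining choices and validating at the point of data generation (a hypothetical sketch; the test code and allowed results are invented):

```python
# Reject disallowed results before they reach the LIS. In practice the allowed
# sets would come from the Standard's per-test result formats.
ALLOWED = {"URINE PREGNANCY": {"Positive", "Negative"}}

def validate_at_entry(test_code: str, result: str) -> None:
    allowed = ALLOWED.get(test_code)
    if allowed is not None and result not in allowed:
        raise ValueError(
            f"{result!r} is not an allowed result for {test_code}; "
            f"choose one of {sorted(allowed)}"
        )

validate_at_entry("URINE PREGNANCY", "Negative")  # accepted
try:
    validate_at_entry("URINE PREGNANCY", "+")     # rejected at entry time
except ValueError as err:
    print(err)
```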
Deployment of a laboratory result standard in a large health care system faced many challenges. Laboratories were reluctant to change their existing test results because of the additional work, the perceived familiarity of current results among users, and the dependence of reflex orders on an existing test result format. Changes were intrinsically motivated by laboratories’ willingness to harmonize their results rather than driven by a central mandate. Our process was therefore collaborative, relying on bidirectional feedback, shared development of standard operating procedures to fix errors, and new relationships with reference laboratories to raise awareness of unstandardized result formats. We also accepted requests from laboratories, such as accommodating the unambiguous result “Pos” in place of the proposed standard term “Positive.” The Standard therefore designates preferred and acceptable results, such as “Positive” and “Pos,” respectively. A small number of laboratories showed no motivation to participate because of staffing vacancies or because the project lacked an employer mandate.
Future Work and Limitations
The adoption of laboratory result formats on a national or international scale will require the support of many stakeholders, such as professional societies, developers of standards, equipment manufacturers, and LIS vendors. Laboratory professional societies with national or international scope, such as the College of American Pathologists (CAP), could produce a standardized laboratory result format. The CAP could enforce a standardized result format either on submission of its proficiency tests or on laboratory accreditation. Professional organizations would need expert opinion in the many disciplines of laboratory medicine (eg, chemistry, hematology, microbiology). The continued development of LOINC to include more guidance on the formatting of test results represents another potential solution. Manufacturers of laboratory tests and equipment also have an important role in determining result formats. One popular manufacturer of a urinalysis analyzer denotes “greater than 100” with the result “100+,” while nearly all other tests and laboratory equipment manufacturers use the format “>100.” Another manufacturer of a point-of-care blood gas instrument includes the units in the result by default (“98.6 F”) instead of placing them in the separate unit field. Standardized genetic reports, which were not included because of their low volume, may be included in a future project. The FDA could adopt a standard for laboratory test results for product reviews and incorporate it into test manufacturer recommendations. The absence of a suitable standardized result format, coupled with the need for unambiguously interpretable results for effective patient care and research, should lend urgency to further development of laboratory result standards.
A limitation of the current study is that the authors were not formally trained in interventional theories, models, or frameworks.14 Improved outcomes may be possible with these skills.
Conclusions
A standard result format was developed and implemented over 15 weeks in a large health care system. This study demonstrates that the standardization of laboratory results can occur in a relatively short time, given agreement in the laboratory community on an appropriate standard result format.
Funding: This material is the result of work supported with resources and the use of facilities at the Veterans Affairs Connecticut Healthcare System. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Department of Veterans Affairs.
Acknowledgments
We acknowledge the assistance of Gary Stack, Joseph Erdos, Michael Icardi, and Jack Bates as well as the active participation and constructive feedback of many VA laboratory stations in this quality improvement initiative.