When establishing the precision of a method, we are interested in the method's ability to produce consistent results. Precision measures the closeness of agreement between measured values obtained by replicate measurements on the same or similar objects under specified conditions.
What does it mean?
There is practically always some variation in measured results compared to the true values. This variation consists of systematic error (bias) and random error. Precision measures random error. In the ANOVA protocol, it is divided into three components:
- Within-run precision (or repeatability) measures result variation when replicated samples are measured under identical conditions. This is the lowest imprecision the assay can achieve in routine practice for a given concentration. Variation within a single run is caused mainly by random effects inside the instrument, such as variation in the pipetted volumes of sample and reagent.
- Between-run precision measures result variation between different runs (e.g. between morning and afternoon) of the same measurement setup. Variation between runs may be caused by changes in operating conditions. For example, the instrument may be warmer in the afternoon than in the morning, or the measurement may be done by a different person.
- Between-day precision measures result variation between days. Results may vary between days as a result of calibrations, changes in humidity, etc.
From these three components, you can obtain an estimate for within-laboratory precision that describes the measurement precision under usual operating conditions. The within-laboratory precision corresponds roughly to what can be estimated from a series of internal QC measurements.
If one of the above precision components dominates the results, you may be able to trace the cause, and sometimes even reduce it. For this, the Levey-Jennings chart can be a valuable tool, as it visualizes trends in the data set.
On the right you can see example data sets from three fictional ANOVA protocols, each lasting four weeks. The first shows poor within-run precision, the second has differences between the morning and afternoon series (poor between-run precision), and the third has results falling from day to day during the work week and rising again after the weekend (poor between-day precision).
These are rather extreme cases, chosen to show what the graph can tell you. In real life, the effects are less evident, but if you know the instrument's working cycles, what kinds of samples are run on which days, who performs which runs, and so on, you can review the graphs to find out whether these things have a visible effect.
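As a rough illustration of what a Levey-Jennings chart draws, the sketch below computes the center line and the ±1, ±2 and ±3 SD lines for a series of results and lists the points that fall outside a chosen limit. The function names and the example numbers are made up for illustration. Here the mean and SD are calculated from the plotted data itself, which is a simplifying assumption; in practice they often come from values already established for the control material.

```python
import statistics

def control_lines(results):
    """Return the mean, the SD, and the +/-1, 2, 3 SD lines of a result series."""
    mean = statistics.mean(results)
    sd = statistics.stdev(results)  # sample SD (n - 1 in the denominator)
    lines = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}
    return mean, sd, lines

def outside_limit(results, mean, sd, k=2):
    """Indices of results that lie further than k SDs from the mean."""
    return [i for i, x in enumerate(results) if abs(x - mean) > k * sd]

# Hypothetical control results, listed in measurement order.
results = [10.1, 9.8, 10.3, 9.9, 10.0, 11.9, 10.2, 9.7]
mean, sd, lines = control_lines(results)
print(f"mean={mean:.2f} sd={sd:.2f}")
for k, (low, high) in lines.items():
    print(f"+/-{k} SD: {low:.2f} .. {high:.2f}")
print("outside 2 SD:", outside_limit(results, mean, sd, k=2))
```

Plotting the results in measurement order against these lines is what makes patterns such as morning/afternoon shifts or weekly drifts visible.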
How do I measure precision?
For validation purposes, CLSI recommends doing measurements on 20 days, with two series (runs) each day. You don’t need many samples, but you need two replicates for each series (EP05-A3). For verification, a lighter procedure is enough: five replicates per day on five days, with one daily series, would do (EP15-A3).
It may feel surprising that you can get good estimates of within-run precision with only a couple of samples and replicates. The key is that you should use samples with relevant analyte concentrations. When establishing or verifying the precision of a qualitative method, the sample should be near the limit of detection (LoD). For quantitative methods, you should design the levels so that you can evaluate precision at high and low concentrations and near the medical decision point, which means that three samples are often enough.
When conducting a 20-day ANOVA validation study with two daily series and two replicates, you get 40 replicate pairs of each sample that can be used for calculating within-run precision. Splitting the measurements across many days and runs reflects usual operating conditions better, which gives more credibility to the value obtained as within-run precision. In addition, you have 40 runs to be used in evaluating between-run precision, and results from 20 days to be used in evaluating between-day precision.
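To make the calculation more concrete, here is a minimal sketch of how the three components could be estimated from such a study. It assumes a balanced design (the same number of runs on every day and the same number of replicates in every run, at least two of each) and uses the textbook nested ANOVA with days, runs within days, and replicates within runs. The function name, the data layout and the example values are hypothetical, and setting negative variance estimates to zero is one common convention rather than something taken from the text above.

```python
import math

def precision_components(data):
    """Estimate SDs for within-run, between-run and between-day variation.

    data[d][r][k] is the result for day d, run r, replicate k.
    The design must be balanced, with >= 2 days, runs and replicates.
    """
    n_days, n_runs, n_reps = len(data), len(data[0]), len(data[0][0])

    run_means = [[sum(run) / n_reps for run in day] for day in data]
    day_means = [sum(means) / n_runs for means in run_means]
    grand_mean = sum(day_means) / n_days

    # Sums of squares for each level of the nested design.
    ss_within = sum((x - run_means[d][r]) ** 2
                    for d, day in enumerate(data)
                    for r, run in enumerate(day)
                    for x in run)
    ss_run = n_reps * sum((run_means[d][r] - day_means[d]) ** 2
                          for d in range(n_days) for r in range(n_runs))
    ss_day = n_runs * n_reps * sum((m - grand_mean) ** 2 for m in day_means)

    # Mean squares: sums of squares divided by their degrees of freedom.
    ms_within = ss_within / (n_days * n_runs * (n_reps - 1))
    ms_run = ss_run / (n_days * (n_runs - 1))
    ms_day = ss_day / (n_days - 1)

    # Variance components; negative estimates are set to zero (assumption).
    var_within = ms_within
    var_run = max(0.0, (ms_run - ms_within) / n_reps)
    var_day = max(0.0, (ms_day - ms_run) / (n_runs * n_reps))

    return {
        "sd_within_run": math.sqrt(var_within),
        "sd_between_run": math.sqrt(var_run),
        "sd_between_day": math.sqrt(var_day),
    }

# Hypothetical results: 3 days x 2 runs x 2 replicates (a real study has 20 days).
example = [
    [[5.1, 5.3], [5.0, 5.2]],
    [[5.4, 5.5], [5.6, 5.4]],
    [[4.9, 5.0], [5.1, 5.2]],
]
for name, sd in precision_components(example).items():
    print(f"{name}: {sd:.3f}")
```

With the full 20 × 2 × 2 design, the within-run estimate rests on 40 replicate pairs, the between-run estimate on 40 run means, and the between-day estimate on 20 day means.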
Why do everything in one study?
When you use the same measurement results for obtaining values for each component of precision, all these results are affected by the same randomness. The ANOVA then separates the variation into components that are statistically independent of each other. This means that you can compare them, and it is also possible to calculate the within-laboratory precision from them as the square root of the sum of their squares.
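The combination step can be sketched in a few lines. The example assumes that the three components are expressed as standard deviations in the same unit; the function names and the numbers are invented for illustration.

```python
import math

def within_lab_sd(sd_within_run, sd_between_run, sd_between_day):
    """Combine independent components: square root of the sum of squares."""
    return math.sqrt(sd_within_run ** 2 + sd_between_run ** 2 + sd_between_day ** 2)

def variance_shares(sd_within_run, sd_between_run, sd_between_day):
    """Fraction of the total variance contributed by each component."""
    variances = [sd_within_run ** 2, sd_between_run ** 2, sd_between_day ** 2]
    total = sum(variances)
    if total == 0:
        raise ValueError("all components are zero")
    return [v / total for v in variances]

# Hypothetical component SDs in the same unit.
print(within_lab_sd(3.0, 4.0, 12.0))
print(variance_shares(3.0, 4.0, 12.0))
```

Because the components add up as variances, the largest one dominates the total, which is why comparing the shares helps to decide where improvement efforts would pay off.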
It’s kind of like playing darts and trying to find out how well you are doing. Determining the different precision components in different measurement setups would be a bit like having a dartboard with straight stripes, so that you could determine the position of a dart in only one dimension. If you turned the dartboard between game rounds, you would get information about both the x and y coordinates, but there would be no meaningful way to combine this information.
So, just as you would rather play darts on a proper dartboard, please establish your precision with a proper ANOVA protocol.