Correcting for bias in the selection and validation of informative diagnostic tests.

03 Jul 2018

When developing a new diagnostic test for a disease, there are often multiple candidate classifiers to choose from, and it is unclear whether any will improve on the performance of current technology. A two-stage design can be used to select a promising classifier (if one exists) in stage one for definitive validation in stage two. However, estimating the true properties of the chosen classifier is complicated by the first-stage selection rules. In particular, the usual maximum likelihood estimator (MLE) that combines data from both stages will be biased upward. Consequently, confidence intervals and p-values based on the MLE will also be incorrect. Building on the results of Pepe et al. (SIM 28:762-779), we derive the most efficient conditionally unbiased estimator and exact confidence intervals for a classifier's sensitivity in a two-stage design with arbitrary selection rules, where the conditioning is on the trial proceeding to the validation stage. We apply our estimation strategy to data from a recent family history screening tool validation study by Walter et al. (BJGP 63:393-400) and are able to identify and successfully adjust for bias in the tool's estimated sensitivity to detect those at higher risk of breast cancer.
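The selection bias described above can be illustrated with a small simulation. The following is a minimal sketch under assumed settings, not the estimator derived in the paper: the function name `simulate`, the number of candidates, the stage sizes, the common true sensitivity, and the "pick the best stage-one performer" rule are all hypothetical choices made for illustration. It compares the naive MLE that pools both stages with an estimator that uses stage-two data only. The latter is unbiased but discards stage-one information, so it is not the most efficient conditionally unbiased estimator.

```python
import random


def simulate(true_sens=0.7, n_candidates=5, n1=30, n2=60,
             n_trials=5000, seed=1):
    """Return (mean naive MLE, mean stage-two-only estimate).

    Assumptions (illustrative only): every candidate classifier has the
    same true sensitivity, and stage one selects the candidate with the
    most true positives among n1 diseased cases.
    """
    rng = random.Random(seed)
    naive_total = 0.0
    stage2_total = 0.0
    for _ in range(n_trials):
        # Stage one: true-positive counts for each candidate classifier.
        stage1_counts = [
            sum(rng.random() < true_sens for _ in range(n1))
            for _ in range(n_candidates)
        ]
        # Selection rule (assumed): carry forward the best performer.
        x1 = max(stage1_counts)
        # Stage two: fresh diseased cases for the selected classifier.
        x2 = sum(rng.random() < true_sens for _ in range(n2))
        # Naive MLE pools both stages and ignores the selection step.
        naive_total += (x1 + x2) / (n1 + n2)
        # Stage-two-only estimate is unaffected by the selection.
        stage2_total += x2 / n2
    return naive_total / n_trials, stage2_total / n_trials


if __name__ == "__main__":
    naive, stage2 = simulate()
    print(f"true sensitivity:     0.700")
    print(f"mean naive MLE:       {naive:.3f}")
    print(f"mean stage-two only:  {stage2:.3f}")
```

Under these assumed settings the averaged naive MLE should sit above the true sensitivity of 0.7, while the stage-two-only average should stay close to it. The gap is the bias that a conditionally unbiased estimator is designed to remove.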