by David Lee, chief data officer at Medidata Solutions
We’ve all heard the saying, “Garbage in, garbage out.” Data quality – accuracy and completeness – is vital to the scientific return on the investment in clinical trials. Inaccurate data arise frequently from a variety of sources, including entry errors, insufficient training, subjective differences and even site misconduct. Predictive models can be used to identify anomalies in clinical data in near real time, so that they can be systematically corrected well before database lock. Further, automated error detection can alert sponsors to data problems, whether or not these are expected. Sponsors, CROs, and regulators alike will benefit from this new paradigm in data quality, which is improving both the clinical data themselves and the scientific findings that rest on them.
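As a rough illustration of automated error detection, the sketch below flags values that sit far from the rest of a measurement series using a modified z-score based on the median and the median absolute deviation. The function name `flag_anomalies`, the threshold of 3.5, and the blood pressure readings are assumptions made for this example, not a description of any particular vendor's method.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme entry does not distort the baseline.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (or most of them) are identical; nothing can be scored.
        return []
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical systolic blood pressure entries; 1200 is a likely keying error.
systolic_bp = [118, 122, 125, 119, 121, 1200, 117, 123]
flagged = flag_anomalies(systolic_bp)
print(flagged)
```

A check of this kind can run each time new data are entered, so that a suspect value is queried at the site while the source record is still easy to consult.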
Predictive modeling can also be applied to operational decision making in clinical trials. For example, a sponsor can improve a trial’s performance by using predictive models for site performance, constructed from the results obtained by sites in recent trials and other predictive factors. By analyzing such historical data, a sponsor can select the sites that are best suited to the current trial’s protocol and therapeutic area. These sites can be identified by their predicted values on key attributes: high enrollment, short time to achieve it, high patient retention, low procedure costs, and so on. This site selection process enables the sponsor to build a better-optimized, more cost-efficient trial.
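The site selection idea can be sketched with a simple weighted score over historical site metrics. This is a deliberately simplified stand-in for a fitted predictive model: the `SiteHistory` fields, the weights, and the site figures are all hypothetical and chosen only to show the ranking step.

```python
from dataclasses import dataclass


@dataclass
class SiteHistory:
    name: str
    enrollment_rate: float   # patients enrolled per month (higher is better)
    retention: float         # fraction of patients retained (higher is better)
    cost_per_patient: float  # procedure cost per patient (lower is better)


def _scale(values, higher_is_better=True):
    """Min-max scale values to [0, 1], flipping the scale when lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


def rank_sites(sites, weights=(0.4, 0.4, 0.2)):
    """Return (site name, score) pairs sorted from best to worst score."""
    enroll = _scale([s.enrollment_rate for s in sites])
    retain = _scale([s.retention for s in sites])
    cost = _scale([s.cost_per_patient for s in sites], higher_is_better=False)
    scores = [
        weights[0] * e + weights[1] * r + weights[2] * c
        for e, r, c in zip(enroll, retain, cost)
    ]
    names = [s.name for s in sites]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical historical results for three candidate sites.
sites = [
    SiteHistory("Site A", 4.0, 0.90, 12000),
    SiteHistory("Site B", 2.0, 0.95, 9000),
    SiteHistory("Site C", 1.0, 0.70, 15000),
]
ranking = rank_sites(sites)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the inputs would be predicted values for the current protocol and therapeutic area rather than raw historical averages, and the weights would reflect the sponsor's priorities.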
New data sources
Trial sponsors are constantly looking for ways to incorporate into their trials new data sources that improve prognostic power. GSK recently became the first pharma company to run a trial using ResearchKit, Apple’s iPhone-based research platform.
Mobile health data from wearable devices and mobile apps can give a far more detailed, reliable picture of how patients are feeling and responding to treatment than patient-reported outcomes. We’re also gaining an increasingly sophisticated understanding of genomic data and their application directly to clinical development.
And let’s not forget the data that are already familiar and all around us. Electronic health records from health care providers can be included in clinical data analyses to improve basic understanding of patient populations. The vast amount of patient data already generated and available can inform and advance clinical trials.
These new data sources – mobile health data, genomics data, and EHR data – are only useful, and predictive modeling is only possible, if the data are managed properly. High-frequency observations, generated dozens or even hundreds of times per second by mobile devices, can be overwhelming. Similarly, the challenge of learning anything meaningful from genomic data is to sift through a massive amount of noise to find the diagnostically and clinically valuable signals, the proverbial needles in the haystack. Our challenge here is not to overwhelm researchers with more data, but rather to locate and highlight for them the precise data they need to make discoveries. This is a daunting task, but well worth the effort.
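One common way to make high-frequency device data manageable is to reduce the raw stream to per-window summaries before anyone reviews it. The sketch below assumes samples arrive as (timestamp in seconds, value) pairs; the function name `summarize_windows`, the one-minute window, and the simulated 50-samples-per-second signal are illustrative assumptions.

```python
from statistics import mean


def summarize_windows(samples, window_seconds=60.0):
    """Collapse (timestamp, value) samples into per-window summaries.

    Returns a list of (window_start, mean, minimum, maximum, count) tuples,
    ordered by window start time.
    """
    buckets = {}
    for timestamp, value in samples:
        start = (timestamp // window_seconds) * window_seconds
        buckets.setdefault(start, []).append(value)
    return [
        (start, mean(vals), min(vals), max(vals), len(vals))
        for start, vals in sorted(buckets.items())
    ]


# Simulated stream: 50 samples per second for two minutes (6,000 readings).
samples = [(i / 50.0, 70 + (i % 5)) for i in range(6000)]
summary = summarize_windows(samples)
for row in summary:
    print(row)
```

Here 6,000 raw readings become two summary rows, which is the scale at which a researcher can reasonably look for a signal.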