There has been a lot of buzz about the role real-world evidence (RWE) can play in regulatory decision-making for health care. Recent legislation and growing experience using real-world data sets have led to a desire to better understand appropriate uses of observational data to generate evidence, particularly in the area of drug development and clinical trials.
OptumLabs Chief Scientific Officer Bill Crown recently connected with OptumLabs partner David Kent, director of the Predictive Analytics and Comparative Effectiveness (PACE) Center at Tufts Medical Center and the Clinical and Translational Science Graduate Program at Tufts University, to share perspectives on use of RWE based on their extensive experience working with data from clinical trials and real-world data from the OptumLabs Data Warehouse.
The RWE landscape
Bill Crown (BC): As you know, the 21st Century Cures Act and Prescription Drug User Fee Act (PDUFA VI) both included mandates for the FDA to develop a framework to include RWE in new indications for previously approved drugs and safety surveillance.
The FDA has been in the safety surveillance space with RWE for a long time with Sentinel, but looking at new drug indications is something new. What are your thoughts on the importance of using RWE in this area?
I think that real-world evidence, broadly defined, will undoubtedly become increasingly important for all sorts of things, not just for new indications. The limitations of real-world data now lie mainly in the quality of the data and the absence of randomization. – David Kent, MD, Tufts Medical Center
David Kent (DK): I think more and more what you’ll see are study designs that try to address both of these limitations through better causal inference methods (and in particular understanding the contexts in which trustworthy causal inferences can be made from observational data) and by incorporating randomization in routinely collected clinically integrated research designs.
These approaches will also address some of the limitations of traditional randomized clinical trials (RCTs), which take a lot of time and are expensive, logistically complex, and non-generalizable.
BC: Traditionally the Center for Drug Evaluation and Research (CDER) has been definitively in the clinical trials camp, usually requiring evidence from two well-controlled trials for drug approvals. But on the device side there has been more use of real-world evidence, and for rare conditions there has been use of real-world evidence for drug approvals.
DK: I agree. What we’ve seen in the past is observational studies often tend to be used where randomization is difficult. So you see it in rare diseases, where it is very difficult to enroll and randomize a sufficient number of patients to get an estimated causal effect.
You also see it in device studies ― a recent example occurred with trans-catheter aortic valve replacement, where it was initially approved for high-risk patients who couldn’t get valve replacement, and then subsequently for intermediate risk patients.
Both these indications were supported by randomized trials. But then the FDA required a registry to be developed alongside clinical use and that registry was subsequently used to extend the indication to valve-in-valve procedures for failed bio-prosthetic valves, without a randomized trial. I think that is a model that will be continued in the future.
BC: The question is not just whether they will extend those models to other patient populations within the same indication. It’s also whether there will be a new model where observational studies of the kind we are now doing in OptumLabs will be used for the approval of new indications.
DK: I’m optimistic that is going to happen. As we continue to replicate clinical trials in an observational database to see if we can come up with concordant results, we’re learning about where these designs tend to work. They seem to work best where there are two active and similar comparators and where you know the time zero for both the control and the treatment arm.
It is more difficult to emulate placebo-controlled trials, and it is more difficult to emulate any type of trial that tests a treatment strategy, such as a treat-to-target strategy that is commonly used in hypertension or diabetes, or weight loss strategies, because it is really difficult with observational data to get at the intended target rather than just the achieved target, and those are very different things.
Benefits of RWE for sub-populations and person level effects
BC: One area I know you are studying is heterogeneity of treatment effects (HTE). In some ways, this issue of new indications is almost a heterogeneity issue. It involves heterogeneity of population in terms of a new treatment group and it may include different kinds of patients than the patients who were studied in the trials for the original indication.
Randomized trials measure an average treatment effect over a whole group and the ability to do sub-analyses on sub-populations is clunky. You have to redesign the trial and do it on another group, because the trial was not originally randomized to those sub-groups.
One of the benefits of observational analyses could be that if you felt like you could trust the results, you’d have the ability to redefine the treatment groups in the study populations and get an answer very quickly, as opposed to waiting for the data to roll in over the prospective trial data collection windows ― effectively allowing you to look at heterogeneity of treatment effects. What do you think about that?
DK: That’s a good question. I come at heterogeneity of treatment effects a little differently than many people. I’m really interested in trying to predict the treatment effect in a given individual who comes into the office, which is very different from trying to estimate the average treatment effect in RCTs.
Many people are interested in how the average treatment effect differs in very large groups of patients — say males versus females, old versus young.
I’m interested in conditional average treatment effects using all the relevant patient characteristics simultaneously that will optimize treatment decisions in individuals. How does it work in you with all your characteristics that might be relevant for the risks and benefits of a particular treatment of interest? Those have been called predictive analyses of heterogeneity of treatment effect.
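A note on notation: the distinction drawn here can be sketched in standard potential-outcomes terms. The symbols are assumptions introduced for illustration and were not used in the conversation: Y(1) and Y(0) denote a patient's outcomes with and without treatment, and X is the vector of that patient's characteristics.

```latex
\begin{align*}
\text{Average treatment effect:} \quad
  \mathrm{ATE} &= \mathbb{E}\left[\, Y(1) - Y(0) \,\right] \\
\text{Conditional average treatment effect:} \quad
  \tau(x) &= \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]
\end{align*}
```

A conventional subgroup analysis (males versus females, say) conditions on one characteristic at a time, whereas the predictive approach described above conditions on all relevant characteristics jointly.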
BC: This is the kind of work you have been doing with your current Patient-Centered Outcomes Research Institute (PCORI) grant?
DK: Exactly. The starting point with the work we’re doing is predictive analyses within randomized clinical trials because we think that is where we are more likely to get actionable results at this time — not only with individual trials, but with pooled randomized clinical trials so you get more power than a single trial.
You can combine four or five trials ― still have randomization ― and by combining trials you also get more statistical power and a better case mix. So I think this is probably the best data substrate for the types of analyses we do.
If you’re going to port these approaches into observational data then I think we really will need very robust analytic techniques to achieve balance across every stratum being looked at; it’s not just a matter of having balance overall. You can’t do propensity score matching and then divide your population up into different groups and expect to get balance.
Observational data clearly have some advantages for HTE analysis compared to conventional efficacy trials, namely statistical power, heterogeneity and generalizability, but there is still more methods work to be done before we can understand what contribution non-experimental real-world data will make in this space. – David Kent, MD, Tufts Medical Center
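The point about balance within subgroups can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the twelve-patient cohort, the variable names, and the choice of the standardized mean difference (SMD) of age as the balance diagnostic are assumptions made for illustration, not the methods used by either speaker. The toy cohort is constructed so that treated and control patients have identical age distributions overall but very different ones within each sex.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical, already-"matched" toy cohort: (sex, treated flag, age).
ROWS = [
    ("M", 1, 70), ("M", 1, 72), ("M", 1, 74),
    ("M", 0, 60), ("M", 0, 62), ("M", 0, 64),
    ("F", 1, 60), ("F", 1, 62), ("F", 1, 64),
    ("F", 0, 70), ("F", 0, 72), ("F", 0, 74),
]


def smd(treated, control):
    """Standardized mean difference, using the pooled sample SD."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def age_smd(rows):
    """SMD of age between treated and control patients in `rows`."""
    treated = [age for _, flag, age in rows if flag == 1]
    control = [age for _, flag, age in rows if flag == 0]
    return smd(treated, control)


def age_smd_by_sex(rows):
    """Repeat the balance check separately within each sex."""
    strata = sorted({sex for sex, _, _ in rows})
    return {s: age_smd([r for r in rows if r[0] == s]) for s in strata}


overall = age_smd(ROWS)
by_sex = age_smd_by_sex(ROWS)
print(f"overall SMD: {overall:.2f}")
for sex, value in by_sex.items():
    print(f"SMD within {sex}: {value:.2f}")
```

In this constructed example the overall SMD is 0, while the SMD is 5 among men and -5 among women, far beyond the 0.1 that is often used as a rule-of-thumb threshold for acceptable balance. A subgroup analysis run on this cohort would therefore be comparing groups that differ sharply in age despite the reassuring overall diagnostic.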
BC: I’ve been in a lot of discussions in the last year or so about demonstrating that the result of a clinical trial for a particular agent in a particular area, like a cardiovascular drug for example, can be replicated using observational data.
The question is whether you can show that you can arrive at the same average treatment effect with the observational data as the original trial that was used for regulatory approval, and then if you loosen the inclusion and exclusion criteria and you look at the patient population that was actually treated, whether you’ll have more confidence in the estimates from that real-world population that is broader than the one that was studied in the trial.
That broader population is of course the one that everyone is really interested in. What do you think about that argument?
DK: A couple of things. As a treating physician I’m less interested in the average treatment effect in the broader population and more interested in the specific treatment effect of each individual — or as close as we can get to the person-level treatment effect.
There are some methods that can take an RCT and generalize it to a broader population, but it’s not really getting to the estimate that most clinicians are most interested in, although there are of course other stakeholders who are very interested in that broader estimate of treatment effect.
Obviously you’ll have more confidence extrapolating to a broader population when you can replicate the results in a population similar to the RCT. But what you really learn about your statistical model is that you failed to show that it is wrong. You have not shown that your causal model is correct and I don’t think we yet know the probability that you can use these methods to confidently extend the results to the broader population.
How RWE can complement clinical trials: CABANA and beyond
BC: Back in March, the Catheter ABlation vs. ANtiarrhythmic Drug Therapy in Atrial Fibrillation (CABANA) clinical trial report came out in two publications in JAMA. The trial evaluated cardiac ablation vs. treatment with antiarrhythmic drug therapy for atrial fibrillation. Simultaneously there was a paper based on observational data analysis published in European Heart Journal by a team of researchers at Mayo Clinic, and I know you were involved with this as well.
What was interesting about this study was that it was done and completed with de-identified observational data from the OptumLabs Data Warehouse prior to the release of the clinical trial results. They didn’t have something to aim at ― they just knew the design of the trial and conducted an observational analysis concurrently.
The cohort that they had was 84 times the size of the trial, and of the patients that were in the observational data, about 74% of them would have been eligible for the CABANA trial.
Not only was the study result similar to the per protocol patients in CABANA for those who completed their course of therapy, but the Mayo investigators also simulated the crossover of patients who started with one therapy and subsequently got the other. And that was without knowing what the results of the trial were. – Bill Crown, PhD, OptumLabs
DK: It was a very unique study. And as you said, it was led by Mayo Clinic researchers, Peter Noseworthy and Xiaoxi Yao, and they were gracious enough to invite me along. It was really a brilliant idea. They probably weren’t the first to have the idea of doing a concurrent observational study with a clinical trial. But in the end, I was surprised by how successful the study ended up being in terms of how closely the results of the observational study matched the results of the RCT. And obviously when you do it concurrently you are blinded to the results of the RCT.
With an observational study, as you know, when you are doing the analyses, there are just so many choices in terms of what confounders to put in your model, how to code the data, what interactions to include, what variables to leave out. These different choices are often examined in sensitivity or stability analyses, where one reports how influential different modeling choices are for the effect estimate.
Although you should be blinded to the outcome, when replicating prior trials, more typically researchers can look at the trial outcomes and examine whether they produce trial-discordant or trial-concordant findings. This is typically not done in bad faith, but one will almost certainly — for example — more vigorously check for errors when results are very discordant or surprising than when results are concordant.
Or it may affect the reasoning about which of the several different approaches to privilege in the framing of the paper. We did not have the luxury to do that, since we didn’t know what the "right" answer was going to be.
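To make the point about analytic choices concrete, here is a minimal Python sketch of a sensitivity check on a single modeling decision: whether or not to adjust for one confounder. The counts, the stratum and arm labels, and the choice of direct standardization are all hypothetical assumptions made for illustration; they are not taken from the CABANA emulation or any other study discussed here.

```python
# Hypothetical counts: (baseline-risk stratum, arm) -> (events, patients).
# Low-risk patients are mostly treated; high-risk patients are mostly not.
COUNTS = {
    ("low", "treated"): (8, 80),
    ("low", "control"): (2, 20),
    ("high", "treated"): (8, 20),
    ("high", "control"): (32, 80),
}


def crude_risk_difference(counts):
    """Risk difference that ignores the baseline-risk stratum."""
    events = {"treated": 0, "control": 0}
    patients = {"treated": 0, "control": 0}
    for (_, arm), (e, n) in counts.items():
        events[arm] += e
        patients[arm] += n
    return (events["treated"] / patients["treated"]
            - events["control"] / patients["control"])


def standardized_risk_difference(counts):
    """Stratum-specific risk differences, weighted by stratum size."""
    total = sum(n for _, n in counts.values())
    result = 0.0
    for stratum in sorted({s for s, _ in counts}):
        e_t, n_t = counts[(stratum, "treated")]
        e_c, n_c = counts[(stratum, "control")]
        weight = (n_t + n_c) / total
        result += weight * (e_t / n_t - e_c / n_c)
    return result


crude = crude_risk_difference(COUNTS)
adjusted = standardized_risk_difference(COUNTS)
print(f"crude risk difference:    {crude:+.2f}")
print(f"adjusted risk difference: {adjusted:+.2f}")
```

With these made-up counts the unadjusted analysis suggests an 18-percentage-point reduction in risk, while the adjusted analysis shows no effect at all. Reporting both estimates side by side is the simplest form of the stability analysis described above, and it shows why an analyst who can see the "right" answer in advance has room to drift toward it.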
BC: OptumLabs has a project underway that involves replicating clinical trials with real-world data to learn more about the role observational data can play. It’s called OPERAND (Observational Patient Evidence for Regulatory Approval and uNderstanding Disease) and we’re collaborating with the Multi Regional Clinical Trial Center at Brigham and Women’s Hospital.
Under this project, two different academic institutions, Brown University and Harvard Pilgrim Health Care Institute, are going to replicate the same two clinical trials ― the ROCKET A-fib trial and LEAD 2 diabetes trial ― completely independently. The thought is to give them targeted information about what the design was, what the inclusion/exclusion criteria were, and what the endpoints were, etc. as reported in the pivotal publications reporting out the results of the trials. Then they will look at sources of variation.
They will attempt to replicate the average treatment effect of the trials using a whole series of different statistical methods and different combinations of claims and clinical data sets. The research teams will document their decision-making so we can understand how they interpreted the inclusion/exclusion criteria, the procedure and diagnosis codes they use, and so forth. They may measure the endpoints slightly differently, make different decisions about what variables to put in the models, and use different control variables.
These trials both involve cardiometabolic conditions where we have some experience using observational data. So I think it is likely that they will be in the ballpark of the trials, but there will be variability there. The thought is that this work can help in the policy-setting discussions around using real-world evidence to understand what the sources of variability are in these two very specific use cases. What are your thoughts on the overall study design?
DK: It is a very unique and interesting design. And in a way it is kind of a novel sensitivity analysis. With any observational comparative effectiveness study, you should be doing sensitivity analyses to test the stability of the estimates to all your assumptions and design choices.
Your project adds another layer to that in that you have two different and superb teams looking at the same question. You definitely should learn a lot from this design.
BC: Another interesting aspect is that OPERAND will be using electronic medical records data and that brings in the clinical information. This is an area where the FDA is very interested now. To date the safety endpoints they have been looking at are in claims data alone.
The wide-scale availability now of electronic health record data is opening up the potential for looking at clinical outcomes for things like hemoglobin A1c. Using this data creates a whole other set of issues based on the challenges of working with EHR data, including that it is generally incomplete and you don’t know how incomplete it is until you link it with claims data and get a sense of what you are missing. So there is a lot of value to learning how to work with these integrated data sets.
Data quality and opportunities in risk prediction
DK: There is a lot of room for improvement in both claims and EHR data to make them better for research. And for the EHR, anything that can be done to make that data better does so not only for researchers but also for clinical care.
I think what we’re seeing is that the goals of clinical care and the goals of research are converging, because the information that researchers need to do a good study aligns with the information doctors really need to take care of their patients, except for randomization and human subject consent. – David Kent, MD, Tufts Medical Center
One of the things that we don’t get as well as we should in clinical care that is absolutely essential for good studies is excellent outcome ascertainment. But that clearly needs to be improved even just for the clinical enterprise.
That was one reason I was so surprised at our emulation of the CABANA trial. This was not a study that took place over several months ― it took place over five years, yet the outcome ascertainment seemed to be pretty good, at least for the outcomes in that trial.
BC: The clinical data can be useful not only for research studies, but also for prediction and risk stratification in a clinical setting.
DK: Very useful. Since the data that you have is really the same data that the electronic health record has available, if you create a risk model using OptumLabs data you could automatically populate the EHR with the same model by pulling the same data and getting similar predictions.
We are currently doing a project with AMGA, whom we met through OptumLabs. This is a PCORI-sponsored project where we stratify patients based on their personal risk of diabetes and target diabetes-prevention efforts like initiating treatment with metformin or enrollment in the Diabetes Prevention Program (DPP).
Newer technologies are making it much easier to get automatic calculations as part of the EHR and have them be interoperable between different systems.
BC: This is an area where machine learning could potentially be very helpful from a prediction and classification standpoint, particularly across very complicated data structures. Traditional methods require complete information on everybody for everything, but machine-learning methods can handle bits of information here and there.
As a clinician, how much do you worry about the "black box" aspect of machine learning in terms of predictions vs. wanting to really know what it is that is driving that?
DK: Undoubtedly, there are certain tasks that can be better approached by the suite of prediction methods identified as “machine learning” techniques. For example, convolutional neural networks (a type of deep learning) clearly perform much better for image analysis tasks than conventional statistical approaches.
Nevertheless, for many tasks, such as risk prediction for many clinical outcomes, classical regression-based statistical methods are typically as good or almost as good as more complex, non-linear models. And there are also ways to accommodate data missingness when using regression-based models for prediction.
My sense is that for many tasks — especially tasks that prioritize patients for health care service or resources — model transparency will be especially important, since fairness will be a central concern when applying these models; whatever improved prediction performance black-box methods might offer might not be worth the trade-off in terms of the sacrifice in the transparency of the predictions and decisions. To be sure, the specifics of the context will matter greatly in determining the relative importance of these trade-offs.
What lies ahead for use of RWE?
BC: We’ve covered a lot of interesting angles around RWE in the course of our conversation. To close, I’d love to have you provide your thoughts on where you see the biggest priorities in maximizing the value of RWE in the next five years.
The first priority should be improving data quality. Data collected in routine clinical care and to optimize billing have serious limitations in how reliably they encode the “real world,” and those limitations need to be better understood and addressed. – David Kent, MD, Tufts Medical Center
DK: Along the same lines, increasing the throughput of applied and methods work on real-world data will give us a better sense of the trustworthiness of inferences that can be produced with this data. Efforts like those organized around the OptumLabs Data Warehouse and the Observational Health Data Sciences and Informatics (OHDSI) collaborative are central to this very important project.