Clinical trials vs. RWE: Where are we now?
Insights from the OptumLabs Research & Translation Forum.
The changing landscape of real-world evidence
There has been great interest in real-world evidence (RWE) in the last few years, fueled by a surge in observational data, new analytic techniques to work with it, and legislation mandating that the FDA develop guidelines for using RWE to support regulatory decision-making.
Many consider randomized controlled trials (RCTs) the gold standard for evidence. Yet they are costly, time-consuming, and involve more narrowly targeted populations than exist in the real world. RWE can complement traditional RCTs to inform new drug indications and safety surveillance.
To that end, there has been a lot of focus on better understanding when observational studies generate reliable results and when they don't. Efforts to replicate RCTs with RWE have expanded, with the goal of generating evidence that informs when, how, and why this works.
Where do we stand with reliably using RWE?
Learn what experts actively involved in helping inform RWE regulatory standards think is most important now about trial replication using observational data.
Speakers: Joseph Ross, MD, MHS, Professor of Medicine and Public Health, Yale Medical School; Sebastian Schneeweiss, MD, ScD, Chief of the Division of Pharmacoepidemiology, Department of Medicine, Brigham and Women’s Hospital and Professor of Medicine, Harvard Medical School; Bill Crown, PhD, Chief Scientific Officer, OptumLabs.
- I'm joined on stage by two fabulous people. Sebastian Schneeweiss is Professor of Medicine at Harvard and Chief of the Division of Pharmacoepidemiology at Brigham and Women's Hospital, and really a world-renowned epidemiologist in pharmacoepi. And Joe Ross, Professor of Medicine and Public Health at Yale, who is also a world-renowned researcher, maybe more on the clinical trial side of things than on the observational data analysis side, but Joe has recently been coming into the observational world. And this creates some really nice opportunities. One of the things I was struck by in the conversations so far this morning is that real-world evidence was at the heart of the discussions of artificial intelligence, but in a different sort of way. Machine learning methods and artificial intelligence are traditionally about what economists call the Y-hat problem: prediction and classification of the dependent variable. But health services researchers, epidemiologists, and economists have been primarily worried about the B-hat problem: what is the effect of that coefficient, the effect of that intervention, on some outcome of interest? These two things are related to one another, but they're certainly not the same thing at all. In the last couple of years there has been a huge surge of interest in real-world evidence, and I think for two reasons. One is the rise of these new analytic techniques - machine learning - and their application to healthcare. The other is legislative change - the Prescription Drug User Fee Act Reauthorization (PDUFA VI) and language in the 21st Century Cures Act - that requires the FDA, by 2021, to develop a framework for incorporating real-world evidence into regulatory decision making, for new indications for previously approved drugs and for safety surveillance. Sebastian has done tons of work on the safety surveillance side, which speaks to the fact that the FDA has been in this area for a long time. But you can be sure that what the FDA is really interested in is: how can I feel confident about the evidence that comes out of these observational studies, and how close is it to the results I would have gotten had I done a randomized clinical trial? So we're gonna get into all of that. But before we do, I'd like to ask both Joe and Sebastian something that Darshak alluded to: what's the difference between real-world data and real-world evidence? Because they're not really the same thing, are they? Or are they? Sebastian, what would your reaction to that be?
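To make the distinction Bill draws here concrete, here is a minimal sketch in Python - simulated data and illustrative variable names, not anything presented at the forum - contrasting the two problems: predicting an outcome (Y-hat) versus estimating the effect of a treatment on that outcome (B-hat).

```python
# Sketch: "Y-hat" (prediction) vs. "B-hat" (treatment-effect estimation).
# Simulated data; all column names are illustrative only.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "comorbidity_score": rng.poisson(2, n),
})
# Confounded treatment assignment and outcome (simulated).
p_treat = 1 / (1 + np.exp(-(0.03 * (df["age"] - 65) + 0.2 * df["comorbidity_score"])))
df["treated"] = rng.binomial(1, p_treat)
logit_y = -2 + 0.04 * (df["age"] - 65) + 0.3 * df["comorbidity_score"] - 0.5 * df["treated"]
df["outcome"] = rng.binomial(1, 1 / (1 + np.exp(-logit_y)))

# Y-hat problem: predict the outcome as accurately as possible.
features = df[["age", "comorbidity_score", "treated"]]
clf = GradientBoostingClassifier().fit(features, df["outcome"])
y_hat = clf.predict_proba(features)[:, 1]

# B-hat problem: estimate the effect of treatment on the outcome,
# here via a covariate-adjusted logistic regression (log-odds ratio).
X = sm.add_constant(df[["treated", "age", "comorbidity_score"]])
b_hat = sm.Logit(df["outcome"], X).fit(disp=0).params["treated"]

print(f"predicted risk for first patient (Y-hat): {y_hat[0]:.3f}")
print(f"estimated treatment log-odds ratio (B-hat): {b_hat:.3f}")
```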
- So let me just go back to FDA's definition, in the framework document that they released last December, where they define real-world data as data generated by the operation of a healthcare system - the electronic data that are routinely generated by operating such a healthcare system. And real-world evidence, they then say, is the evidence on the effectiveness and safety of medical products - being the FDA, they have to focus on medical products, right? - in this routine clinical care setting, derived from the real-world data. So really two levels: real-world data and real-world evidence. And in this same framework document they talk a lot about the real-world data being fit for purpose. How reliable are these data? How complete? How specific are these data that feed into the analytic framework of a causal analysis to generate generalizable evidence on the effectiveness and safety of medications? Linking this back to what Sachin said, which was very interesting to me: in health services research, you want to generate evidence on healthcare delivery that is locally derived, because the delivery of healthcare is very local, obviously. But when we talk about medical products, you would argue that this is a global enterprise - you see that the large pharmaceutical companies are globally active. So it's about generalizable evidence on the effectiveness and safety of medical products.
- Okay, Joe?
- I think you're going to find that Sebastian and I agree a lot, although let me just say in advance, if anyone has methodological questions, please direct them to Sebastian. With real-world data and real-world evidence, I think it's the setting in which the data are generated. And it goes well beyond electronic health record data and claims data - it can be your wearable data, it can be lots of information that's collected in the setting of receiving care, and there are lots of ways to think about it. We'll get into those details. The point that I want to make at the outset is that we need to embrace real-world data and real-world evidence as a means of better understanding these medical products. For people who aren't aware, lots of products come to market with very limited evidence - maybe a single clinical trial, maybe two, a limited number of patients studied, usually focused on a surrogate marker of disease. The trials are small, and there's a lot of uncertainty at the time of approval, even though products are coming through on the basis of trials. Real-world data gives us an opportunity, as it becomes more dynamic and as we're able to take advantage of more methods, to complement our understanding around the trials - it helps to round out the picture and fill in the lines a little bit.
- Great. So I heard a couple of different things there. One, I think our natural inclination in the room, because we're data people, would be to think about real-world evidence solely in the context of analyses of observational data - retrospective databases built from claims and electronic health records. But real-world evidence is actually more about the system of care. We could easily have randomized studies where people are randomized to different treatments and then simply followed in the healthcare system, and those are real-world evidence studies too. And that's certainly a lot of what the FDA is thinking about. But for us today, we'll back away from that a little bit and focus mainly on these database-related studies: what are their advantages and disadvantages? So here at OptumLabs, when the 21st Century Cures legislation was passed and PDUFA VI was reauthorized, the first thing we thought about was, could we bring together a collaboration in this area and do some work? Because there was truly going to be a lot of interest from regulators and from life sciences companies in this legislation. And even though our interest in this topic - meaning the research community and OptumLabs - is very, very broad, because having reliable treatment effect estimates is something very general - whether it's a surgical intervention, a pharmacoepidemiology study, or a benefit design, these are all treatment effect studies, they all use observational data, and they're all the same statistical problem - we seized on this opportunistically. We said, "Oh, there's gonna be a lot of interest in this, let's get people together." And so we did. We have a project called OPERAND, which many of you are familiar with, where we have two different academic groups, Brown and the Harvard Pilgrim Health Care Institute. Each of them is replicating the same two trials, the ROCKET atrial fibrillation trial and the LEAD-2 diabetes trial. The idea is to have them use different methods and different data - claims data first, and then claims plus clinical - but we're also keeping them separate, because researchers make different decisions: what variables am I going to put in my model? What sort of statistical methods am I going to use? And so forth. From the FDA's perspective, they're thinking, well, observational studies are the wild, wild west - we see all these examples where sometimes observational studies seem to generate reliable results, and other times they don't. How much of that is due to methods? How much is due to data? How much is due to the decision making of the researchers? While we were thinking about this at OptumLabs, in conjunction with the Multi-Regional Clinical Trials Center at Brigham and Women's, Sebastian was thinking about the same thing. So could you talk to us a little bit, Sebastian, about the efforts that you have at Harvard?
- Sure, sure. So first of all, I think we all love randomized trials, let's be totally clear - I actually have a button that says, "I love RCTs." Real-world evidence has blossomed quite a bit because we have better methods now and, equally important, better data, and FDA got very interested in this. But we pressed the pause button because of the pushback we received - and rightly so, I feel - that we do not really understand how well we're doing with a given real-world evidence study as compared to an RCT. Most of the decision makers who make substantial, impactful decisions based on RCTs - as regulators or healthcare managers - have never done an RCT in their lives, right? They were not involved in doing RCTs. Nevertheless, they trust RCTs because they feel they understand RCTs; they understand the causal question that is answered with them. How can we do better with real-world evidence, to get to the point where we have a level of confidence that lets us make these difficult decisions? That triggered this RCT DUPLICATE project, similar to the OPERAND project, and there are a bunch of other projects in Europe, where we identify an overlapping set of study questions that were answered with an RCT - one that was actually submitted to the FDA for regulatory decision making - and answer them with real-world evidence. We focus on those RCTs where we think that, with the data we have - these are claims data right now - we have a fighting chance; we really stacked the deck in our favor. What we want to identify is how well we can reproduce the same point estimate that we see in a randomized trial. We have to distinguish between the inability to emulate the randomized trial - and this is a very humbling experience; in most cases you actually cannot even get close to what the trial was doing - versus those cases where we feel we can emulate the trial well, and any difference that we see is truly due to bias. And what we try to learn from that - Bill, you mentioned this already - the most important thing is not that we can correctly replicate, I don't know, 25 out of 30 randomized trials. My mother would be proud of me, I'm sure, but that's not the point of this. The point is that we learn when it works and when it doesn't work, right? And if I may, a quick example was published in JAMA just a month ago - it's not in the pharma world, it's the medical device world, or actually surgery. Bariatric surgery: a big non-randomized study showed a 60% reduction in heart failure hospitalization, sustained over almost 10 years, for those undergoing bariatric surgery versus not. 60%. I don't know what goes on in these editorial board meetings. If you see an effect of a 60% reduction, this stuff needs to be applied to everybody with a BMI of 30 plus, right? Anyway, what is happening here - I'm making this point only to illustrate that this is not about analytics alone; it's about understanding how care is delivered, understanding how data are recorded and processed, and then how they are analyzed. You need to understand all of this, right? If you undergo bariatric surgery, you get a medical work-up, all of that gets recorded in your electronic health record system, and it gets recorded in the same system where the surgery is performed.
Now, the comparison group, not undergoing surgery, has much of its information recorded outside of that healthcare system, so there is an information imbalance. And that information imbalance will amplify the confounding that is already going on there. How can we overcome that? It's actually fairly simple. What we did was reanalyze the data using knee replacement surgery as the comparison group. Patients undergoing knee replacement, right? It's an elective surgery, it involves the same work-up, so now the information is balanced, you can overcome the confounding, and the effect goes away. We need to be clear and help the regulators and the decision makers differentiate between valid real-world evidence and misleading real-world evidence. And we have to do a better job of providing decision makers with tools to quickly differentiate between the two.
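The design fix Sebastian describes is largely a cohort-definition choice: pick an active comparator that receives a similar pre-operative work-up, so that information is recorded comparably in both groups. Below is a minimal sketch of that idea, assuming hypothetical claims-style tables, column names, and procedure codes; it is not the published reanalysis.

```python
# Sketch of an active-comparator cohort: bariatric surgery vs. an elective
# surgery (knee replacement) chosen so both groups get a similar pre-operative
# work-up, balancing how much information is recorded before the index date.
# Hypothetical claims-style files; names and codes are illustrative only.
import pandas as pd

procedures = pd.read_csv("procedures.csv", parse_dates=["proc_date"])          # member_id, proc_code, proc_date
outcomes = pd.read_csv("hf_hospitalizations.csv", parse_dates=["admit_date"])  # member_id, admit_date

BARIATRIC_CODES = {"43644", "43775"}     # illustrative codes only
KNEE_REPLACEMENT_CODES = {"27447"}

def first_procedure(codes, label):
    """Earliest qualifying procedure per member, labeled with the exposure."""
    rows = procedures[procedures["proc_code"].isin(codes)]
    first = rows.sort_values("proc_date").groupby("member_id", as_index=False).first()
    first["exposure"] = label
    return first[["member_id", "proc_date", "exposure"]]

cohort = pd.concat([
    first_procedure(BARIATRIC_CODES, "bariatric"),
    first_procedure(KNEE_REPLACEMENT_CODES, "knee_replacement"),  # active comparator
])

# Outcome: any heart-failure hospitalization after the index procedure.
merged = cohort.merge(outcomes, on="member_id", how="left")
merged["post_index_hf"] = merged["admit_date"] > merged["proc_date"]  # NaT compares as False
events = merged.groupby(["member_id", "exposure"], as_index=False)["post_index_hf"].any()

# Crude risks by exposure; confounding adjustment (e.g., propensity scores)
# and follow-up time handling would follow in a real analysis.
print(events.groupby("exposure")["post_index_hf"].mean())
```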
- Joe, you have a comment on that?
- Yeah, I'll just add onto that. I think the concepts Sebastian is talking about are really critical - the idea of falsification testing, using negative outcomes as controls within your real-world evidence. I do wanna make sure that we don't hold these types of studies to a higher standard than we would clinical trials. In the sense that, if you were to repeat a clinical trial with the exact same population, exact same outcome, exact same intervention, how often would you get the same answer? If you were asking observational research to replicate clinical trial research, would you get the same answer? I think it's probably less often than many people in this room would believe. We know from the work of John Ioannidis and others that about two thirds of the time it replicates, but that means a third of the time or so it does not. So why does that happen? What does that mean in terms of the reliability of the evidence base? You made one other point about the, sort of, wild west of observational research, which is true. But the clinical trials world also used to be the wild west, right? And then, through the work of many people in D.C. and journal editors, now we have clinical trial registration. Before a trial is ever started, we know exactly what they've specified, what endpoints they're going to study, what population they're looking at. We need to be doing the same for real-world evidence. We need the same concepts of preregistration and pre-specification before any study begins, particularly if it's going to be used for regulatory decision making.
- Let me just pick up on that. Sebastian made the point that we're interested in this clinical trial replication work not so much because it's useful in itself, but because it helps us understand: when are the data, methods, research design, and so forth sufficient to get to the same conclusion?
- So in the replication work that our group is engaged in, we're working with a Mayo Clinic team, working with OptumLabs data, and what we're particularly interested in is how the populations differ. When you take the same inclusion/exclusion criteria listed in a trial and apply them to the real-world population, do the patient populations even look the same? We suspect that they don't. It's one thing to be able to replicate the trial results, but can you even replicate the trial population that enrolled? My guess is that you won't be able to.
- You were recently a coauthor on a paper published in JAMA Network Open on feasibility: what percentage of clinical trials would it even be possible to replicate? Several of your coauthors are in the audience, so could you talk a little bit about the motivation for that and what you found?
- Well, this was a fun little idea that we had a medical student pursue: if you looked at the clinical trials published in the highest-impact medical journals - we looked at every RCT published in 2017 - what proportion of them could feasibly be replicated? Very simply, we looked at whether you could ascertain the intervention, the enrollment criteria, the endpoints, that kind of information, from an observational data source. And we were very generous in saying you could get it out of claims data or out of electronic health record data, and even at that, only 15% of those trials could feasibly be replicated. The two major reasons were that the drug products being studied in these very high-impact journals are not yet in wide use at the time, so you couldn't use observational data to do that work, or the study was of a medical device - and no matter how widely used, you cannot identify a medical device in most electronic health record or claims data. So this is just to say, we have a long way to go before we're going to be substituting real-world data for medical product evaluations in terms of regulatory decision making, but that does not mean there aren't great opportunities to think about the way it complements our understanding of products once they're on the market.
- Now, you both made a comment around research design, and I wanna come back to it. There's a really great example in the literature around the Nurses' Health Study, which was a massive, very well designed observational study. For 10 years, on the basis of the Nurses' Health Study, we had thought that hormone replacement therapy had a protective effect with respect to cardiovascular risk in women. Then came along the Women's Health Initiative, an equally large randomized clinical trial, and it came to the opposite conclusion: no, it's not protective - women treated with hormone replacement actually have a higher risk of cardiovascular disease. Immediately everybody said, there's an example of why you can't trust observational studies. These were both high quality studies, with primary data collection and large samples of women, but one was randomized and the other wasn't, and they reached fundamentally different conclusions about the treatment. Sebastian, do you have any comments on that?
- I guess I do. The most dissatisfying experience for me is seeing a randomized trial and a non-randomized study that are very similar and get very different findings, and not knowing why the findings differ. I have a whole slide deck of these cases where I usually can explain very well why they're different. And this is one of those cases where it's very clear what went wrong - or what was studied differently; depending on where you sit, you might want to verbalize this differently. Miguel, a colleague of mine in public health, reanalyzed the Nurses' Health Study using a new-user study design rather than a prevalent-user study design, and he got exactly the same finding as the randomized trial, showing an increased risk of coronary heart disease in the first 24 months of hormone replacement therapy. And we have plenty of these examples, which is enormously frustrating as an educator, because I'm training people in our courses, right? We see these mistakes published in the literature over and over, leading to these misleading findings. We know how to avoid them. We know how to avoid them. We need to do a better job of educating the people using these types of data, because these data have become ubiquitous. OptumLabs is a fantastic data asset, and you can license these data, and you can purchase Medicare data and things like that. So more and more users will be using these data, and we're lagging behind in educating them on how to use these data.
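The new-user restriction Sebastian credits with reconciling the observational and randomized results is, at the data level, mostly a washout requirement: follow-up starts at treatment initiation, so early harms are not missed. Here is a hedged sketch of the idea, assuming a hypothetical pharmacy-claims table and column names; it is not the actual reanalysis.

```python
# Sketch: new-user vs. prevalent-user cohorts from pharmacy claims.
# Hypothetical file with columns member_id, drug_class, fill_date, enroll_start.
import pandas as pd

fills = pd.read_csv("rx_fills.csv", parse_dates=["fill_date", "enroll_start"])
hrt = fills[fills["drug_class"] == "hormone_replacement"]

# Index date = each member's first observed HRT fill.
index = hrt.sort_values("fill_date").groupby("member_id", as_index=False).first()

# Prevalent-user cohort: anyone with an HRT fill, regardless of prior use,
# so long-term users are mixed in with initiators.
prevalent_users = index["member_id"]

# New-user cohort: require a washout period (e.g., 12 months of enrollment
# before the index fill with no observed HRT use), so follow-up starts at
# initiation and early risks show up in the analysis.
WASHOUT_DAYS = 365
new_users = index[
    (index["fill_date"] - index["enroll_start"]).dt.days >= WASHOUT_DAYS
]["member_id"]

print(f"prevalent users: {len(prevalent_users)}, new users: {len(new_users)}")
```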
- Yeah, if I could just push you on that a little bit. I do think there's a perception that these observational studies somehow come out of the whim of the researcher in terms of how they're designed, and that they're often very biased because you're trying to get a particular conclusion. All sorts of nasty motivations are ascribed to us as researchers for getting different results from observational studies. But we actually have some guiding tenets now about things to avoid and things to think about when it comes to observational studies. I wonder if you could tick off what some of those are, because you've written some beautiful stuff on this that lays out what we should be thinking about when conducting, particularly, pharmacoepidemiology studies.
- Without going into the nitty-gritty details, what is enormously helpful is to think about how you would structure a randomized trial if you had the money, the time, and the ethics approval to do one. You're not doing a randomized trial, you're doing a non-interventional study, but if you could do the trial, how would you design it? That is enormously informative for the choices you make in setting up the study. Then you have the reality of your data, and you see that some of the things you would like to do in a randomized study you actually cannot do. I'm not talking about baseline randomization - that you really don't have - but we talk about measurement issues a lot, right? And again, that clarifies the difference between the ideal world and the reality of the data, and it's important that we're transparent about that and convey it to our audience. What are the shortcomings? What are the compromises we have made in order to work with these real-world data?
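One common way to act on this advice is to write the hypothetical target trial's protocol elements down explicitly before touching the data, and then record each compromise the real-world data force. A minimal sketch follows; every entry is an illustrative placeholder, not the speakers' protocol.

```python
# Sketch: spell out the hypothetical target trial before analyzing the data,
# then record where the real-world data force a compromise.
# All entries below are illustrative placeholders.
target_trial = {
    "eligibility":          "adults with condition X, no prior use of drug A or B",
    "treatment_strategies": "initiate drug A vs. initiate drug B (active comparator)",
    "assignment":           "randomized at initiation (emulated: exchangeability assumed given measured covariates)",
    "time_zero":            "date of first fill of A or B",
    "outcome":              "hospitalization for Y within 365 days",
    "causal_contrast":      "intention-to-treat analogue",
    "analysis_plan":        "propensity-score matching, then outcome model",
}

compromises = {
    "assignment": "no baseline randomization; confounding by indication possible",
    "outcome":    "claims-based outcome definition, imperfect specificity",
}

for element, spec in target_trial.items():
    flag = f"  [compromise: {compromises[element]}]" if element in compromises else ""
    print(f"{element}: {spec}{flag}")
```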
- Yeah, and this gets to the point you were making earlier, Joe, about transparency - actually doing the hard thinking it takes to work through that research design, write it down, write a protocol, post it and say, hey, this is what I'm going to do, and then people can judge the results.
- Again, I'll just say, I don't think that observational research should be held to a different standard, right? We should be just as cautious when we're interpreting the results of a trial, because trials have frequently been done sloppily. I'd wear a sticker too - I love RCTs - actually, I need a button like that. But trials have just as many potential problems as observational research. And one of the values of these kinds of real-world evaluations now is that you can take multiple different approaches, all pre-specified, to pressure test your analyses. As an example, Sanka, who's here, just presented an incredible set of analyses at the American Heart Association meeting where we were trying to better understand a medical device used for mechanical circulatory support: a high-dimensional propensity score analysis, the full sample and a sub-sample looking at specific groups, then testing again with an instrumental variable analysis - and each time, the magnitude of effect holds up. That gives you confidence that the work is strong.
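The pressure-testing Joe describes can be sketched as running one pre-specified effect estimate several ways and checking that it holds up. In the toy example below, plain covariate adjustment stands in for a high-dimensional propensity score, the subgroup is arbitrary, and the instrumental-variable step is a hand-rolled two-stage least squares on simulated data; none of it reproduces the AHA analyses he mentions.

```python
# Sketch: pre-specified "pressure tests" of one treatment-effect estimate.
# Simulated data; a real analysis would use the study cohort and protocol.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 20000
severity = rng.normal(size=n)                       # measured confounder
z = rng.binomial(1, 0.5, n)                         # instrument (e.g., facility preference)
treated = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * severity + 1.2 * z))))
outcome = 1.0 * treated + 0.9 * severity + rng.normal(size=n)   # true effect = 1.0

def adjusted_effect(idx):
    """Covariate-adjusted OLS estimate of the treatment coefficient."""
    X = sm.add_constant(np.column_stack([treated[idx], severity[idx]]))
    return sm.OLS(outcome[idx], X).fit().params[1]

all_idx = np.arange(n)
subgroup = np.where(severity > 0)[0]                # an arbitrary pre-specified subgroup

# Two-stage least squares by hand: predict treatment from the instrument,
# then regress the outcome on the predicted treatment (point estimate only).
Z = sm.add_constant(z)
treated_hat = sm.OLS(treated, Z).fit().predict(Z)
iv_effect = sm.OLS(outcome, sm.add_constant(treated_hat)).fit().params[1]

print(f"covariate-adjusted, full sample: {adjusted_effect(all_idx):.2f}")
print(f"covariate-adjusted, subgroup:    {adjusted_effect(subgroup):.2f}")
print(f"instrumental-variable (2SLS):    {iv_effect:.2f}")
```

If the three estimates tell roughly the same story, that is the kind of consistency across pre-specified approaches Joe is pointing to; if they diverge, the divergence itself is informative about which assumption is doing the work.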
- I just wanna draw this out, because I'm sure you guys have picked up on this, but there's variability in observational studies. You could do a study with Anthem data, BlueCross BlueShield data, UnitedHealthcare data, CMS data, and so forth, and you'll get a distribution of estimates - a whole bunch of different estimates. And you could do the same thing with clinical trials. You could do clinical trials in different settings around the country, in different countries around the world, and you'll get a distribution of evidence. So at best, what you should expect is that on average those things are comparable with one another. It's actually kind of a stretch to think that any one observational study would be similar to a randomized clinical trial, unless they're both pretty tight and there isn't a lot of variation around them. But this comparison happens all the time. So you, in this room, now have the benefit of Joe and Sebastian to think about this in a much more sophisticated way than everybody else in the country. Think a little bit about the clunkiness of trials: there's prospective data collection, there's design, there's the IRB, there's all this stuff. It takes a long time to set up and conduct clinical trials, do the primary data collection, and wait for the results to come in. That means that for any particular question you've got to repeat that whole process, and that's clunky, right? So one of the potential benefits of observational studies is the ability to redesign the question and answer a lot of different questions more quickly. And I'm thinking particularly about issues around heterogeneity of treatment response and how different patient groups may respond. I wonder if you could each briefly summarize your sense of the pros and cons of observational studies, with heterogeneity of treatment effects maybe being one of those things. And then we can open it up for questions from the audience.
- Well, the obvious advantage of non-interventional studies using these large data sources is that you start with a very broad population. Many of these patients are excluded from randomized trials in a very systematic way - elderly patients, pregnant women, women of childbearing age, just to name a few. You also have the very large numbers you need to study treatment-effect heterogeneity, because the subgroups get small very, very quickly. If you start out with 100,000 users of a specific drug, you can form a lot of different subgroups based on biomarker profiles, genomic profiles, clinical profiles, severity markers, and things like that, which is highly informative. But this all hinges on the level of confidence that we're really answering causal questions. We can do a lot of data gymnastics, but in the end we have to convince the end users of this information that this is a valid finding.
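Sebastian's point about needing very large starting cohorts for treatment-effect heterogeneity can be made concrete by estimating the adjusted effect within pre-specified subgroups and flagging cells that become too small to be informative. A hedged sketch, assuming a hypothetical analytic file and column names:

```python
# Sketch: treatment-effect heterogeneity across pre-specified subgroups.
# Hypothetical analytic file with columns treated, outcome (0/1), age_group,
# biomarker_positive, severity; all names are illustrative only.
import pandas as pd
import statsmodels.formula.api as smf

cohort = pd.read_csv("analytic_cohort.csv")

SUBGROUPS = {
    "age 65+":            cohort["age_group"] == "65+",
    "age <65":            cohort["age_group"] == "<65",
    "biomarker positive": cohort["biomarker_positive"] == 1,
    "biomarker negative": cohort["biomarker_positive"] == 0,
}

for label, mask in SUBGROUPS.items():
    sub = cohort[mask]
    if len(sub) < 500:                      # small cells quickly become uninformative
        print(f"{label}: too few patients (n={len(sub)})")
        continue
    # Covariate-adjusted logistic regression within the subgroup.
    fit = smf.logit("outcome ~ treated + severity", data=sub).fit(disp=0)
    print(f"{label}: n={len(sub)}, treated log-odds ratio = {fit.params['treated']:.2f}")
```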
- I'll just tack on very briefly: it's never as quick as you think when you say we can do this quickly. But one of the values I see is that you can take observational data sources from multiple different systems, and that allows you to create a complementary understanding - if you get the same result in Kaiser as you're getting in OptumLabs, as you're getting in the VA, you're feeling good. Great.
- Thank you. So we have design, we have methods, and we have data, and the greatest of these is design. Thank you very much, Joe and Sebastian.
- Thank you Bill.