Brennan here – Tim is graciously allowing me to guest post.
On the back of our recent paper on estimands in the BMJ, Tim wrote about some common criticisms of the ICH E9(R1) addendum. One common concern, which I want to address here, is the view that the addendum ignores causal inference.
This point frequently comes up in conversations I have, often from people who feel irritated (for lack of a better term) by this slight.
I understand – if someone wrote something in my research area and ignored all my work, I’d be annoyed too.
But I think much of this criticism actually stems from a misunderstanding of what the addendum is, and, just as importantly, what it isn’t.
So what is the ICH E9(R1) addendum?
The addendum is a regulatory document put out by ICH (broadly, a group of regulators and pharma representatives) that gives guidance to pharmaceutical companies (i.e. tells them what to do) for their regulatory submissions.
Specifically, it tells them how to describe their estimands (in a way that regulators can understand), and tells them to use the “estimands framework” (i.e. to define their estimand, use a statistical method which addresses that estimand, and evaluate robustness around the estimator’s assumptions).
All pretty standard stuff – so what are people’s concerns with it?
Some common criticisms
One of the main criticisms I’ve heard of the addendum is that it ignores causal inference – including the literature (it does not cite any), the terminology, and key concepts like time-varying confounding.
Another criticism I’ve heard – usually implied, though sometimes stated outright – is that what’s in the addendum has been known for years, so why are its proponents acting as if they’ve discovered something new?
I’ll discuss each of these issues in turn.
The addendum ignores causal inference
This is an interesting one, because the addendum clearly doesn’t ignore causal inference – it’s almost entirely centred around causal inference ideas.
For instance, the addendum describes estimands as a contrast between potential outcomes (“… how the outcome of treatment compares to what would have happened to the same subjects under alternative treatment”), and includes principal stratum, total, and direct effects from the causal inference literature (the latter two under different names).
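In potential-outcomes notation (my sketch – the addendum itself avoids formal notation), that quoted contrast is just the familiar average causal effect:

```latex
% Average treatment effect as a contrast of potential outcomes:
% Y(1) and Y(0) denote the outcomes the same subjects would have
% under treatment and under the alternative, respectively.
\theta = E[Y(1)] - E[Y(0)]
```

The addendum describes this in words rather than symbols, but the underlying object is the same.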
What it doesn’t do (and where I suspect most people’s issue lies) is explicitly acknowledge or reference the causal inference literature that describes these concepts.
But this isn’t a slight on causal inference – the addendum doesn’t acknowledge any previous work. The reason is simple: ICH guidelines don’t include citations. I don’t know why not, but they don’t. So it quite literally couldn’t cite the causal inference literature even if it wanted to.
But it could have at least *discussed* the causal inference literature, right?
Sure – but I think this criticism misses the point of what the addendum is. It’s not an academic paper, which would describe prior work in the area, explain how the new work fits in, and justify why things were done the way they were.
It’s a regulatory document. Its objective is to communicate to pharmaceutical companies what information they should include in their regulatory submissions.
While some readers may indeed appreciate the wider context behind the ideas in the addendum, I suspect most of its users simply want the headline information: what does the FDA/EMA want me to include in my submission?
It’s fair to wish the addendum had acknowledged the areas it was built upon. But I’m not sure doing so would have been in aid of its main objective.
Couldn’t they have at least kept the terminology?
To be fair, they did keep some of it (“principal stratum” is straight from the causal inference literature, and the term “estimand” has been around for decades). But they did change a lot.
And to be honest, I think they needed to.
I recently read a causal inference paper that defined a “total effect” in three (!) different ways, corresponding to a while-alive, composite, and treatment policy effect, to use the addendum’s terminology. If a single term can be used to define three different effects, I think it’s fair to say it’s not precise enough to be used in regulatory submissions (or in publications of study results for that matter).
And it’s not just the “total effect” – I find terms like “per protocol” and “ITT” effect, which seem common in the target trials world, pretty unclear as well. For instance, what if I want to know the effect if people hadn’t received protocol-prohibited medications, but I don’t care that they deviated from the protocol by stopping treatment early? This doesn’t seem adequately described by either an ITT or per protocol effect.
I know the addendum’s terminology isn’t always popular, even in the clinical trials world. But in all honesty, I find it a lot clearer than the alternatives.
But the addendum doesn’t talk about key issues like time-varying confounding
This is true; I suspect it’s because the addendum’s authors view time-varying confounding and its ilk as estimation issues, not estimand issues. And while the addendum does touch on estimation, that’s not really its focus.
And I think that’s fine – it can’t do everything, and it seems reasonable to limit the scope to describing how to clearly formulate an estimand without saying exactly how it should be estimated (particularly as there’s no “one-size-fits-all” approach to do so).
OK, but why are people acting like everything in the addendum is new, when we’ve known all this stuff for years?
I’ll start with the part I agree with – it’s certainly the case that much of what’s in the addendum has been known (and, perhaps even in certain circles, been obvious) for years. For instance, the importance of clearly stating the research question (or causal effect) of interest; using appropriate statistical methods for your research question; and using sensitivity analyses to test robustness are all things that have been written about extensively, both in the causal inference literature and beyond.
And it would be remiss not to mention that many of the statistical techniques we use to estimate the estimands described in the addendum were developed in the causal inference world (though of course some originated in other fields, such as the missing data or clinical trials literature).
The addendum owes a lot to the decades of work that preceded it. It seems unlikely it would exist in its current form without all the thinking that’s come from the causal inference world and elsewhere.
However, I would push back strongly against the idea that the addendum offers nothing new.
It forces researchers to be clear about their estimands in a way that other frameworks don’t.
I’ll offer an example. I recently read a target trial study. It described its causal effect of interest exactly as the target trial framework publications suggest (a “per protocol” effect).
Despite that, I doubt many people reading the paper would have understood what their estimand was. I certainly didn’t.
This is because nowhere did they say how patients who died were handled in the estimand definition (or even that death was a relevant intercurrent event, which it was). Working backwards from their analysis strategy, I *think* they estimated what patients’ outcomes would have been if they hadn’t died (a hypothetical strategy), but I couldn’t say for sure.
Had they used the ICH E9(R1) framework, this issue would have been clarified up front, as it would require them to explicitly list death as an intercurrent event, and then explicitly list which strategy was being used to handle it.
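To make that concrete (in my own illustrative notation, not the paper’s or the addendum’s): if $D$ indicates death and the hypothetical strategy is in play, the estimand I *suspect* they targeted is something like

```latex
% Hypothetical strategy for death (illustrative notation only):
% Y^{a, D=0} is the outcome under treatment a in a hypothetical
% scenario in which death is prevented for everyone.
\theta_{\text{hyp}} = E\left[ Y^{\,a=1,\, D=0} \right] - E\left[ Y^{\,a=0,\, D=0} \right]
```

Writing the estimand down this explicitly – whether in symbols or via the addendum’s attribute list – is precisely what exposes the “no one dies” scenario to the reader.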
To put it another way, describing the estimand using ICH E9(R1) would have alerted those reading the paper that study results pertained to an impossible hypothetical setting (where patients couldn’t die) – something that may affect people’s confidence in the relevance and interpretability of the results, and something that was entirely absent from the original paper.
(To be clear, this isn’t a criticism of target trials/observational studies – the lack of clarity in the estimand is something we see in trials *all the time*, and it was one of the motivating factors behind the addendum.)
Final thoughts
I don’t mean to imply we can’t or shouldn’t criticise the addendum – there are things in there I don’t love (for instance, defining intercurrent events as starting from treatment initiation instead of baseline).
But before we dismiss it out of hand, I think it is useful to understand why it was written the way it was. After all, there might be some useful stuff in there – even for those outside the clinical trials world.
Brennan - excellent write-up! Having served as the lead PhRMA rep on the ICH E9(R1) expert working group, your balanced perspective on perceived/actual hits and misses of the addendum is a welcome addition to this forum. I hope a lot of people read and learn from it. -- Devan Mehrotra
Nice post. I think the issue on time-varying confounding is driven by the fact that FDA only provide guidance on adjusting for baseline covariates, and EMA do so as well, but also explicitly advise against including time-varying confounders as part of the primary analysis.
It helps to remember what the definition of a confounder is – related to exposure and outcome, and not on the causal pathway. This is all well and good in an epidemiological study on observational data, but for an RCT, for the variable to be related to treatment, there must be an imbalance between the treatment arms. It is not just that the covariate changes over time, but that it does so differentially across arms. And such changes will thus be (likely) caused by the randomised treatment.
These are also trials for drugs which in general have a very specific mechanism of action. So it is difficult to justify how these cannot be on the causal pathway, if they are also related to outcome. This argument potentially holds less water if you have more complex interventions, but for Ph 3 trials aimed at regulators, this is rare.
So the challenge is to find a time-varying confounder (not just a covariate), where controlling for it or not impacts the estimation of the treatment effect. It should not be underestimated how incredibly rare this is. Even then, you have to pre-specify and nail your colours to the mast of which analysis you are going to trust, especially as you know it is likely on the causal pathway. Such pre-specification is always going to be more challenging if you don't know which covariates are actually going to vary, by how much, and whether they are predictive. The EU regulators have done so – they request the unadjusted analysis as primary, and discuss, albeit briefly, what to do with the adjusted analyses.