A few years ago I was working on a paper about covariate adjustment in randomised trials1, when a colleague working on a SAP (statistical analysis plan) for a trial approached me: “you say what a good idea covariate adjustment is but…” she said, followed by several practical questions about how you choose an adjustment method and pre-specify it in a SAP.
Lots of statisticians know a bit about covariate adjustment in randomised trials, and a particular phrase seems to come up repeatedly. They don’t say “covariate adjustment increases precision”, as I did above; instead they say something more cautious, like “covariate adjustment increases precision (or at least power)”. The reason behind this caution is that they are hedging for cases of non-collapsible population-level summary measures2, such as odds ratios and hazard ratios. For these measures, people often adjust for covariates in a regression model, and when you do that, you go from targeting a marginal estimand to a conditional one.
As a little meander from the point of this post, there is an important point I want to make sure you appreciate:
1. The terms marginal and conditional refer to the estimand (population-level summary measure3).
2. The terms unadjusted and adjusted refer to the statistical analysis (whether or not covariates are ignored).
In RCTs, we can estimate a marginal measure with an unadjusted or adjusted analysis, or we can estimate a conditional measure with an adjusted analysis. We cannot, however, target a conditional measure with an unadjusted analysis4. The latter point often leads people to conflate the concepts in (1) and (2) and mistakenly assume we cannot adjust for covariates if we are interested in a marginal estimand. The reason this matters IMO is that my default in an RCT would be a marginal estimand, adjusted for covariates5.
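To make the “marginal estimand, adjusted analysis” combination concrete, here is a minimal Python sketch of standardisation. Everything in it is hypothetical: the logistic model coefficients and the covariate values are made up purely for illustration, standing in for a model fitted to trial data.

```python
import math

def expit(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical fitted logistic model: logit P(Y=1 | Z, X) = a + b*Z + c*X.
# exp(b) is the odds ratio conditional on X.
a, b, c = -1.0, math.log(2), 1.5

# Standardisation: average the model's predicted risks over the trial's
# empirical covariate distribution, setting everyone to Z=1 and then Z=0.
xs = [-2, -1, 0, 1, 2]  # made-up covariate values, for illustration only
p1 = sum(expit(a + b + c * x) for x in xs) / len(xs)
p0 = sum(expit(a + c * x) for x in xs) / len(xs)

or_marginal = (p1 / (1 - p1)) / (p0 / (1 - p0))
or_conditional = math.exp(b)
# The adjusted analysis delivers a marginal estimand; by non-collapsibility,
# or_marginal sits between 1 and or_conditional here.
```

The point of the sketch is that the covariates are fully used in the analysis, yet the quantity reported at the end is marginal with respect to X.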
We need to be careful about comparing statistical properties of different estimands. We can compare type I error or power of statistical methods that estimate a marginal vs. conditional measure when the null is zero because the two scales share a null. When the null is not zero, as in non-inferiority trials, we cannot immediately compare them6. We cannot compare bias or efficiency because they target different estimands7. Failing to appreciate this is common enough that we mentioned it in our simulation tutorial paper. The point is really well made in Rhian Daniel’s apples-and-oranges paper8!
This is why statisticians use that cautious phrase about covariate adjustment increasing precision (or at least power): they are thinking about covariate adjustment using a regression model (usually Cox or logistic regression).
To make this next bit concrete, let’s think about odds ratios9. We can write the respective definitions of a marginal and conditional odds ratio as
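$$
\frac{P(Y^1=1)\,/\,P(Y^1=0)}{P(Y^0=1)\,/\,P(Y^0=0)}
\qquad \text{and} \qquad
\frac{P(Y^1=1 \mid X=x)\,/\,P(Y^1=0 \mid X=x)}{P(Y^0=1 \mid X=x)\,/\,P(Y^0=0 \mid X=x)}
$$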
If you are not familiar with the pipe symbols | in the expression on the right, these are read as ‘given’ or ‘conditional on’. Yᶻ is the potential outcome under action z (encoded as 0 or 1) and X is a covariate. When written like this, there is a black-and-white distinction between the marginal and conditional odds ratio.
Now to the point of this post: The distinction is less black-and-white in practice.
Suppose we are running two trials of identical design. One recruits participants in London and the other in Aberdeen. The binary outcome measure is the same in both trials. There is a measured covariate X. Both trials wish to target a marginal odds ratio, and they estimate it using standardisation or weighting (say). Once the trials are reported, the investigators decide it might be wise to collaborate and combine their results. The thing is, once we think about combining them, it becomes clear that although each trial’s result is marginal with respect to X, it is conditional on city (London or Aberdeen). “But”, you say, “we have recorded city, so we could marginalise with respect to city”. We could. But now suppose that two more trials, again of identical design, happened in two Swiss cities. We face the same problem: the UK trials’ “marginal” estimand is still conditional on country.
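Non-collapsibility is what makes this bite. A toy Python calculation, with made-up control-arm risks, shows that even when the odds ratio conditional on city is identical in both cities, the odds ratio marginalised over city comes out different:

```python
def odds(p):
    return p / (1 - p)

# Made-up control-arm risks in two equal-sized trials, with the SAME
# odds ratio of 2 within each city.
p0 = {"London": 0.2, "Aberdeen": 0.6}
p1 = {city: 2 * odds(p) / (1 + 2 * odds(p)) for city, p in p0.items()}

# City-specific (conditional-on-city) odds ratios: 2 in each city.
or_city = {city: odds(p1[city]) / odds(p0[city]) for city in p0}

# Marginalise over city by averaging risks, then form the odds ratio:
# about 1.77, not 2, despite no effect modification at all.
p1_marg = sum(p1.values()) / 2
p0_marg = sum(p0.values()) / 2
or_marg = odds(p1_marg) / odds(p0_marg)
```

So the combined “marginal over city” odds ratio is a different estimand from either city’s “marginal with respect to X” odds ratio, even in this idealised setting.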
It works the other way round too. If we want a conditional odds ratio, we can choose what it will be conditional on. But imagine there is some variable U, which is not known but is prognostic of Y. Above, we defined the odds ratio conditional on X. The odds ratio conditional on U would be:
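$$
\frac{P(Y^1=1 \mid U=u)\,/\,P(Y^1=0 \mid U=u)}{P(Y^0=1 \mid U=u)\,/\,P(Y^0=0 \mid U=u)}
$$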
We have no way to estimate this, since U is unmeasured – remember from above that we cannot estimate a conditional measure if we do not have the covariate. Generalising from this, we cannot estimate the fully conditional odds ratio unless we can perfectly measure and correctly condition on U and every other prognostic variable, and look at each specific combination of their values separately (hint: we haven’t, so we can’t!). Supercovariates folks: take note.
My view is that inclusion and exclusion criteria reflect what we definitely want to condition on. It’s reasonable to say that a treatment is for a specific condition and we’re interested in its effects on people with that condition, not some average among those with and without. By default, I lean towards an estimand that is marginal with respect to other covariates. Firstly, because it helps us to remember that we are estimating average effects over a group of individuals who are not all alike, and not individualised effects for individuals who magically happen to have the same effect. Secondly, because estimation of conditional measures requires stronger modelling assumptions.
(Note: this post was inspired by a chat with Rhian Daniel at PSI and the panel discussion at the session organised by Sarwar Mozumder. Thanks to both – though acknowledging them here does not imply their endorsement.)
Morris TP, Walker AS, Williamson EJ, White IR. Planning a method for covariate adjustment in individually randomised trials: a practical guide. Trials 23, 328 (2022). doi:10.1186/s13063-022-06097-z
In the parlance of the ICH E9(R1) Addendum
In the parlance of the ICH E9(R1) Addendum
Intuitively: you cannot have an estimand that is conditional on a covariate if the analysis is unaware that the covariate exists.
Something I’ve changed my mind on in the last 10 years.
They can be compared if you can convert the margin between scales. This is what we did in:
Broer SDL, White IR, Morris TP, Weir IR, Fiocco M, Quartagno M. Summary measures in non-inferiority clinical trials with a time-to-event outcome: an empirical comparison of power. BMC Medical Research Methodology 25, 139 (2025). doi:10.1186/s12874-025-02576-4
As a rule, you should not compare methods that target different estimands in terms of things like bias. Here’s a classic blooper: repeatedly simulate a trial with a specific effect of treatment; then simulate a level of non-adherence, making the effect among non-adherent participants equal to 0; then compare methods that target the hypothetical effect if everyone were to adhere with methods that target the treatment-policy effect.
Daniel R, Zhang J, Farewell D. Making apples from oranges: Comparing noncollapsible effect estimators and their standard errors after adjustment for different covariate sets. Biometrical Journal. 2021; 63: 528–557. https://doi.org/10.1002/bimj.201900297
I’m not going to give a robust defence of odds ratios, but I do hear a lot of people say that odds aren’t intuitive, and I doubt this a bit. I often hear people talk in terms of odds: for equal-sized groups, it’s clinical investigators who say “1:1 randomisation”, not “1/2 randomisation”.
Maybe this is all consistent with the recommendation to not use odds ratios as estimands? Or is it just that it is impossible to convince researchers to stop using them?