Oxymoronic (on random confounding)
This one really does meander. I probably should have gone with a more diplomatic title.
Every couple of years there is a kerfuffle on social media, some blog, or in the published literature, when someone or some group confidently pronounces that, contrary to popular belief, RCTs are subject to confounding. The specific phrase random confounding gets wheeled out to refer to the fact that, for a given trial, the distribution of some covariate might not be identical across treatment arms, which is then claimed to lead to bias.
In the below I'll discuss an idealised RCT where there is no cheating in the randomisation, perfect blinding, no missing outcome data, etc.
It’s useful to know what ‘bias’ means (oh, and confounding)
Typically, the naive statistician's first rebuttal is that the definition of bias is E(θ̂) − θ, where θ̂ is the estimator and θ the true value of its estimand.
(Not to distract you but you might also be interested in this post!)
In words, bias is the difference between the expected value of the estimator and the true value of its estimand. This 'vanilla' definition is an expectation over the possible randomisation sequences. For any imbalance we might observe, there is an equal and opposite imbalance that was just as likely to occur. This then invalidates the claim of random confounding leading to 'bias'. It doesn't resolve the argument though. I once saw someone argue back that bias is θ̂ − θ.
That is, they removed the expectation part, meaning that if the estimator is not bang on the true value of the estimand it targets every time, that study is biased. Random error is now called bias! This bar, while admirable, is set impossibly high. However, it brings us to a second point often made, originally due to Fisher: a key reason for using randomisation was that it permits accurate, straightforward estimation of the random error (please take this as read for this post). That is, the purpose of randomisation is not to balance confounders, it is to be random, and this is what provides a basis for inference. Stephen Senn has written a bunch of stuff about this on Deborah Mayo’s blog.
Above, I gave what I called the vanilla definition of bias, sometimes termed unconditional bias. You might reasonably be interested in conditional bias. The second definition above is, I think, the most extreme form. A less extreme version retains an expectation but conditions on some sample statistic s: E(θ̂ | s) − θ.
This sort of definition has been used extensively in the adaptive designs literature, where s might be an indicator of some decision made in a given realisation of a design, such as early stopping12.
Suppose we have one prognostic covariate X and a measure of sample imbalance, s, that describes some aspect of how the covariate’s distribution differs between two randomised arms. We can then evaluate whether, given s, the expectation of our estimator is equal to the true value of the estimand. Lo and behold, there is a discrepancy! Now, this is a conditional-on-s bias. This is what I think the random confounding clowns likely mean. It’s important to be precise with words. Above, I’ve distinguished between unconditional and conditional bias. Other people have been careful to give confounding a precise, rather than hand-wavy, definition. On these counts, I object to reading ‘random confounding causes bias’.
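To make the distinction concrete, here is a minimal simulation sketch in Python (all numbers are toy values I've made up, not from any real trial): one prognostic covariate X, simple 1:1 randomisation, and the unadjusted difference in means as the estimator. Over all randomisations it is unbiased; conditional on a noticeable imbalance s it is not.

```python
# Toy illustration: unconditional vs conditional-on-s bias of the unadjusted
# difference in means. All parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, true_effect, n_trials = 100, 0.0, 20_000

est, s = [], []
for _ in range(n_trials):
    x = rng.normal(size=n)                                # prognostic covariate X
    arm = rng.permutation(np.repeat([0, 1], n // 2))      # simple 1:1 randomisation
    y = true_effect * arm + x + rng.normal(size=n)        # X genuinely prognostic
    est.append(y[arm == 1].mean() - y[arm == 0].mean())   # unadjusted estimator
    s.append(x[arm == 1].mean() - x[arm == 0].mean())     # imbalance statistic s

est, s = np.array(est), np.array(s)
print("unconditional bias:", est.mean() - true_effect)             # approx. 0
print("bias given s > 0.2:", est[s > 0.2].mean() - true_effect)    # clearly positive
print("bias given s < -0.2:", est[s < -0.2].mean() - true_effect)  # clearly negative
```

Averaged over randomisations the estimator is fine; within the slice of randomisations showing a particular imbalance it is not.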
I’ll come back to this after a little detour.
Conditioning
Disclaimer: I've followed lots of discussions on this but don't claim to be an expert, and get confused about whether or not people mean the same thing by a given term. So please read the below with a healthy amount of scepticism.
In Cox's famous illustration of two-populations-with-unknown-means-but-known-different-variances3, he noted that a conditional test, with critical regions of size α within whichever population was actually sampled,
is not the most powerful procedure over the whole sample space. But he argued for a conditional test nonetheless, in terms of relevance, and this is important for applied statistical work: it’s fine to use a powerful procedure but you want inference to be relevant to the study actually done, not to hypothetical studies that weren’t done.
Is this (roughly) the conditionality principle? I think Bayesians would be sympathetic to this (Bayes respects the likelihood principle, quite closely related to the conditionality principle).
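To put a rough number on the relevance point (this is a sketch, not Cox's actual construction; the standard deviations of 1 and 10 are my own toy values): an interval calibrated over the whole sample space can advertise 95% coverage while being irrelevant to the population actually sampled, whereas the conditional interval is calibrated either way.

```python
# Sketch of the 'relevance' point: an interval calibrated unconditionally over
# a mixture of two known-variance populations vs one calibrated conditionally.
# The two standard deviations (1 and 10) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
mu, sd = 0.0, np.array([1.0, 10.0])
n_sims = 200_000

which = rng.integers(0, 2, n_sims)             # which population was actually sampled
y = rng.normal(mu, sd[which])

c_uncond = np.quantile(np.abs(y - mu), 0.95)   # 95% half-width over the mixture
hit = np.abs(y - mu) < c_uncond
print("unconditional coverage:", hit.mean())                        # ~0.95
print("coverage | precise population:", hit[which == 0].mean())     # ~1.00 (irrelevantly wide)
print("coverage | imprecise population:", hit[which == 1].mean())   # ~0.90 (too narrow)

hit_cond = np.abs(y - mu) < 1.96 * sd[which]   # condition on the sd actually in play
print("conditional coverage:", hit_cond[which == 0].mean(),
      hit_cond[which == 1].mean())                                  # ~0.95 in each
```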
A couple of years ago, there was a super interesting discussion on twitter about interpretation of credible intervals. It started with the point Bayesians often make that people want to make probability statements about hypotheses, not statements about confidence procedures. Jonathan Bartlett and I both asked how you interpret a Bayesian credible interval. ICYI, Jonathan blogged about it. The usual answer is ‘it tells you how to bet’ or ‘a degree of belief’. My questions were:
What does the ‘95%’ claim of a 95% credible interval actually mean – is there any way to check it’s ‘right’ if you can’t link it to a tangible frequency? If not, why trust it?
How do credible intervals resolve the issue with confidence intervals that a given interval either does or does not contain the true parameter?
I received a really useful answer to question 1 from Erik van Zwet, who essentially said that the 95% can be interpreted as a frequency, and that this coverage guarantee is not lost once you condition on the data. In this sense, credible intervals do not shun frequentist ideas but promise stronger frequentist properties (I have no deep understanding of this and I imagine some Bayesians would disagree). But hey, now this statement I read years ago makes sense4!
In any case, no Bayesian should object to achieving frequentist validity; effectively, Bayesians want and promise much more: calibration conditional on the data in addition to unconditional calibration (e.g., in Rubin 1984, I call such frequency calculations “Bayesianly relevant and justifiable”).
– Rubin, 1996
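Here is a minimal sketch of how I read Erik's (and Rubin's) point, using a toy normal–normal model of my own choosing: when the parameters really are drawn from the prior, the 95% credible interval covers 95% of the time, and keeps doing so after conditioning on the observed data; a flat-prior interval matches it unconditionally but not conditionally.

```python
# Toy normal-normal model: theta ~ N(0,1), y | theta ~ N(theta,1), so the
# posterior is N(y/2, 1/2). Numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
n_sims = 500_000
theta = rng.normal(0, 1, n_sims)          # parameters drawn from the (correct) prior
y = rng.normal(theta, 1)                  # one observation per parameter

post_mean, post_sd = y / 2, np.sqrt(0.5)
hit = np.abs(theta - post_mean) < 1.96 * post_sd   # does the 95% credible interval cover?

print("unconditional coverage:", hit.mean())                       # ~0.95
print("coverage | y near 0:", hit[np.abs(y - 0.0) < 0.05].mean())  # ~0.95
print("coverage | y near 3:", hit[np.abs(y - 3.0) < 0.05].mean())  # ~0.95

# For contrast: the flat-prior interval y +/- 1.96 also has ~95% coverage
# unconditionally, but not once we condition on the data.
flat_hit = np.abs(theta - y) < 1.96
print("flat interval, unconditional:", flat_hit.mean())                       # ~0.95
print("flat interval | y near 3:", flat_hit[np.abs(y - 3.0) < 0.05].mean())   # well below 0.95
```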
Back to baseline imbalance in RCTs
All the above is to say that arguments for conditioning on the data suggest it is reasonable to care about a chance baseline imbalance affecting inference for the study at hand. The point is that imbalance in a prognostic covariate implies that the distribution of potential outcomes under control differs slightly between the randomised arms. For this particular trial, that relevance criterion is… relevant.
I think it is uncontroversial to say that baseline imbalance in a prognostic covariate creates conditional bias. It’s controversial to describe this as confounding. Like, I’ve seen someone include a ‘covariate–randomised arm’ edge on a causal DAG to represent chance imbalance… I’m trying not to shout here but if the assumptions in a cDAG are mainly encoded by absence of edges, and you cannot omit an edge structurally removed by randomisation, surely you can never omit any edge from any cDAG!
For want of a better place to say this, check out the Out of Balance post.

If people want to say 'conditional bias', that's fine, but people often look through table 1s, and using these to criticise a trial is silly. We can adjust for covariates in the analysis of RCTs. I've spent a lot of time on this topic (read this paper for an overview5; see also Kelly Van Lancker's nice pre-print on standardisation6). Adjustment removes the conditional bias (and in doing so it also tends to reduce the variance of the treatment effect estimator compared with the unadjusted estimator). As a bonus, when there is no imbalance, there is no conditional bias to remove; if the estimator sees that a prognostic variable is balanced, it has even lower variance! If a covariate is imbalanced but not prognostic of outcome, there is no conditional bias. The moral is that it is not the imbalance s that matters but the prognostic value of X.
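Extending the earlier sketch (again with made-up numbers), adjustment here is just least squares of outcome on arm and covariate. Conditional on a noticeable imbalance the unadjusted estimator is biased and the adjusted one essentially is not, and the adjusted one has smaller variance overall.

```python
# Toy comparison of unadjusted vs covariate-adjusted (ANCOVA-style) estimators,
# conditional on the observed imbalance in X. Illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(4)
n, true_effect, n_trials = 100, 0.0, 20_000
unadj, adj, s = [], [], []

for _ in range(n_trials):
    x = rng.normal(size=n)
    arm = rng.permutation(np.repeat([0, 1], n // 2))
    y = true_effect * arm + x + rng.normal(size=n)
    unadj.append(y[arm == 1].mean() - y[arm == 0].mean())
    design = np.column_stack([np.ones(n), arm, x])             # intercept, arm, X
    adj.append(np.linalg.lstsq(design, y, rcond=None)[0][1])   # adjusted arm effect
    s.append(x[arm == 1].mean() - x[arm == 0].mean())

unadj, adj, s = map(np.array, (unadj, adj, s))
print("bias | imbalance, unadjusted:", unadj[s > 0.2].mean() - true_effect)  # positive
print("bias | imbalance, adjusted:  ", adj[s > 0.2].mean() - true_effect)    # approx. 0
print("variance, unadjusted:", unadj.var(), " adjusted:", adj.var())         # adjusted smaller
```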
So if there are measured covariates that we know or believe to be prognostic, such that imbalance would concern us, our SAP (statistical analysis plan) should just prespecify adjustment for those covariates.
We can even use a randomisation method that ensures balance, e.g. block randomisation stratified by covariates. Not for many covariates, and it's a bit tricky when they aren't categorical or contain many categories, but see point (2) below. Random confounding discussions always seem to forget this and assume trials use simple randomisation. Few trials do7!
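As a sketch of what I mean (the block size of 4 and the three 'site' strata are purely illustrative), stratified permuted-block randomisation forces near-perfect balance within every stratum:

```python
# Sketch of stratified permuted-block randomisation: within each stratum,
# allocate in random blocks containing equal numbers of each arm.
import numpy as np

rng = np.random.default_rng(5)

def stratified_block_allocation(strata, block_size=4):
    """0/1 treatment allocation, balanced within blocks inside each stratum."""
    strata = np.asarray(strata)
    arm = np.empty(len(strata), dtype=int)
    for level in np.unique(strata):
        idx = np.flatnonzero(strata == level)          # participants in this stratum
        n_blocks = -(-len(idx) // block_size)          # ceiling division
        blocks = [rng.permutation(np.repeat([0, 1], block_size // 2))
                  for _ in range(n_blocks)]
        arm[idx] = np.concatenate(blocks)[:len(idx)]
    return arm

strata = rng.choice(["site A", "site B", "site C"], size=200)
arm = stratified_block_allocation(strata)
for level in np.unique(strata):
    print(level, np.bincount(arm[strata == level], minlength=2))  # close to 50:50 per stratum
```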
What about unobserved and unmeasured covariates? It’s awkward because they are not recorded in the data, so we can’t even quantify imbalance with s. Are we stuck with conditional bias on some unobserved imbalance? Two comments:
1. Generally, yes, we are. But this is where we can rely on randomisation to give reasonable inference that accounts for potential imbalance in unobserved covariates. Don't forget that this is the point of standard errors (again, Senn has written about this in various places).
2. A little-known bonus of covariate-balancing schemes is that, although we can't usually stratify randomisation on many covariates, when we balance on one, we do no worse in terms of others. Aickin (p. 114)8 showed that if an unmeasured covariate is correlated – negatively or positively – with the stratifying covariate, balancing the measured covariate also better balances the unmeasured covariate. Intuitively, think about the two extremes where the unmeasured covariate is i) identical to the measured one, or ii) perfectly negatively correlated with the measured one.
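Here is a small simulation of that intuition (the correlation of −0.7 and the crude pair-matching scheme are my own illustrative choices, not Aickin's method): forcing balance on a measured covariate X shrinks the typical imbalance in an unmeasured covariate U that is correlated with X.

```python
# Toy check: tighter balance on measured X also tightens balance on an
# unmeasured U correlated with X (here rho = -0.7; the sign doesn't matter).
import numpy as np

rng = np.random.default_rng(6)
n, rho, n_trials = 100, -0.7, 10_000

def u_imbalance_sd(balance_on_x):
    diffs = []
    for _ in range(n_trials):
        x = rng.normal(size=n)
        u = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)  # unmeasured covariate
        if balance_on_x:
            # crude balancing: sort by X, randomise within consecutive pairs
            order = np.argsort(x)
            arm = np.empty(n, dtype=int)
            for i in range(0, n, 2):
                arm[order[i:i + 2]] = rng.permutation([0, 1])
        else:
            arm = rng.permutation(np.repeat([0, 1], n // 2))    # simple randomisation
        diffs.append(u[arm == 1].mean() - u[arm == 0].mean())
    return np.std(diffs)

print("SD of imbalance in U, simple randomisation:", u_imbalance_sd(False))
print("SD of imbalance in U, balanced on X:       ", u_imbalance_sd(True))  # smaller
```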
I like randomisation, but it is an imperfect tool. Imperfect is not good enough for some people. Fine. They have higher standards than me. But as with other imperfect tools, if good isn't good enough for you, the onus is on you to come up with something better. Can you tell me what beats randomisation in general?
Postscript on ‘reasonable’ conditioning statistics
One of the scariest emails I ever received was from Sander Greenland, who had seen something I'd written and wrote 'I wanted to share my reaction'. He'd copied in a few others he'd discussed these ideas with before. We hadn't communicated before, so I was really intimidated! Though I didn't understand all of the subsequent email conversation, I did learn an awful lot.
Sander chided me for criticising a claim that split (real) studies into ‘significant’ and ‘non-significant’ and said the significant ones were biased/optimistic. He argued that I was talking about the most basic form of bias (unconditional). He views conditional biases like the one I was criticising as something we should be more interested in.
Fair enough in principle (see above), but I do think we need to argue for the relevance of what s we condition on. Some choices seem inherently unreasonable to me.
For example, do a simulation study where you generate data Y~N(0,1). Suppose we are interested in μ=E(Y) and estimate it using the sample mean. This is unconditionally unbiased. However, we now investigate the conditional bias of the sample mean conditioned on its sign: s = 1 if the sample mean is greater than 0, and s = 0 otherwise.
That is, we will estimate the conditional bias of estimates that are >0 vs. ≤0. This seems inherently unreasonable. By construction, neither can be conditionally unbiased (one set is definitely biased upwards and the other definitely biased downwards). Assuming you agree, I wonder how you view a meta-epidemiological review that splits its sample according to whether a p-value crossed 0.05?
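A sketch of that simulation (the sample size and number of repetitions are arbitrary choices of mine):

```python
# Sample mean of Y ~ N(0,1): unconditionally unbiased for mu = 0, but
# necessarily biased once we condition on the sign of the estimate.
import numpy as np

rng = np.random.default_rng(7)
n, n_sims = 20, 100_000
means = rng.normal(0, 1, (n_sims, n)).mean(axis=1)    # one sample mean per simulated study

print("unconditional bias:  ", means.mean())              # approx. 0
print("bias | estimate > 0: ", means[means > 0].mean())   # biased upwards
print("bias | estimate <= 0:", means[means <= 0].mean())  # biased downwards
```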
I Marschner. A General Framework for the Analysis of Adaptive Experiments. Statistical Science. 2021; 36(3):465–492. doi:10.1214/20-STS803
DS Robertson, B Choodari-Oskooei, M Dimairo, L Flight, P Pallmann, T Jaki. Point estimation for adaptive trial designs I: A methodological review. Statistics in Medicine. 2023; 42(2):122–145. doi:10.1002/sim.9605
DR Cox. Some problems connected with statistical inference. Annals of Mathematical Statistics. 1958; 29(2):357–372. doi:10.1214/aoms/1177706618
DB Rubin. Multiple Imputation After 18+ Years. Journal of the American Statistical Association. 1996; 91(434):473–489. doi:10.2307/2291635
TP Morris, AS Walker, EJ Williamson, IR White. Planning a method for covariate adjustment in individually randomised trials: a practical guide. Trials. 2022; 23:328. doi:10.1186/s13063-022-06097-z
K Van Lancker, F Bretz, O Dukes. The Use of Covariate Adjustment in Randomized Controlled Trials: An Overview. arXiv:2306.05823 [stat.ME]. 2023.
BC Kahan, TP Morris. Reporting and analysis of trials using stratified randomisation in leading medical journals: review and reanalysis. BMJ. 2012; 345:e5840. doi:10.1136/bmj.e5840
M Aickin. Randomization, balance, and the validity and efficiency of design-adaptive allocation methods. Journal of Statistical Planning and Inference. 2001; 94:97–119. doi:10.1016/S0378-3758(00)00228-7