4 Comments

"The bias defined in (2) seems useful as a way of decomposing (3) so we can quantify the relative contribution of causal and statistical bias, but I think I’d rather know the overall / accumulated bias."

Wouldn't the bias in (2) be a potentially useful indicator of how much model misspecification contributes to overall bias? Perhaps a signal of how useful non/semi-parametric estimators might be relative to improving design or data quality?

Definitely, and I think the impact of model misspecification is what a lot of research that evaluates different estimation methods is concerned with. But typically that’s in contexts where the identifying expression is correct (no causal bias), in which case (2) is the same as (3). So when does (2) give us something different to (3)? When there is some problem with the identifying expression and we want to know how much each part contributes to the ‘total’ bias in (3). My impression is that this sort of setup is less common in methodological research, though maybe you’ll educate me otherwise!

Ah, good point. Thinking about it now, I am blanking on examples where (2) and (3) aren't equivalent.

Spitballing, I'd be interested to see a simulation study where (a) the true causal effect and (maybe) the DAG are known but the full data-generating (DG) law is complex and not immediately apparent to the analyst, and (b) both "incorrect" and "correct" observational designs are used to analyze the data, maybe following the sorts of examples Hernán and co. have explored using the target trial emulation framework.

That probably all sounds obvious to a statistician!
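
To make the spitballing concrete, here is a minimal sketch of such a simulation in Python with NumPy. It is a toy, not anyone's actual study design. Everything in it is an assumption for illustration: the binary confounder `Z`, the treatment probabilities in `P_A`, the outcome model in `mean_y`, and all function names. I am also assuming that (2) is the estimator's expectation minus the identifying expression (statistical bias) and that (3) is the estimator's expectation minus the estimand (total bias), since neither equation is written out in this thread.

```python
import numpy as np

P_Z = 0.5                  # P(Z = 1) for a binary confounder Z (assumed)
P_A = {0: 0.5, 1: 0.9}     # P(A = 1 | Z = z): treatment depends on Z (assumed)

def mean_y(a, z):
    """E[Y | A=a, Z=z]; the effect of A is 1 when Z=0 and 2 when Z=1."""
    return a * (1.0 + z) + 2.0 * z

def p_z1_given_a(a):
    """P(Z = 1 | A = a) by Bayes' rule."""
    num = P_Z * (P_A[1] if a == 1 else 1 - P_A[1])
    den = num + (1 - P_Z) * (P_A[0] if a == 1 else 1 - P_A[0])
    return num / den

def mean_y_given_a(a):
    """E[Y | A = a], averaging over Z within treatment arm a."""
    p = p_z1_given_a(a)
    return p * mean_y(a, 1) + (1 - p) * mean_y(a, 0)

# Estimand: the average treatment effect, known because we wrote the law.
ESTIMAND = sum((mean_y(1, z) - mean_y(0, z)) * (P_Z if z else 1 - P_Z)
               for z in (0, 1))
# Identifying expressions, evaluated under the true law.
EXPR_ADJUSTED = ESTIMAND                                 # standardising over Z is correct here
EXPR_UNADJUSTED = mean_y_given_a(1) - mean_y_given_a(0)  # ignores confounding by Z

def simulate(n, rng):
    """Draw one dataset of size n from the assumed law."""
    z = rng.binomial(1, P_Z, size=n)
    a = rng.binomial(1, np.where(z == 1, P_A[1], P_A[0]))
    y = mean_y(a, z) + rng.normal(size=n)
    return z, a, y

def diff_in_means(z, a, y):
    """Estimator paired with the unadjusted ('incorrect') design."""
    return y[a == 1].mean() - y[a == 0].mean()

def ols_no_interaction(z, a, y):
    """Estimator paired with the adjusted design, but with a misspecified
    outcome model: Y ~ A + Z, omitting the A*Z interaction."""
    X = np.column_stack([np.ones_like(y), a, z])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def decompose(estimator, expr, reps=500, n=2000, seed=1):
    """Monte Carlo estimate of the causal, statistical and total bias."""
    rng = np.random.default_rng(seed)
    est_mean = np.mean([estimator(*simulate(n, rng)) for _ in range(reps)])
    return {
        "causal": expr - ESTIMAND,        # identifying expression vs estimand
        "statistical": est_mean - expr,   # assumed to correspond to (2)
        "total": est_mean - ESTIMAND,     # assumed to correspond to (3)
    }

results = {
    "unadjusted design, difference in means":
        decompose(diff_in_means, EXPR_UNADJUSTED),
    "adjusted design, OLS without interaction":
        decompose(ols_no_interaction, EXPR_ADJUSTED),
}
for label, parts in results.items():
    print(label, {k: round(float(v), 3) for k, v in parts.items()})
```

In this toy setup the unadjusted design has a causal bias of about 1.1 and essentially no statistical bias, so (2) and (3) differ a lot. The adjusted design has no causal bias, so (2) and (3) coincide, and the whole bias (roughly -0.24 in large samples) comes from leaving out the interaction.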

A couple of postscripts on this:

1. There’s a paper in American Sociological Review that uses the terms ‘theoretical estimand’ (for estimand) and ‘empirical estimand’ (for identifying expression): https://journals.sagepub.com/doi/10.1177/00031224211004187

My gripes above about ‘statistical estimand’ directly translate to ‘empirical estimand’, but that paper gets an extra gripe point for adding the word ‘theoretical’ to ‘estimand’. This makes it sound somehow irrelevant, when it’s what I’d term the ‘actual estimand’.

2. So far three people have commented ‘You haven’t actually defined your estimand in this post’. No shit, Sherlock! I think my point holds whatever the causal estimand is. I feel like there must be something I’m missing, because three people have now made the same comment. OTOH perhaps they discussed it, so the comments aren’t independent and it’s more like one.

3. I’ve been painfully aware recently that I sometimes use ‘estimand’ in the sense of ‘statistical estimand’ above, when it’s not for a causal estimand. E.g. in our letter-to-the-editor about Oberman & Vink’s paper (blog post at https://open.substack.com/pub/tpmorris/p/fixed-vs-repeatedly-simulated-complete). So I appreciate that, since I could argue against myself about this, others could too. I frequently hear from applied researchers that they regard causal estimands as alchemy, enough so that we really need to minimise confusion, and that’s why I’ve come out swinging above.
