Some thoughts on ‘causal simulation’ tools

Dec 15, 2023

I recently attended a CIIG seminar on DagSim. At the end people were saying how brilliant it was. I have to admit I didn’t really get it. Essentially it seemed like the presenter wrote sequential densities for variables and DagSim drew one – arguably the most obvious – causal DAG that the particular sequence implies. I found that odd. Why? Well suppose we generate (X,Y) from a log-linear model. The data this produces doesn’t contain any information about whether the DAG is X→Y, Y→X or Y←U→X and is compatible with all of them (and more). My guess from the talk is that DagSim would assume the latter? IDK.
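
To make that point concrete, here is a minimal sketch in plain Python. It has nothing to do with DagSim's actual API; the joint distribution, the function names and the sample sizes are all made up for illustration. It simulates binary (X,Y) sequentially in two different orders, X then Y and Y then X, from the same joint distribution, and compares the resulting cell frequencies.

```python
import random
from collections import Counter

# Made-up joint distribution over binary (X, Y); think of it as the cell
# probabilities implied by some log-linear model. Purely illustrative.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}


def conditionals(first):
    """Return P(first = 1) and P(second = 1 | first = v) for an ordering.

    `first` is 0 to simulate X then Y, or 1 to simulate Y then X.
    """
    p_first1 = sum(p for cell, p in JOINT.items() if cell[first] == 1)
    p_first = {0: 1 - p_first1, 1: p_first1}
    p_second1 = {}
    for v in (0, 1):
        # Cell in which the first variable equals v and the second equals 1.
        cell = (v, 1) if first == 0 else (1, v)
        p_second1[v] = JOINT[cell] / p_first[v]
    return p_first1, p_second1


def simulate(first, n, seed):
    """Sequentially simulate n draws and return empirical cell frequencies."""
    rng = random.Random(seed)
    p_first1, p_second1 = conditionals(first)
    counts = Counter()
    for _ in range(n):
        a = int(rng.random() < p_first1)      # draw the "parent"
        b = int(rng.random() < p_second1[a])  # draw the "child" given the parent
        cell = (a, b) if first == 0 else (b, a)  # always store as (x, y)
        counts[cell] += 1
    return {cell: counts[cell] / n for cell in JOINT}


freq_x_then_y = simulate(first=0, n=50_000, seed=1)
freq_y_then_x = simulate(first=1, n=50_000, seed=2)
for cell in sorted(JOINT):
    print(cell, JOINT[cell], freq_x_then_y[cell], freq_y_then_x[cell])
```

Both orderings should reproduce the same cell probabilities up to sampling noise, so nothing in the simulated data says which ordering (if either) is the causal one.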

I thought I’d write down some thoughts on tools for causal simulation:

Presumably one of the main reasons to use these tools is to make sure we haven’t inadvertently made a data-generating mechanism and a DAG incompatible with one another.
Relatedly, I’d like a tool to let me build up some arbitrary joint distribution by (joint or) conditional / sequential simulation and then show every causal DAG that’s compatible with this distribution. This seems straightforward enough if we’re just using P(Y)=f(X,W) to imply a direct edge between variables in one direction and not otherwise. Is that all there is to it? But note the caveat above: the data alone don’t tell us the direction of an edge.
If you can do the above then I guess it’s useful to check which edges are absent according to your DGM. This lets us feed it into DAGitty, etc.
In terms of practicality, it seems useful not just to simulate data from some model, as we usually do, but various counterfactuals Y^a and the indicators of a – or possibly even P(Y^a). This facilitates knowing/checking the numerical value of some causal estimand (you can just contrast the desired function of the counterfactual distributions), and thinking about how well-defined this is. This is also useful for showing what certain complex estimands target, e.g. some mediation estimands that I always flipping struggle with. It also seems like it would be really neat for work on validation of counterfactual risk prediction models.
A good purpose of causal simulation is to understand/illustrate concepts (for oneself or for teaching). So if you simulate the counterfactual data and P(Y^a)=0 then you can also see how changing positivity affects your ability to estimate a given estimand, why the curse of dimensionality is a problem when we want to adjust for continuous variables (just get the functional form a bit wrong), etc.
Another good purpose is to check our reasoning. E.g. if we think reversing some edge will change things, or that one edge is somehow more important than another. I guess checking reasoning is a pretty informal thing to do.
…
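
As a rough sketch of the counterfactual-simulation idea in the list above (plain Python, not any particular tool's API; the variable names, probabilities and the `p_treat` parameter are all invented for illustration): draw both potential outcomes Y^0 and Y^1 for everyone, let a confounder L drive treatment A, and reveal only Y = Y^A. The "true" average treatment effect can then be read off the counterfactuals and compared with what an estimator recovers from the observed data. By construction the effect here is 0.2 in expectation.

```python
import random


def simulate(n, p_treat=(0.2, 0.8), seed=0):
    """Simulate rows (l, a, y0, y1, y); p_treat[l] is P(A = 1 | L = l)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = int(rng.random() < 0.5)          # binary confounder L
        a = int(rng.random() < p_treat[l])   # treatment depends on L
        # Both potential outcomes are drawn for everyone.
        y0 = int(rng.random() < 0.1 + 0.4 * l)
        y1 = int(rng.random() < 0.3 + 0.4 * l)
        y = y1 if a else y0                  # consistency: observed Y = Y^A
        rows.append((l, a, y0, y1, y))
    return rows


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def standardised_ate(rows):
    """Adjust for L by averaging stratum-specific contrasts over P(L)."""
    total = 0.0
    for l in (0, 1):
        stratum = [r for r in rows if r[0] == l]
        treated = [r[4] for r in stratum if r[1] == 1]
        control = [r[4] for r in stratum if r[1] == 0]
        total += (len(stratum) / len(rows)) * (mean(treated) - mean(control))
    return total


rows = simulate(100_000, seed=42)
# "Truth": contrast the counterfactual means directly.
true_ate = mean(r[3] for r in rows) - mean(r[2] for r in rows)
# Naive contrast of observed outcomes, confounded by L.
naive = mean(r[4] for r in rows if r[1] == 1) - mean(r[4] for r in rows if r[1] == 0)
adjusted = standardised_ate(rows)
print(true_ate, naive, adjusted)
```

The naive contrast should land well away from the counterfactual truth, while the standardised one should sit close to it. Pushing an entry of `p_treat` to 0 or 1 empties one treatment arm within a stratum, so `standardised_ate` fails with a division by zero, which is a blunt but handy way to see what a positivity violation does.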

I had more ideas but ran out of time when I started and now can’t remember. Hopefully it’ll come back! Thoughts welcome on the above.

Statistical methodology meanderings
