Simulation studies and methodological reviews
There have been a few recent reviews of simulation studies of methods for handling particular problems:
Vargas’ review of methods to deal with confounding
Abell’s review of methods to deal with non-compliance
Smith’s (scoping) review of simulation studies comparing ‘statistical’ and ‘machine learning’ approaches to risk prediction for time-to-event data (I am a co-author[1])
There are no doubt more that I’ve missed, through forgetfulness or just not having come across them – do let me know in the comments.
Simulation studies are a useful tool. I use them all the time for little things: getting a handle on a method, checking my intuition for how something will work, and so on. Most of these are not for publication (I’d need like 10 Tim-clones to write them all up), but some are.
Simulation studies are a great way to shed light on things like non-convergence, which you might not get from analytic results. Sometimes they are necessary to study a particular method. This is often true for studies that put two methods together (things like ‘methods to both deal with confounding and handle missing data’, or ‘using multiple imputation with fractional polynomial models’[2]).
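As a (made-up) illustration of the kind of throwaway simulation I mean, the sketch below repeatedly simulates small logistic-regression datasets, fits the model with statsmodels, and tracks how often the fit fails to converge alongside the bias and empirical SE of the slope estimate. The sample size, true coefficient and number of repetitions are arbitrary choices for illustration, not a recommendation.

```python
# Toy simulation study (illustrative only): simulate many small datasets,
# fit a logistic regression to each, and record non-convergence as well as
# the bias and empirical SE of the slope estimate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2023)
n_sim, n_obs, true_beta = 1000, 30, 1.5  # arbitrary settings for illustration

estimates, failures = [], 0
for _ in range(n_sim):
    x = rng.normal(size=n_obs)
    y = rng.binomial(1, 1 / (1 + np.exp(-true_beta * x)))
    try:
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        if fit.mle_retvals["converged"]:
            estimates.append(fit.params[1])
        else:
            failures += 1
    except Exception:  # e.g. perfect separation raised as an error
        failures += 1

print(f"non-convergence: {failures / n_sim:.1%}")
print(f"bias of slope estimate: {np.mean(estimates) - true_beta:+.3f}")
print(f"empirical SE: {np.std(estimates, ddof=1):.3f}")
```

Even something this crude surfaces behaviour (occasional separation and non-convergence in small samples) that analytic results alone can gloss over.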
Be aware
The obvious thing to remember for any review is garbage in → garbage out. I am not saying this as a criticism of the above reviews, but as a reminder to reviewers and readers that the available evidence on methods might be shaky. I’d suggest reading this rather nice paper[3].
An interesting point that came up in Vargas et al.’s review is that making ‘simulation study’ one of the inclusion criteria might give a warped view of the available methods (they did acknowledge this). They found lots of simulation studies on propensity scores and none on other adjustment methods. Assuming the review results are accurate (I have no reason to doubt them), isn’t that weird?
Why was this? I genuinely don’t know. It might be that there is something about propensity score methods that lends them to simulation studies (it is not straightforward to get analytic results for them). Other methods perhaps didn’t need simulation studies to make them fit for use. Or perhaps it’s that propensity scores have been so heavily promoted with the whole ‘mimics a randomised trial’ slogan that they got more ‘late-phase’[4] attention from people who might actually use them, people who might be less likely to derive results mathematically. Meanwhile, people working on other methods may have approached their evaluation more mathematically.
Either way, I was surprised to see so few mentions of Robins’ G-methods, or even of regression adjustment.
I appreciated figure 1 of Abell et al.’s article, which gives a taxonomy of methods including those for which the review returned no results.
Meta-analysis of simulation results? Nope.
This doesn’t flow very well but then it is a post knocked up in 20 mins…
I’ve seen people argue that we should conduct numerical syntheses of the results from these reviews. My knee-jerk reaction is to disagree, though I could be persuaded. Essentially my disagreement is for the reasons outlined in Carolin Strobl’s excellent paper[5]. My take-home from that is that we don’t really want to know which methods perform well on average (statisticians aren’t robots, and robots aren’t that dumb); we want to know which work well when, and to choose methods for the problem at hand.
1. I was a late-arriving co-author. The setup was to compare ‘statistical’ vs. ‘machine learning’ approaches. Of course, we agreed that almost no method can be neatly classified as one or the other.
2. TP Morris, IR White, JR Carpenter, SJ Stanworth, P Royston. Combining fractional polynomial model building with multiple imputation. Statistics in Medicine 2015; 34: 3298–3317. https://doi.org/10.1002/sim.6553
3. A-L Boulesteix, R Wilson, A Hapfelmeier. Towards evidence-based computational statistics: lessons from clinical research on the role and design of real-data benchmark studies. BMC Medical Research Methodology 2017; 17: 138. https://doi.org/10.1186/s12874-017-0417-2
4. G Heinze, A-L Boulesteix, M Kammer, TP Morris, IR White, for the Simulation Panel of the STRATOS initiative. Phases of methodological research in biostatistics—Building the evidence base for new methods. Biometrical Journal 2023 (epub ahead of print). https://doi.org/10.1002/bimj.202200222
5. C Strobl, F Leisch. Against the “one method fits all data sets” philosophy for comparison studies in methodological research. Biometrical Journal 2022 (epub ahead of print). https://doi.org/10.1002/bimj.202200104