Confidence intervals and tests often correspond. That is, if a one-sided test returns p = 0.05, the bound of the matching one-sided 95% confidence interval should sit exactly at the null value; if the test returns p > 0.05, the interval should contain the null value; and so on.
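To make that correspondence concrete, here is a minimal sketch in Python (assuming, purely for illustration, a one-sample z-test of H0: θ = 0 against θ > 0 with known standard deviation; nothing here comes from the discussion below). The one-sided p-value and the one-sided lower confidence bound are built from the same statistic, so p = 0.05 occurs exactly when the 95% bound touches zero.

```python
import numpy as np
from scipy.stats import norm

def z_test_and_bound(xbar, sigma, n, alpha=0.05):
    """One-sided z-test of H0: theta = 0 vs theta > 0, plus the matching
    one-sided 1 - alpha lower confidence bound (known sigma assumed)."""
    se = sigma / np.sqrt(n)
    z = xbar / se                            # test statistic under H0: theta = 0
    p = 1 - norm.cdf(z)                      # one-sided p-value
    lower = xbar - norm.ppf(1 - alpha) * se  # one-sided lower confidence bound
    return p, lower

# Choose xbar so the test sits exactly on the alpha = 0.05 boundary:
sigma, n = 1.0, 25
xbar = norm.ppf(0.95) * sigma / np.sqrt(n)
p, lower = z_test_and_bound(xbar, sigma, n)
print(p, lower)  # p is 0.05 and the lower bound is 0: the interval touches the null
```

Here inverting the test and computing the interval directly are the same calculation, which is why they cannot disagree; the question below is what to do when they are built differently and can.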
The following situation came up in a recent discussion. An estimator came with both a test and a confidence interval, but the two did not necessarily correspond (let’s just say the 1 − α confidence interval contained 0 but the test rejected at level α). In this application, they did need to correspond.
The conversation boiled down to a choice between a test-based confidence interval (think test inversion) and a confidence-interval-based test. You don’t have much information to choose between them yet, so let me tell you the following three properties:
1. The test controls type I error under the null H0: θ = 0.
2. The test has high power to reject the null H0: θ = 0.
3. The confidence procedure¹ has nominal coverage under all nulls.
All of these are positives, which makes it hard to say one option is obviously better or worse than the other. What would your choice be, I wonder? Judging by a few people’s views, the general preference is to derive a confidence interval from a test. I remember learning that likelihood-ratio tests tend to be more accurate than Wald tests (at least under a correct model), and confidence intervals are often Wald-type, so maybe that is the reason for this view.
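As a concrete, entirely hypothetical illustration of that contrast (not the estimator from the discussion), here is a sketch comparing a Wald interval with the interval obtained by inverting a likelihood-ratio test, for a binomial proportion with few events:

```python
import numpy as np
from scipy.stats import chi2, norm

# x successes in n trials; compare a Wald interval with an interval
# obtained by inverting the likelihood-ratio test of H0: p = p0.
x, n, alpha = 3, 20, 0.05
phat = x / n

# Wald interval: estimate +/- z * SE, with the plug-in standard error.
z = norm.ppf(1 - alpha / 2)
se = np.sqrt(phat * (1 - phat) / n)
wald = (phat - z * se, phat + z * se)

# LR inversion: keep every p0 whose likelihood-ratio test is not rejected.
def loglik(p):
    return x * np.log(p) + (n - x) * np.log(1 - p)

grid = np.linspace(1e-6, 1 - 1e-6, 100_000)
lr_stat = 2 * (loglik(phat) - loglik(grid))
accepted = grid[lr_stat <= chi2.ppf(1 - alpha, df=1)]
lr_interval = (accepted.min(), accepted.max())

print("Wald:        ", wald)         # lower limit falls below 0 here
print("LR inversion:", lr_interval)  # stays inside (0, 1) and is asymmetric
```

With x = 3 and n = 20 the Wald interval strays below zero, while the inverted likelihood-ratio interval respects the parameter space; this is the usual argument for deriving the interval from a good test.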
My choice here would be the reverse: to derive the test from the confidence interval. Why? Look at the properties again. Property 3 implies that the confidence procedure can be thought of as controlling the type I error rate for any point null. That makes property 2 a problem: the extra power, and the discrepancy between a given confidence interval and test result, presumably arise because the test does not control type I error under other nulls H0: θ ≠ 0. So if we go ahead and derive a confidence interval to correspond to the test (by inverting it), the resulting procedure must lack the advertised coverage at those values.
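To see the link between per-null type I error and coverage numerically, here is a small simulation sketch. It does not reproduce the estimator from the discussion; it simply uses the small-sample Wald test for a binomial proportion as a stand-in for a test family whose type I error rate is not controlled at every point null. Coverage of the test-inversion interval at the true value is exactly one minus the rejection rate of the test of that value, so over-rejection shows up directly as undercoverage.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, alpha, reps = 30, 0.05, 20_000
z = norm.ppf(1 - alpha / 2)

def wald_rejects(x, p0):
    """Wald test of H0: p = p0 with the SE evaluated at the estimate."""
    phat = x / n
    se = np.sqrt(phat * (1 - phat) / n)
    se = np.where(se == 0, 1e-12, se)   # guard against phat equal to 0 or 1
    return np.abs(phat - p0) > z * se

for p_true in (0.5, 0.9):
    x = rng.binomial(n, p_true, size=reps)
    reject = wald_rejects(x, p_true)
    # Rejecting H0: p = p_true when p_true is the truth is a type I error,
    # and 1 minus this rate is the coverage of the inverted (Wald) interval.
    print(p_true, reject.mean(), 1 - reject.mean())
```

For n this small the rejection rate at p = 0.9 typically comes out well above 0.05, so the inverted interval undercovers there even though the same test is close to calibrated at p = 0.5; that is the sense in which a well-behaved test of one null does not guarantee a well-behaved interval.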
You may well disagree with me on the above, or put me right on something, but I thought this was a nice illustration of the different information we get from tests and confidence intervals. It’s probably worth saying that lots of the applied projects I dip into involve non-inferiority or super-superiority questions, which no doubt influences my views somewhat. However, even for standard superiority questions with a zero null, people do report (and interpret) confidence intervals. Is it not uncomfortable that these may be inaccurate, even if the test of that null is well calibrated?
Comments welcome as usual.
¹ A nice term I heard Oliver Maclaren use to explain confidence intervals… though I’m likely using it wrong
This is a good discussion topic, but I found this post confusing. Why would the test not control Type I error under different point nulls? Property 2 depends on what your effect size of interest is. Another benefit of confidence intervals via test inversion is that you actually get p values that can be adjusted for multiple comparisons. Not sure you can do that if you do testing via confidence intervals.