Proving any connection between specific foods and health is extremely difficult. The outcomes we care about (heart disease, cancer, diabetes, etc.) develop over decades and have many purported causes. It's hard to justify the time and money needed to run a decade-long randomized controlled trial just to establish the consequences of eating a single food. A $100 million trial of alcohol was recently abandoned partway through (here), and one of the best diet studies ever conducted was sort-of-halfway retracted because of mistakes in its randomization process (here).

The best evidence we have for most nutrition questions comes from large, long-term, prospective observational studies. As a baseline for how good those studies can possibly be, I'll review the best studies of smoking I can find. In a million years there will never be an observational study of a widely consumed food that shows effect sizes as large or p-values as small as these. Any claims we make based on the small effect sizes seen in observational nutrition studies should be proportionally modest.

Before the 1950s, retrospective studies that asked people with and without lung cancer how many cigarettes they smoked suggested that smoking causes lung cancer. Those results led to funding for progressively larger prospective studies that periodically asked people how much they smoked and then tracked what happened to them. First was a study of 188,000 white men (results after 20 months and 44 months). Then came Cancer Prevention Study I, a 12-year study of ~1 million people, split roughly evenly between men and women, 7% of whom were people of color (results after 48 months and 12 years).

It turns out that smoking cigarettes is quite bad for you. So bad, in fact, that the people who ran the study switched from smoking cigarettes to smoking pipes. Sometimes you just can't win. Besides strengthening the statistical significance of the correlation between smoking and mortality, these studies showed three forms of dose response: more cigarettes per day was more dangerous, more years of smoking was more dangerous, and the longer a former smoker had gone without smoking, the lower their risk. In the absence of a randomized trial, a dose response like this increases our confidence that the relationship between smoking and mortality is causal.

The strongest statistical results I've found come from a 50-year prospective study of 34,000 male doctors in Britain. Those who smoked 25 or more cigarettes per day had lung cancer mortality 25 times higher than never-smokers and all-cause mortality 2.3 times higher. How about people who smoke less? The lowest smoking category reported was 1-14 cigarettes per day, which still had all-cause mortality about 50% higher than never-smokers. Fourteen per day still sounds pretty heavy to me. The lowest level of smoking I could find in any large prospective study was 1-4 cigarettes per day, in a study of 43,000 Norwegians starting in the 1970s. Even at this level, all-cause mortality was still about 50% higher than in never-smokers.

So, what is the probability that the apparent correlation between smoking and all-cause mortality is a random fluctuation? The British doctors' study reports the statistical significance of the trend in mortality with smoking as a chi-squared value. Higher values of chi-squared correspond to a lower p-value, or the probability that all the groups we're looking at are actually the same and the differences we're measuring are just random fluctuations. A chi-squared value of 15 corresponds to an extreme p-value of ~0.0001. Once p-values get to that ballpark or below, their exact value doesn't really matter.

I calculated them anyway, just for fun. The all-cause mortality trend across never-smokers, former smokers, and current smokers has a totally nuts chi-squared of 699. I tried calculating the corresponding p-value in Python using the scipy library, but the number was too close to 0 to calculate. WolframAlpha does better and gives a p-value of 10^-153. The trend across never-smokers and current smokers of 1-14, 15-24, and 25+ cigarettes per day has an even crazier chi-squared of 869. WolframAlpha can't calculate how small that p-value is (or how large, if you think about the size of its inverse). We humans can observe that the p-value shrinks almost exactly exponentially in chi-squared and extrapolate that at a chi-squared of 869 it will be approximately 1 / 10^190. To give a sense of scale, there are approximately 10^80 atoms in the universe, so if every atom in the universe split into as many atoms as are already in the universe, giving 10^160, that would still be smaller than 10^190 by a factor of 10^30.
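For anyone who wants to reproduce these numbers, here is a minimal sketch using SciPy's `scipy.stats.chi2`. It assumes the reported values are 1-degree-of-freedom trend statistics; the degrees of freedom aren't stated above, but df=1 is consistent with the WolframAlpha figure. The original Python code isn't shown, so the `1 - cdf` line is only a guess at one common way to end up with 0: for large chi-squared the CDF rounds to exactly 1.0 in double precision, while `chi2.sf` computes the upper tail directly, and values around 1e-191 are still well above the smallest positive double (about 2.2e-308).

```python
import math

from scipy.stats import chi2

# Assumption: the reported chi-squared values are 1-degree-of-freedom trend
# statistics. The post doesn't say; df=1 matches WolframAlpha's ~1e-153 at 699.
DF = 1


def trend_p_value(stat: float, df: int = DF) -> float:
    """Upper-tail p-value computed directly with the survival function."""
    return chi2.sf(stat, df)


for stat in (15, 699, 869):
    p = trend_p_value(stat)
    # For large stat, cdf rounds to 1.0, so this naive version collapses to 0.0.
    naive = 1 - chi2.cdf(stat, DF)
    print(f"chi2={stat}: p={p:.2e} (log10 {math.log10(p):.1f}), 1 - cdf = {naive:.2e}")

# The tail shrinks roughly like exp(-x/2): about 0.217 orders of magnitude per
# unit of chi-squared. Extrapolating from 699 to 869 lands near the direct value.
slope = -1 / (2 * math.log(10))
extrapolated = math.log10(trend_p_value(699)) + slope * (869 - 699)
print(f"extrapolated log10 p at 869: {extrapolated:.1f}")

# Scale check: (1e80 atoms)^2 = 1e160, which falls short of 1e190 by 30 orders.
print(190 - 2 * 80)
```

Under the df=1 assumption, the direct calculation should land near both the WolframAlpha value at 699 and the hand extrapolation of roughly 1 / 10^190 at 869.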

This is the kind of result you get by spending 50 years watching tens of thousands of men smoke about half a million cigarettes each. There will never be a nutritional study that reaches this level of significance.

Hey Dayton, I’ve been following your covid-19 coverage with great interest. Then I found this essay, which is interesting, but I think there is a little error.

It seems like you are drawing a direct inference from the p value to the likelihood of a Type I error. But that isn’t quite right. A calculated p value of, say, .01, doesn’t tell us that there is a 1% chance that the results we observed were just caused by chance. By itself, it actually doesn’t tell us anything at all about the likelihood of that. Rather, it tells us that *if* in fact the observations resulted from just a random fluctuation, as you say, there is a 1% chance that we would see results at least as extreme.

In order to calculate the chance that the null hypothesis is true, you would need a prior probability: how probable the null was before looking at this set of evidence. If the null was already extremely likely (say, if the hypothesis being tested is that homeopathic water has memory), then even if I calculate a small p value (say, .001), my conclusion would still be that the result is probably due to error or random variance. I think there is way less than a 1 in 1000 chance that homeopathic water memory is real, so while my observations are rather unlikely in the world where the null is true, that world is still more likely than the world where the null is false.
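The same argument can be made concrete with Bayes' rule. This is a minimal sketch with made-up numbers: the priors and the 0.8 "power" (the chance of seeing a result at least this extreme if the alternative is true) are illustrative assumptions, not estimates from any study, and treating "a result at least this extreme" as the evidence is itself a simplification.

```python
# Hypothetical numbers for illustration only; none come from a real study.


def posterior_null(prior_null: float, p_value: float, power: float) -> float:
    """P(null | result at least this extreme) via Bayes' rule.

    Treats the evidence as "a result at least this extreme", so
    P(evidence | null) = p_value and P(evidence | alternative) = power
    (the latter is an assumption supplied by the caller).
    """
    weight_null = prior_null * p_value
    weight_alt = (1 - prior_null) * power
    return weight_null / (weight_null + weight_alt)


# A plausible hypothesis with a 50/50 prior: p = .001 makes the null unlikely.
print(posterior_null(prior_null=0.5, p_value=0.001, power=0.8))

# Homeopathic water memory, assuming a one-in-a-million prior that it's real:
# the same p = .001 still leaves the null overwhelmingly likely.
print(posterior_null(prior_null=1 - 1e-6, p_value=0.001, power=0.8))
```

With these assumed inputs the first posterior is about 0.001 and the second is above 0.99, which is the point: the same p value can leave the null either implausible or nearly certain, depending on the prior.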

Yes, you’re right. I’ll be more careful about this in the future. Everything in this post would still be correct if I reworded a few sentences and the title along the lines of “if the null hypothesis were true, the probability we would have observed differences at least this large is…” In the interest of time, I’ll just leave this post as-is with these comments below. Thanks for the precise correction.

I enjoyed googling ‘homeopathic water memory’.
