Healthier kids: Using Stan to get more information out of pediatric respiratory data

Robert Mahar, John Carlin, Sarath Ranganathan, Anne-Louise Ponsonby, Peter Vuillermin, and Damjan Vukcevic write:

Paediatric respiratory researchers have widely adopted the multiple-breath washout (MBW) test because it allows assessment of lung function in unsedated infants and is well suited to longitudinal studies of lung development and disease. However, a substantial proportion of MBW tests in infants fail current acceptability criteria. We hypothesised that a model-based approach to analysing the data, in place of traditional simple empirical summaries, would enable more efficient use of these tests. We therefore developed a novel statistical model for infant MBW data and applied it to 1,197 tests from 432 individuals from a large birth cohort study. We focus on Bayesian estimation of the lung clearance index (LCI), the most commonly used summary of lung function from MBW tests. Our results show that the model provides an excellent fit to the data and shed further light on statistical properties of the standard empirical approach. Furthermore, the modelling approach enables LCI to be estimated using tests with different degrees of completeness, something not possible with the standard approach.

They continue:

Our model therefore allows previously unused data to be used rather than discarded, as well as routine use of shorter tests without significant loss of precision.

Yesssss! This reminds me of our work on serial dilution assays, where we squeezed information out of data that had traditionally been declared “below detection limit.”

Mahar, Carlin, et al. continue:

Beyond our specific application, our work illustrates a number of important aspects of Bayesian modelling in practice, such as the importance of hierarchical specifications to account for repeated measurements and the value of model checking via posterior predictive distributions.

Wow—all my favorite things! And check this out:

Keywords: lung clearance index, multiple-breath washout, variance components, Stan, incomplete data.

That’s right. Stan.

There’s only one thing that bugs me. From their Stan program:

alpha ~ normal(0, 10000);

Ummmmm . . . no.
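To be fair, it’s an easy fix. Something like the following would be more in the spirit of the weakly informative priors we usually recommend (the scale of 10 here is just a placeholder; you’d choose it based on the range of values of alpha that are scientifically plausible in their model):

alpha ~ normal(0, 10);  // weakly informative: scale set to the plausible range of alpha, not 10000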

But basically I love this paper. It makes me so happy to think that the research my colleagues and I have been doing for the past thirty years is making a difference.

Bob also points out this R package, “breathteststan: Stan-Based Fit to Gastric Emptying Curves,” from Dieter Menne et al.

There’s so much great stuff out there. And this is what Stan’s all about: enabling people to construct good models, spending less time figuring out how to fit the damn things and more time on model building, model checking, and design of data collection. Onward!

Causal inference using repeated cross sections

Sadish Dhakal writes:

I am struggling with the problem of conditioning on post-treatment variables. I was hoping you could provide some guidance. Note that I have repeated cross sections, not panel data. Here is the problem simplified:

There are two programs. A policy introduced some changes in one of the programs, which I call the treatment group (T). People can select into T. In fact there’s strong evidence that T programs become more popular in the period after policy change (P). But this is entirely consistent with my hypothesis. My hypothesis is that high-quality people select into the program. I expect that people selecting into T will have better outcomes (Y) because they are of higher quality. Consider the specification (avoiding indices):

Y = b0 + b1 T + b2 P + b3 T X P + e (i)

I expect that b3 will be positive (which it is). Again, my hypothesis is that b3 is positive only because higher-quality people select into T after the policy change. Let me reframe the problem slightly (and please correct me if I’m reframing it wrong). If I could observe and control for quality Q, I could write the error term e = Q + u, and b3 in the specification below would be zero.

Y = b0 + b1 T + b2 P + b3 T X P + Q + u (ii)

My thesis is not that the policy “caused” better outcomes, but that it induced selection. How worried should I be about conditioning on T? How should I go about avoiding bogus conclusions?

My reply:

There are two ways I can see to attack this problem, and I guess you’d want to do both. First is to control for lots of pre-treatment predictors, including whatever individual characteristics you can measure which you think would predict the decision to select into T. Second is to include in your model a latent variable representing this information, if you don’t think you can measure it directly. You can then do a Bayesian analysis averaging over your prior distribution on this latent variable, or a sensitivity analysis assessing the bias in your regression coefficient as a function of characteristics of the latent variable and its correlations with your outcome of interest.

I’ve not done this sort of analysis myself; perhaps you could look at a textbook on causal inference such as Tyler VanderWeele’s Explanation in Causal Inference: Methods for Mediation and Interaction, or Miguel Hernan and Jamie Robins’s Causal Inference.
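Just to make the latent-variable idea concrete, here is a minimal Stan sketch. This is not Dhakal’s actual analysis; the variable names, priors, and selection equation are all made up for illustration. The idea is a latent “quality” term q that enters both the selection into T and the outcome, with its scale (tau) and its effect on selection (lambda) passed in as fixed sensitivity parameters:

data {
  int<lower=0> N;
  vector[N] y;                       // outcome
  array[N] int<lower=0, upper=1> t;  // selected into the treated program
  vector[N] p;                       // post-policy period indicator
  real<lower=0> tau;                 // assumed sd of latent quality (sensitivity parameter)
  real lambda;                       // assumed effect of quality on selection (sensitivity parameter)
}
parameters {
  real a0;
  real a1;
  real b0;
  real b1;
  real b2;
  real b3;
  vector[N] q;                       // latent quality, one per observation
  real<lower=0> sigma;
}
model {
  q ~ normal(0, tau);
  t ~ bernoulli_logit(a0 + a1 * p + lambda * q);         // selection model
  y ~ normal(b0 + b1 * to_vector(t) + b2 * p
             + b3 * (to_vector(t) .* p) + q, sigma);     // outcome model, spec (ii)
  // weakly informative priors; the scales are placeholders
  a0 ~ normal(0, 2.5);
  a1 ~ normal(0, 2.5);
  b0 ~ normal(0, 2.5);
  b1 ~ normal(0, 2.5);
  b2 ~ normal(0, 2.5);
  b3 ~ normal(0, 2.5);
  sigma ~ normal(0, 2.5);
}

The point of passing tau and lambda as data rather than estimating them is that the data alone can’t identify them; you refit (or average) over a grid of plausible values and see how much the estimate of b3 moves. That’s the sensitivity analysis.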

“Widely cited study of fake news retracted by researchers”

Chuck Jackson forwards this amusing story:

Last year, a study was published in the Journal of Human Behavior, explaining why fake news goes viral on social media. The study itself went viral, being covered by dozens of news outlets. But now, it turns out there was an error in the researchers’ analysis that invalidates their initial conclusion, and the study has been retracted.

The study sought to determine the role of short attention spans and information overload in the spread of fake news. To do this, researchers compared the empirical data from social networking sites that show that fake news is just as likely to be shared as real news — a fact that Filippo Menczer, a professor of informatics and computer science at Indiana University and a co-author of the study, stresses to Rolling Stone is still definitely true — to a simplified model they created of a social media site where they could control for various factors.

Because of an error in processing their findings, their results showed that the simplified model was able to reproduce the real-life numbers, determining that people spread fake news because of their short attention spans and not necessarily, for example, because of foreign bots promoting particular stories. Last spring, the researchers discovered the error when they tried to reproduce their results and found that while attention span and information overload did impact how fake news spread through their model network, they didn’t impact it quite enough to account for the comparative rates at which real and fake news spread in real life. They alerted the journal right away, and the journal deliberated for almost a year whether to issue a correction or a retraction, before finally deciding on Monday to retract the article.

“For me, it’s very embarrassing, but errors occur and of course when we find them we have to correct them,” Menczer tells Rolling Stone. “The results of our paper show that in fact the low attention span does play a role in the spread of low-quality information, but to say that something plays a role is not the same as saying that it’s enough to fully explain why something happens. It’s one of many factors.”…

As Jackson puts it, the story makes the journal look bad but the authors look good. Indeed, there’s nothing so horrible about getting a paper retracted. Mistakes happen.

Another story about a retraction, this time one that didn’t happen

I’m on the editorial board of a journal that had published a paper with serious errors. There was a discussion among the board of whether to retract the paper. One of the other board members did not want to retract, on the grounds that he (the board member) did not see deliberate research misconduct and that this just seemed like incredibly sloppy work. The board member was of the opinion that deliberate misconduct “is basically the only reason to force a retraction of an article (see COPE guideline).”

COPE is the Committee on Publication Ethics. I looked up the COPE guidelines and found this:

Journal editors should consider retracting a publication if:

• they have clear evidence that the findings are unreliable, either as a result of misconduct (e.g. data fabrication) or honest error (e.g. miscalculation or experimental error) . . .

So, no, the COPE guidelines do not require misconduct for a retraction. Honest error is enough. The key is that the findings are unreliable.

I shared this information with the editorial board but they still did not want to retract.

I don’t see why retraction should be a career-altering, or career-damaging, move—except to the very minor extent that it damages your career by making that one paper no longer count.

That said, I don’t care at all whether a paper is “retracted” or merely “corrected” (which I’ve done for 4 of my published papers).

“Did Austerity Cause Brexit?”

Carsten Allefeld writes:

Do you have an opinion on the soundness of this study by Thiemo Fetzer, “Did Austerity Cause Brexit?” The author claims to show that support for Brexit in the referendum is correlated with the individual-level impact of austerity measures, and therefore possibly caused by them.

Here’s the abstract of Fetzer’s paper:

Did austerity cause Brexit? This paper shows that the rise of popular support for the UK Independence Party (UKIP), as the single most important correlate of the subsequent Leave vote in the 2016 European Union (EU) referendum, along with broader measures of political dissatisfaction, are strongly and causally associated with an individual’s or an area’s exposure to austerity since 2010. In addition to exploiting data from the population of all electoral contests in the UK since 2000, I leverage detailed individual level panel data allowing me to exploit within-individual variation in exposure to specific welfare reforms as well as broader measures of political preferences. The results suggest that the EU referendum could have resulted in a Remain victory had it not been for a range of austerity-induced welfare reforms. Further, auxiliary results suggest that the welfare reforms activated existing underlying economic grievances that have broader origins than what the current literature on Brexit suggests. Up until 2010, the UK’s welfare state evened out growing income differences across the skill divide through transfer payments. This pattern markedly stops from 2010 onwards as austerity started to bite.

I came into this with skepticism about the use of aggregate trends to learn about individual-level attitude change. But I found Fetzer’s arguments to be pretty convincing.

That said, there are always alternative explanations for this sort of observational correlation.

What happened is that the places that were hardest-hit by austerity were the places where there was the biggest gain for the far-right party.

One alternative explanation is that these gains would still have come even in the absence of austerity, and it’s just that these parts of the country, which were trending to the far right politically, were also the places where austerity bit hardest.

A different alternative explanation is that economic conditions did cause Brexit, but at the national rather than the local or individual level: the idea here is that difficult national economic conditions motivated voters in those areas to go for the far right, but, again, in this explanation the effect did not arise from direct local effects of austerity.

I don’t see how one could untangle these possible stories based on the data used in Fetzer’s article. But his story makes some sense and it’s something worth thinking about. I’d be interested to hear what Piero Stanig thinks about all this, as he is a coauthor (with Italo Colantone) of this article, Global Competition and Brexit, cited by Fetzer.

Collinearity in Bayesian models

Dirk Nachbar writes:

We were having a debate about how much of a problem collinearity is in Bayesian models. I was arguing that it is not much of a problem. Imagine we have this model

Y ~ N(a + bX1 + cX2, sigma)

where X1 and X2 have some positive correlation (r > .5) and also have similar distributions. I would argue that if we assume 0-centered priors for b and c, then multi-chain MCMC should find some balance between the estimates.

In frequentist/OLS models it is a problem and both estimates of b and c will be biased.

With synthetic data, some people have shown that Bayesian estimates are pretty close to biased frequentist estimates.

What do you think? How does it change if we have more parameters than we have data points (low DF)?

My reply:

Yes, with an informative prior distribution on the coefficients you should be fine. Near-collinearity of predictors implies that the data can’t tell you so much about the individual coefficients—you can learn about the linear combination but not as much about the separate parameters—hence it makes sense to include prior information to do better.
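For concreteness, here’s a minimal Stan version of the model Dirk describes, with zero-centered, weakly informative priors on the coefficients. The prior scales below are placeholders; in practice you’d put the predictors on a reasonable scale and choose the priors to match:

data {
  int<lower=0> N;
  vector[N] x1;
  vector[N] x2;
  vector[N] y;
}
parameters {
  real a;
  real b;
  real c;
  real<lower=0> sigma;
}
model {
  // With x1 and x2 nearly collinear, the likelihood mostly constrains b + c;
  // the zero-centered priors keep the individual coefficients from drifting
  // far apart along that ridge.
  a ~ normal(0, 10);
  b ~ normal(0, 1);
  c ~ normal(0, 1);
  sigma ~ normal(0, 1);
  y ~ normal(a + b * x1 + c * x2, sigma);
}

What you should expect to see: a tight posterior for b + c, and wider, strongly negatively correlated posteriors for b and c individually. That’s not a pathology; it’s an honest summary of what the data plus prior can tell you about the separate coefficients.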

If you want a vision of the future, imagine a computer, calculating the number of angels who can dance on the head of a pin—forever.

Riffing on techno-hype news articles such as “An AI physicist can derive the natural laws of imagined universes,” Peter Woit writes:

This is based on the misconception about string theory that the problem with it is that “the calculations are too hard”. The truth of the matter is that there is no actual theory, no known equations to solve, no real calculation to do. But, with the heavy blanket of hype surrounding machine learning these days, that doesn’t really matter, one can go ahead and set the machines to work. . . .

Taking all these developments together, it starts to become clear what the future of this field may look like . . . As the machines supersede humans’ ability to do the kind of thing theorists have been doing for the last twenty years, they will take over this activity, which they can do much better and faster. Biological theorists will be put out to pasture, with the machines taking over, performing ever more complex, elaborate and meaningless calculations, for ever and ever.

Much of the discussion of Woit’s post focuses on the details of the physics models and also the personalities involved in the dispute.

My interest here is somewhat different. For our purposes here let’s just assume Woit is correct that whatever these calculations are, they’re meaningless.

The question is, if they’re meaningless, why do them at all? Just to draw an analogy: it used to be a technical challenge for humans to calculate digits of the decimal expansion of pi. But now computers can do it faster. I guess it’s still a technical challenge for humans to come up with algorithms by which computers can compute more digits. But maybe someone will at some point program a computer to come up with faster algorithms on their own. And we could imagine a network of computers somewhere, doing nothing but computing more digits of pi. But that would just be a pointless waste of resources, kinda like bitcoin but without the political angle.

I guess in the short term there would be motivation to have computers working out more and more string theory, but only because there are influential humans who think it’s worth doing. So in that sense, machines doing string theory is like the old-time building of pyramids and cathedrals, except that the cost is in material resources rather than human labor. It’s kind of amusing to think of the endgame of this sort of science as being its production purely for its own sake. A robot G. H. Hardy would be pleased.

Read this: it’s about importance sampling!

Importance sampling plays an odd role in statistical computing. It’s an old-fashioned idea and can behave just horribly if applied straight-up—but it keeps arising in different statistics problems.

Aki came up with Pareto-smoothed importance sampling (PSIS) for leave-one-out cross-validation.
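To unpack that a bit: the basic self-normalized importance sampling estimate of E[h(theta)] under the target p, using draws theta_s from a proposal q, is

$$ \frac{\sum_{s=1}^S w_s \, h(\theta_s)}{\sum_{s=1}^S w_s}, \qquad w_s = \frac{p(\theta_s)}{q(\theta_s)}. $$

The “behaves horribly” part is that when q has thinner tails than p, the raw weights w_s can have huge or even infinite variance, so a few draws dominate the estimate. Pareto smoothing tames this by fitting a generalized Pareto distribution to the largest weights and replacing them with their expected order statistics; the estimated shape parameter also acts as a diagnostic for when even that isn’t enough.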

We recently revised the PSIS article and Dan Simpson wrote a useful blog post about it the other day. I’m linking to Dan’s post again here because he gave it an obscure title so you might have missed it.

We’ve had a bunch of other ideas during the past few years involving importance sampling, including adaptive proposal distributions, wedge sampling, expectation propagation, and gradient-based marginal optimization, so I hope we can figure out some more things.

Causal inference with time-varying mediators

Adan Becerra writes to Tyler VanderWeele:

I have a question about your paper “Mediation analysis for a survival outcome with time-varying exposures, mediators, and confounders” that I was hoping you could help my colleague (Julia Ward) and me with. We are currently using Medicare claims data to evaluate the following general mediation pathway among dialysis patients with atrial fibrillation:

Race -> Warfarin prescriptions -> Stroke within 1 year.

where Warfarin prescription is a time-varying mediator (using Part D claims, with days supplied) and there are time-dependent confounders. Even though the exposure doesn’t vary over time, this is an extension of van der Laan’s time-dependent mediation method because yours also includes time-dependent confounders. However, I would also like to account for death as a competing risk via a sub-hazard. Am I correct that the G-formula cannot do this? If so, are you aware of any methods that could do this? I found the following paper that implements a marginal structural subdistribution hazard model, but this doesn’t do mediation (at least I don’t think so).

Becerra also cc-ed me, adding:

I recognize that you have stated on the blog before that you are hesitant to use mediation analyses, but they are very common in epi/clinical epi, and any help would be much appreciated.

I replied that the two arrows in the above diagram have different meanings. The first arrow is a comparison, comparing people of different races. The second arrow is causal, comparing what would happen if people are prescribed Warfarin or not.

To put it another way, the first arrow is a between-person comparison, whereas the second arrow is implicitly a within-person comparison.

I assume they’d also want another causal pathway, going from Warfarin prescription -> taking Warfarin -> Stroke. But maybe they’re assuming that getting the prescription is equivalent to taking the drug in this case. Anyway, it seems to me that prescription of the drug is not a “mediator” but rather is the causal variable (in the diagram) or an instrument (in the more elaborate diagram, where prescription is the instrument and taking the drug is the causal variable).

This sort of thing comes up a lot when someone proposes a method I don’t fully understand. Perhaps because I don’t really understand it, I end up thinking about the problem in a different way.

Becerra responded:

I see your point about the first arrow maybe not being causal. In fact, Tyler and Whitney Robinson wrote a whole paper on the topic:

We also discuss a stronger interpretation of the “effect of race” (stronger in terms of assumptions) involving the joint effects of race-associated physical phenotype (e.g. skin color), parental physical phenotype, genetic background and cultural context when such variables are thought to be hypothetically manipulable and if adequate control for confounding were possible.

So according to this it seems like there is a way of estimating the causal effect of race. But let’s suppose my exposure wasn’t race, just so I can highlight the real issue in this analysis. My concern is that I haven’t been able to find a method that does mediation analyses with time-varying mediators, exposures, and confounders for a survival outcome with a sub-hazard competing risk a la Fine and Gray. In the dialysis population, death is a huge competing risk for stroke.

However, I am no expert in this and I too am afraid I may be missing something, which is why I reached out. When I saw Tyler’s original paper I thought it would work, but I can’t see how to incorporate the sub-hazard.

Is there such a thing as time-varying instruments? In this analysis I’m using Part D claims in Medicare (so I don’t really know if they took the drug) and patients can go on and off the drugs as well as initiate other drugs (like beta blockers, calcium channel blockers, etc.). I’m really interested in warfarin, so I’m concerned about time-varying confounding due to other drugs.

Doesn’t seem to me like a static yes/no instrument would work, but I’ve never fit an IV model, so what do I know.

I did just see some instrumental variable models with sub-hazards so I’ll look there.

And then VanderWeele replied:

Yes, as Andrew noted, if you have “race” as your exposure then this should not be interpreted causally with respect to race. You can (as per VanderWeele and Robinson, 2014) still interpret the “indirect effect” estimate as, e.g., by what portion you would reduce the existing racial disparity if you intervened on warfarin to equalize its distribution in the black population to what it is in the white population, and the “direct effect” as the portion of the disparity that would still remain after that intervention.

We do have a paper on mediation with a survival outcome with a time-varying mediator, but alas it will not handle competing risk and sub-hazards. That would require further methods development.

I’ve never worked on this sort of problem myself. If I did so, I think I’d start by modeling the probability of stroke given drug prescriptions and individual-level background variables including ethnicity, age, sex, previous health status, etc. Maybe with some measurement error model if claims data are imperfect.
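Just to make that concrete, here’s a minimal Stan sketch of the kind of starting point I have in mind. The variable names are invented, and a real analysis would need the time-varying structure of the prescriptions, death as a competing risk, and a measurement model for the claims, none of which appear here:

data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> stroke;  // stroke within 1 year
  vector[N] warfarin;                     // e.g., days of warfarin supplied
  int<lower=1> K;
  matrix[N, K] x;                         // age, sex, ethnicity, previous health status, ...
}
parameters {
  real alpha;
  real beta_warfarin;
  vector[K] beta;
}
model {
  alpha ~ normal(0, 2.5);
  beta_warfarin ~ normal(0, 2.5);
  beta ~ normal(0, 2.5);
  stroke ~ bernoulli_logit(alpha + beta_warfarin * warfarin + x * beta);
}

From there the natural expansions would be hierarchical terms (for example, by dialysis center or region), a discrete-time hazard in place of the one-year indicator so that death can be handled as a competing event, and a latent “true exposure” given the imperfect claims.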