Latour Sokal NYT

Alan Sokal writes:

I don’t know whether you saw the NYT Magazine’s fawning profile of
sociologist of science Bruno Latour about a month ago.

I wrote to the author, and later to the editor, to critique the gross lack of balance (and even of the most minimal fact-checking). No reply. So I posted my critique on my webpage.

From that linked page from Sokal:

The basic trouble with much of Latour’s writings—as with those of some other sociologists and philosophers of a “social constructivist” bent—is that (as Jean Bricmont and I [Sokal] pointed out already in 1997)

these texts are often ambiguous and can be read in at least two distinct ways: a “moderate” reading, which leads to claims that are either worth discussing or else true but trivial; and a “radical” reading, which leads to claims that are surprising but false. Unfortunately, the radical interpretation is often taken not only as the “correct” interpretation of the original text but also as a well-established fact (“X has shown that …”) . . .

numerous ambiguous texts that can be interpreted in two different ways: as an assertion that is true but relatively banal, or as one that is radical but manifestly false. And we cannot help thinking that, in many cases, these ambiguities are deliberate. Indeed, they offer a great advantage in intellectual battles: the radical interpretation can serve to attract relatively inexperienced listeners or readers; and if the absurdity of this version is exposed, the author can always defend himself by claiming to have been misunderstood, and retreat to the innocuous interpretation.

Sokal offers a specific example.

First, he quotes the NYT reporter who wrote:

When [Latour] presented his early findings at the first meeting of the newly established Society for Social Studies of Science, in 1976, many of his colleagues were taken aback by a series of black-and-white photographic slides depicting scientists on the job, as though they were chimpanzees. It was felt that scientists were the only ones who could speak with authority on behalf of science; there was something blasphemous about subjecting the discipline, supposedly the apex of modern society, to the kind of cold scrutiny that anthropologists traditionally reserved for “premodern” peoples.

Sokal responds:

In reality, it beggars belief to imagine that sociologists of science—whose entire raison d’être is precisely to subject the social practice of science to “cold scrutiny”—could possibly think that “scientists were the only ones who could speak with authority on behalf of science”. Did you bother to seek confirmation of this self-serving claim from anyone present at that 1976 meeting, other than Latour himself?

Sokal continues in his letter to the NYT reporter:

In the same way, you faithfully reproduce Latour’s ambiguities concerning the notion of “fact”:

It had long been taken for granted, for example, that scientific facts and entities, like cells and quarks and prions, existed “out there” in the world before they were discovered by scientists. Latour turned this notion on its head. In a series of controversial books in the 1970s and 1980s, he argued that scientific facts should instead be seen as a product of scientific inquiry. …

In your article you take for granted that Latour’s view is correct: indeed, a few paragraphs later you say that Latour showed “that scientific facts are the product of all-too-human procedures”. But, like Latour, you never explain in what sense the traditional view—that cells and quarks and prions existed “out there” in the world before they were discovered by scientists—is mistaken.

I’m with Sokal: Scientific facts are real. Their discovery, expression, and (all too often) misrepresentation are the product of human procedures, but the facts and entities exist.

As Sokal discusses, the whole thing is slippery, as can be seen even in the brief discussion excerpted above. If you give Latour’s statements a minimalist interpretation—the concepts of “cells,” “quarks,” etc. are human-constructed—there’s really no problem. Yes, the phenomena described by our concepts of cells, quarks, etc. are real and would exist even if humans had never appeared on the Earth, but one could imagine completely different ways of expressing and formulating models for these scientific facts, in forms that might look nothing like “cells” and “quarks.” Just as one can, for example, express classical mechanics with or without the concept of “force.”

And, of course, if you want to go further, there are lots of apparent scientific facts that, it seems, are simply human-created mistakes: I’m thinking here of examples such as recent studies of ESP, himmicanes, air rage, beauty and sex ratio, etc.

So Latour’s general perspective is valuable. But Sokal argues, convincingly to me, that much of the reading of Latour, including in that news article, takes the strong view, what might be called the postmodern view, which throws the baby of replicable science out with the bathwater of contingent theories.

Sokal writes:

If Latour had really shown that scientific facts are the product of all-too-human procedures, then the critics’ charge would be unfair. But in reality Latour had not shown anything of the sort; he had simply asserted it, and many others (not cited by you) had criticized those assertions. Of course, it goes without saying that scientists’ beliefs (and assertions of alleged fact) about the external world are the product of all-too-human procedures — that is true and utterly banal. But Latour’s claims are nothing more than deliberate confusion between two senses of the word “fact” (namely, the usual one and his own idiosyncratic one). . . . muddying the distinction between facts and assertions of fact undermines our ability to think clearly about this crucial psychological/sociological/political problem.

Sokal continues with his correspondence with the New York Times (they eventually replied after he sent them several emails).

Just to be clear here, I don’t think there are any villains in this story.

Latour has a goofy view of science, and I agree with Sokal that his (Latour’s) expressions of his ideas are a bit slippery—but, hey, Latour is entitled to express his views, and you gotta give him credit for being influential. Latour’s success must in some part be a consequence of gaps, or at least underemphasized points, in previous discussions of science.

The author of the NYT article, Ava Kofman, found a good story and ran with it. I agree with Sokal that she missed the point—or, to put it another way, while she may well be doing a good job telling the story of Latour, she’s not doing a good job telling the story of Latour’s ideas. But that’s not quite her job: even if, as the saying goes, Latour’s work “contains much that is original and much that is correct; unfortunately that which is correct is not original, and that which is original is not correct,” Kofman is not really writing about this; she’s writing more about Latour’s influence.

The ironic thing, though, is that Kofman’s article is following the standard template of feature stories about a scientist or academic, which is to treat him as a hero. If there’s one idea that Latour stands for, it’s that scientists are part of a social process, and it misses the point to routinely treat them as misunderstood geniuses.

Anyway, although I share Sokal’s annoyance that the author of an article on Latour missed key aspects of Latour’s ideas and then didn’t even reply to his thoughtful criticism, I can understand why the reporter wants to move on to her next project. In my experience, journalists are more forward-looking than academics: we worry about our past errors, they just move on. It’s a different style, perhaps deriving from the difference between traditional publication in bound volumes and publication in fishwrap.

Finally, perhaps there’s not much the NYT editors can do at this point. Newspapers, and for that matter scientific journals, rarely run corrections even of clear factual errors—at least, that’s been my experience. So I can’t blame them too much for following common practice.

Ultimately, this all comes down to questions of emphasis and interpretation. Latour has, for better or worse, expressed ideas that have been influential in the sociology of science; his story is interesting and worth a magazine article; writing a story with Latour as hero leads to some confusion about what is understood by others in that field. In that sense it’s not so different from a story in the sports or business pages that presents a contest from one side. That’s a journalistic convention, and that’s fine, and it’s also fine for someone such as Sokal who has a different perspective (one that I happen to agree with) to share that too.

As Sokal puts it:

The ironic thing is that Latour has spent his life decrying (and rightly so) the scientist-as-hero approach to presenting science to the general public; but here is an article that takes an extreme version of the same approach, albeit applied to a sociologist/philosopher rather than a scientist.

A newspaper or magazine article about a thinker should not merely be a fawning and uncritical celebration of his brilliance; it should also discuss his ideas. Indeed, this article does purport to explain and discuss Latour’s ideas, not just his personal story; but it does so in a completely uncritical way, not even letting on that there might be people who have cogent critiques of his ideas. That, it seems to me, is a gross failure of balance—and more importantly, a gross abdication of the newspaper’s mission to inform its readers about important subjects. (In this case, a subject that has serious real-world consequences.) Not to mention the gross lack of elementary fact-checking that I pointed out.

Of course, one could also question whether the “hero” mode of writing is appropriate even on the sports or business pages. This mode of writing presents a contest from one side only; and it is not very often the case in sports or business that there is in fact only one side.

So, yeah, the NYT article was not so bad as feature articles go—it told an engaging story from one particular perspective—but there was an opportunity to do better. Hence Sokal’s post, and this post linking to it.

P.S. Hey, the name Bruno Latour rings a bell . . . Unfortunately, he didn’t make it out of the first round of our seminar speaker competition.


A parable regarding changing standards on the presentation of statistical evidence

Now, the P-value Sneetches
Had tables with stars.
The Bayesian Sneetches
Had none upon thars.

Those stars weren’t so big. They were really so small.
You might think such a thing wouldn’t matter at all.

But, because they had stars, all the P-value Sneetches
Would brag, “We’re the best kind of Sneetch on the Beaches.”
With their snoots in the air, they would sniff and they’d snort
“We’ll have nothing to do with the Bayesian sort!”
And whenever they met some, when they were out walking,
They’d hike right on past them without even talking.

When the P-value children went out to play ball,
Could a Bayesian get in the game… ? Not at all.
You only could play if your tables had stars
And the Bayesian children had none upon thars.

When the P-value Sneetches had frankfurter roasts
Or picnics or parties or PNAS toasts,
They never invited the Bayesian Sneetches.
They left them out cold, in the dark of the beaches.
They kept them away. Never let them come near.
And that’s how they treated them year after year.

Then ONE day, it seems… while the Bayesian Sneetches
Were moping and doping alone on the beaches,
Just sitting there wishing their tables had stars…
A stranger zipped up in the strangest of cars!

“My friends,” he announced in a voice clear and keen,
“My name is Savage McJeffreys McBean.
And I’ve heard of your troubles. I’ve heard you’re unhappy.
But I can fix that. I’m the Fix-it-Up Chappie.
I’ve come here to help you. I have what you need.
And my prices are low. And I work at great speed.
And my work is one hundred per cent guaranteed!”

Then, quickly Savage McJeffreys McBean
Put together a Bayes Factor machine.
And he said, “You want stars like a Star-Tabled Sneetch… ?
My friends, you can have them for three dollars each!”

“Just pay me your money and hop right aboard!”
So they clambered inside. Then the big machine roared
And it klonked. And it bonked. And it jerked. And it berked
And it bopped them about. But the thing really worked!
When the Bayesian Sneetches popped out, they had stars!
They actually did. They had stars upon thars!

Then they yelled at the ones who had stars at the start,
“We’re exactly like you! You can’t tell us apart.
We’re all just the same, now, you snooty old smarties!
And now we can go to your NPR parties.”

“Good grief!” groaned the ones who had stars at the first.
“We’re still the best Sneetches and they are the worst.
But, now, how in the world will we know,” they all frowned,
“If which kind is what, or the other way round?”

Then came McBean with a very sly wink.
And he said, “Things are not quite as bad as you think.
So you don’t know who’s who. That is perfectly true.
But come with me, friends. Do you know what I’ll do?
I’ll make you, again, the best Sneetches on beaches
And all it will cost you is ten dollars eaches.”

“P-value stars are no longer in style,” said McBean.
“What you need is a trip through my Replication Machine.
This wondrous contraption will take off your stars
So you won’t look like Sneetches who have them on thars.”
And that handy machine
Working very precisely
Removed all the stars from their tables quite nicely.

Then, with snoots in the air, they paraded about
And they opened their beaks and they let out a shout,
“We know who is who! Now there isn’t a doubt.
The best kind of Sneetches are Sneetches without!”

Then, of course, those with stars all got frightfully mad.
To be wearing a star now was frightfully bad.
Then, of course, old Savage McJeffreys McBean
Invited them into his Star-Off machine.

Then, of course from THEN on, as you probably guess,
Things really got into a horrible mess.
All the rest of that day, on those wild screaming beaches,
The fix-it-up Chappie kept fixing up Sneetches.
Off again! On Again!
In again! Out again!
Through the machines they raced round and about again,
Changing their stars every minute or two.
They kept paying money. They kept running through
Until neither the Plain nor the Star-Tables knew
Whether this one was that one… or that one was this one
Or which one was what one… or what one was who.

Then, when every last cent
Of their money was spent,
The Fix-it-Up Chappie packed up
And he went.

And he laughed as he drove
In his car up the beach,
“They never will learn.
No. You can’t teach a Sneetch!”

But McBean was quite wrong. I’m quite happy to say
That the Sneetches got really quite smart on that day,
The day they decided that Sneetches are Sneetches
And no kind of Sneetch is the best on the beaches
That day, all the Sneetches forgot about stars
And whether they had one, or not, upon thars.

[Original is on the web, for example here. I was inspired to construct the above adaptation after thinking of the series of public advice I’ve given over the years regarding prior distributions: first we recommended uniform priors, then scaled-inverse-Wishart and Cauchy and half-Cauchy, now LKJ and normal and half-normal and horseshoe, and who knows what in the future. And I used to recommend p-values and now I don’t. It’s hard to keep up . . .]


Niall Ferguson and the perils of playing to your audience

History professor Niall Ferguson had another case of the sillies.

Back in 2012, in response to Stephen Marche’s suggestion that Ferguson was serving up political hackery because “he has to please corporations and high-net-worth individuals, the people who can pay 50 to 75K to hear him talk,” I wrote:

But I don’t think it’s just about the money. By now, Ferguson must have enough money to buy all the BMWs he could possibly want. To say that Ferguson needs another 50K is like saying that I need to publish in another scientific journal. No, I think what Ferguson is looking for (as am I, in my scholarly domain) is influence. He wants to make a difference. And one thing about being paid $50K is that you can assume that whoever is paying you really wants to hear what you have to say.

The paradox, though, as Marche notes, is that the way Ferguson gets and keeps the big-money audience is by telling them not what he (Ferguson) wants to say—not by giving them his unique insights and understanding—but rather by telling his audience what they want to hear.

That’s what I called The Paradox of Influence.

But then, a year later, Ferguson went too far, even by his own standards, when during a talk to a bunch of richies he attributed Keynes’s economic views (I don’t actually know exactly what Keynesianism is, but I think a key part is for the government to run surpluses during economic booms and deficits during recessions) to Keynes being gay and marrying a ballerina and talking about poetry. The general idea, I think, is that people without kids don’t care so much about the future, and this motivated Keynes’s party-all-the-time attitude, which might have worked just fine for Eddie Murphy’s girl in the 1980s and in San Francisco bathhouses of the 1970s but, according to Ferguson, is not the ticket for preserving today’s American empire.

My theory on that one is not that Ferguson is a flaming homophobe or a shallow historical determinist (the expression is “piss-poor monocausal social science,” I believe) but rather that he misjudged his audience and threw them some academic frat-boy-style humor that he mistakenly thought they’d enjoy. He served them red meat, but the wrong red meat. Probably would’ve been better for him to have just preached the usual get-the-government-off-our-backs sermon and not tried to get cute by bringing up the whole ballerina thing.

Anyway, it happened again! Fergie made a fool of himself, just for trying to make some people happy.

Brian Contreras, Ada Statler, and Courtney Douglas (link from Jeet Heer via Mark Palko) report:

Leaked emails show Hoover academic conspiring with College Republicans to conduct ‘opposition research’ on student . . . “[The original Cardinal Conversations steering committee] should all be allies against O. Whatever your past differences, bury them. Unite against the SJWs. [Christos] Makridis [a fellow at Vox Clara, a Christian student publication] is especially good and will intimidate them,” Ferguson wrote. “Now we turn to the more subtle game of grinding them down on the committee. The price of liberty is eternal vigilance” . . . In the email chain, Ferguson wrote, “Some opposition research on Mr. O might also be worthwhile,” referring to Ocon.
Minshull wrote in response that he would “get on the opposition research for Mr. O.” Minshull is presently Ferguson’s research assistant . . .

It’s hard for me to imagine that Ferguson, globetrotting historian and media personality that he is, would really care so much about “grinding down” some students in a university committee. I’m guessing he was just trying to ingratiate himself with these youngsters, who I guess he views as the up-and-coming new generation of college politicians. Ferguson’s just the modern version of the stock figure, the middle-aged guy trying to talk groovy like the kids. “Some opposition research on Mr. O might also be worthwhile,” indeed. It’s the university-politics version of, ummm, I dunno, building a treehouse with some 12-year-olds, or playing hide-and-seek with a group of 4-year-olds.

The whole thing’s kinda sad in that Fergie seems so clueless. Even in the aftermath, he says, “I very much regret the publication of these emails. I also regret having written them.” Which is fine, but he still doesn’t seem to recognize the absurdity of the situation, a professor in his fifties playing student politics. As with his slurs of Keynes, the man is just a bit too eager to give his audience what he thinks they want to hear.

To recap, Ferguson’s trajectory:

(pre-2000) academic historian
(2000-2005) propagandist for Anglo-American empire
(2010-2015) TV talking head and paid speaker for rich people
(2018) player in undergraduate campus politics.

At this point, he’s gotta be thinking: Could I have stopped somewhere along the way? Or was the whole trajectory inevitable? It’s a question of virtual history.


“Statistical insights into public opinion and politics” (my talk for the Columbia Data Science Society this Wed 9pm)

7pm in Fayerweather 310:

Why is it more rational to vote than to answer surveys (but it used to be the other way around)? How does this explain why we should stop overreacting to swings in the polls? How does modern polling work? What are the factors that predict election outcomes? What’s good and bad about political prediction markets? How do we measure political polarization, and what does it imply for our politics? We will discuss these and other issues in American politics and more generally how we can use data science to learn about the social world.

People can read the following articles ahead of time if they would like.

Short:
https://slate.com/news-and-politics/2018/11/midterms-blue-wave-statistics-data-analysis.html
http://www.slate.com/articles/news_and_politics/politics/2016/08/why_trump_clinton_won_t_be_a_landslide.html
https://slate.com/news-and-politics/2016/08/dont-be-fooled-by-clinton-trump-polling-bounces.html
http://www.slate.com/articles/news_and_politics/moneybox/2016/07/why_political_betting_markets_are_failing.html

Longer:
http://www.stat.columbia.edu/~gelman/research/published/what_learned_in_2016_5.pdf
http://www.stat.columbia.edu/~gelman/research/published/swingers.pdf


My talk tomorrow (Tues) noon at the Princeton University Psychology Department

Integrating collection, analysis, and interpretation of data in social and behavioral research

Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University

The replication crisis has made us increasingly aware of the flaws of conventional statistical reasoning based on hypothesis testing. The problem is not just a technical issue with p-values, nor can it be solved using preregistration or other purely procedural approaches. Rather, appropriate solutions have three aspects. First, in collecting your data there should be a concordance between theory and measurement: for example, in studying the effect of an intervention applied to individuals, you should measure within-person comparisons. Second, in analyzing your data, you should study all comparisons of potential interest, rather than selecting based on statistical significance or other inherently noisy measures. Third, you should interpret your results in the context of theory, background knowledge, and the data collection and analysis you have performed. We discuss these issues on a theoretical level and with examples in psychology, political science, and policy analysis.

Here are some relevant references:

Some natural solutions to the p-value communication problem—and why they won’t work.

Honesty and transparency are not enough.

The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective.

And this:

No guru, no method, no teacher, Just you and I and nature . . . in the garden. Of forking paths.

The talk will be Tuesday, December 4, 2018, 12:00pm, in A32 Peretsman Scully Hall.


The p-value is 4.76×10^−264

Jerrod Anderson points us to Table 1 of this paper:

It seems that the null hypothesis that this particular group of men and this particular group of women are random samples from the same population, is false.

Good to know. For a moment there I was worried.

On the plus side, as Anderson notes, the paper includes distributional comparisons:

This is fine as a visualization, but I don’t think there’s much here beyond the means and variances. Seems a lot of space to devote to demonstrating that men, on average, are bigger than women. There’s other stuff in the paper as well, but my favorite is the p-value of 4.76×10^−264. I love that they have all these decimal places. Because 4×10^-264 wouldn’t be precise enuf. That’s even worse—actually, a lot worse—than this example.
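
Just to spell out where a number like that comes from, here’s a toy simulation in R (made-up heights, not the data from the paper): once the difference is obvious and the sample is reasonably large, the z-statistic lands in the 30s and the p-value picks up a couple hundred leading zeros, at which point the extra significant figures convey nothing.

set.seed(1)
n <- 1500                               # hypothetical group sizes
men   <- rnorm(n, mean = 178, sd = 7)   # made-up heights in cm
women <- rnorm(n, mean = 169, sd = 7)

z_stat <- (mean(men) - mean(women)) / sqrt(var(men)/n + var(women)/n)
z_stat                    # around 35
2 * pnorm(-abs(z_stat))   # somewhere around 10^-270: the trailing digits are meaningless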


Stephen Wolfram explains neural nets

It’s easy to laugh at Stephen Wolfram, and I don’t like some of his business practices, but he’s an excellent writer and is full of interesting ideas. This long introduction to neural network prediction algorithms is an example. I have no idea if Wolfram wrote this book chapter himself or if he hired one of his paid theorem-provers to do it—I guess it’s probably some sort of collaboration—but it doesn’t really matter. It all looks really cool.


“And when you did you weren’t much use, you didn’t even know what a peptide was”

Last year we discussed the story of an article, “Variation in the β-endorphin, oxytocin, and dopamine receptor genes is associated with different dimensions of human sociality,” published in PNAS that, notoriously, misidentified what a peptide was, among other problems.

Recently I learned of a letter published in PNAS by Patrick Jern, Karin Verweij, Fiona Barlow, and Brendan Zietsch, with the no-fooling-around title, “Reported associations between receptor genes and human sociality are explained by methodological errors and do not replicate.”

And here’s the response by one of the authors, Robin Dunbar, entitled not “Sorry, we got it wrong” but rather “On asking the right questions.”

Too bad they couldn’t simply admit they made an error, stating clearly and without equivocation that their original conclusions were not substantiated. On the plus side, they weren’t as rude as these authors.

P.S. The other thing in that post was that I suggested to PNAS that they change their slogan from “PNAS publishes only the highest quality scientific research” to “PNAS aims to publish only the highest quality scientific research.” And they did it! So cool.


Multilevel models for multiple comparisons! Varying treatment effects!

Mark White writes:

I have a question regarding using multilevel models for multiple comparisons, per your 2012 paper and many blog posts. I am in a situation where I do randomized experiments, and I have a lot of additional demographic information about people, as well. For the moment, let us just assume that all of these are categorical demographic variables. I want to know not only whether there is an effect of the treatment over the control, but also for which groups there is an effect (positive or negative). I never get too granular, but I do look at an intersection between two variables (e.g., Black men, younger married people, Republican women) as well as just within one variable (e.g., women, Republicans, married people).

The issue I’m running into is that I want to look at the effects for all of these groups, but I don’t want to get mired down by Type I error and go chasing noise. (I know you reject the Type I error paradigm because a null of precisely zero is a straw-man argument, but clients and other stakeholders still want to be sure we aren’t reading too much into something that is not there.)

In the machine learning literature, there is a growing interest in causal inference and now a whole topic called “heterogeneous treatment effects.” In the general linear model world in which I was taught as a psychologist, this could also just be called “looking for interactions.” Many of these methods are promising, but I’m finding them difficult to implement in my scenario (I wrote a question here https://stats.stackexchange.com/questions/341402/a-few-questions-regarding-the-practice-of-heterogeneous-treatment-effect-analysi and posed a tailored question about one package to package creators directly here https://github.com/swager/grf/issues/238).

Turning back to multilevel models, it seems like I could do this in that framework. Basically, I just create a non-nested/crossed/whatever you’d like to call it model where people are nested in k groups, where k refers to how many demographic variables I have. I simulated data and fit a model here: https://gist.github.com/markhwhiteii/592d40f93b052663f240125fc9b8db99

The questions I have for you are the questions I pose at the bottom of that R script at the GitHub code snippet:

1. Is this a reasonable approach to examine “heterogenous treatment effects” without getting bogged down by Type I error and multiple comparison problems?

2. If it is, how can I get confidence intervals from the fitted model object using glmer? You all do so in the 2012 paper, I believe

3. More importantly, how can I look at the intersection between two groups? The code I sent in that GitHub snippet looks at effects for men, women, Blacks, Whites, millennials, etc. But I coded in an effect for Black men specifically. How could I use that fitted model object to examine the effect for Black men, White women, millennials with kids, etc.? And how would I calculate standard errors for these?

4. Would all of these things be easier to do in Stan? What would that Stan model look like? Since then I wouldn’t have to figure out how to calculate standard errors for everything, but just sample from the posterior.

My reply:

We’ve been talking about varying treatment effects for a long time. (“Heterogeneous” is jargon for “varying,” I think.)

From 2004: Treatment effects in before-after data.

From 2008: Estimating incumbency advantage and its variation, as an example of a before/after study.

From 2015: The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective.

From 2015: Hierarchical models for causal effects.

From 2015: The connection between varying treatment effects and the well-known optimism of published research findings.

From 2017: Let’s accept the idea that treatment effects vary—not as something special but just as a matter of course.

I definitely think hierarchical modeling is the way to go here. Think of it as a regression model, in which you’re modeling (predicting) treatment effects given pre-treatment predictors, so the treatment could be more effective for men than for women, or for young people than for old people, etc. You’ll end up with lots of predictors in this regression, and multilevel modeling is a way to control or regularize their coefficients.

In short, the key virtue of multilevel modeling (or some other regularization approach) here is that it allows you to include more predictors in your regression. Without regularization, your estimates would become too noisy, then you’d have to fit a cruder model, not allowing you to study the variation that you care about.

The other thing is, yeah, forget type 1 error rates and all the rest. Abandon the idea that the goal of the statistical analysis is to get some sort of certainty. Instead, accept posterior ambiguity: don’t try to learn more from the data than you really can.

I’ll start with some models in lme4 (or rstanarm) notation. Suppose you have a treatment z and pre-treatment predictors x1 and x2. Then here are some models:

y ~ z + x1 + x2 # constant treatment effect
y ~ z + x1*z + x2*z # treatment can vary by x1 and x2
y ~ z + x1*x2*z # also include interaction of x1 and x2

If you have predictors x3 and x4 with multiple levels:

y ~ z + x1 + x2 + (1 | x3) + (1 | x4) # constant treatment effect
y ~ z + x1*z + x2*z + (1 + z | x3) + (1 + z | x4) # varying treatment effect
y ~ z + x1*z + x2*z + (1 + z | x3*x4) # includes an interaction

One thing we’re still struggling with, is that there are all these possible models. Really we’d like to start and end with the full model, something like this, with all the interactions:

y ~ (1 + x1*x2*z | x3*x4)

But these models can be hard to handle. I think we need stronger priors, stronger than the current defaults in rstanarm. So for now I’d build up from the simple model, including interactions as appropriate.

In any case, you can get posterior uncertainties for whatever you want from stan_glmer() in rstanarm; simulations of all the parameters are directly accessible from the fitted object.
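
For question 4, here’s a minimal sketch (not the model from the gist above) of fitting one of these varying-treatment-effect formulas with stan_glmer and working directly with the posterior draws; the data frame dat, its variables, and the factor level label “a” are all hypothetical, and the b[...] column names follow rstanarm’s naming convention for varying slopes.

library(rstanarm)

fit <- stan_glmer(y ~ z + (1 + z | x3) + (1 + z | x4),
                  data = dat, chains = 4, iter = 2000)

draws <- as.matrix(fit)   # one row per posterior draw, one column per parameter
colnames(draws)           # varying slopes appear as b[z x3:<level>], etc.

# Treatment effect for one x3 group = overall slope plus that group's deviation.
# The column name assumes x3 has a level labeled "a"; substitute your own levels.
eff_x3_a <- draws[, "z"] + draws[, "b[z x3:a]"]
quantile(eff_x3_a, c(0.025, 0.5, 0.975))   # posterior median and 95% interval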

You can also aggregate however you want. It’s mathematically the same as Mister P; you’re just working with treatment effects rather than averages.
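
To make that concrete, here’s a hedged sketch of the aggregation step in the spirit of MRP, using the hypothetical fit above and a hypothetical poststratification table cells with one row per (x3, x4) cell and a population count N: predict each cell under treatment and under control, take the difference, then average over cells with population weights, all within each posterior draw.

cells$z <- 1
mu1 <- posterior_epred(fit, newdata = cells)   # draws x cells, outcome under treatment
cells$z <- 0
mu0 <- posterior_epred(fit, newdata = cells)   # draws x cells, outcome under control

cell_effects <- mu1 - mu0                      # per-draw, per-cell treatment effects
wts <- cells$N / sum(cells$N)
pop_effect <- as.vector(cell_effects %*% wts)  # population-weighted effect, per draw
quantile(pop_effect, c(0.025, 0.5, 0.975))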
