Comparing racism from different eras: If only Tucker Carlson had been around in the 1950s he could’ve been a New York Intellectual.

In 2018, TV commentator Carlson raised a stir by saying that immigration makes the United States “poorer, and dirtier, and more divided,” which reminded me of this rant from literary critic Alfred Kazin in 1957:

[Image: Kazin’s 1957 diary entry]

Kazin put it in his diary and Carlson broadcast it on TV, so not quite the same thing.

But this juxtaposition made me think of Keith Ellis’s comment that “there’s much less difference between conservatives and progressives than most people think. Maybe one or two generations of majority opinion, at most.”

When people situate themselves on political issues, I wonder how much of this is on the absolute scale and how much is relative to current policies or the sense of the prevailing opinion. Is Tucker Carlson more racist than Alfred Kazin? Does this question even make sense? Maybe it’s like comparing baseball players from different eras, e.g. Mike Trout vs. Babe Ruth as hitters. Or, since we’re on the topic of racism, Ty Cobb vs. John Rocker.

The post Comparing racism from different eras: If only Tucker Carlson had been around in the 1950s he could’ve been a New York Intellectual. appeared first on Statistical Modeling, Causal Inference, and Social Science.

Why do sociologists (and bloggers) focus on the negative? 5 possible explanations. (A post in the style of Fabio Rojas)

Fabio Rojas asks why the academic field of sociology seems so focused on the negative. As he puts it, why doesn’t the semester begin with the statement, “Hi, everyone, this is soc 101, the scientific study of society. In this class, I’ll tell you about how American society is moving in some great directions as well as some lingering problems”?

Rojas writes:

If sociology is truly a broad social science, and not just the study of “social problems,” then we might encourage more research into the undeniably positive improvements in human well being.

This suggestion interests me, in part because on this blog we are often negative. We sometimes write about cool new methods or findings in statistical modeling, causal inference, and social science, but we also spend a lot of time on the negative. And it’s not just us; it’s my impression that blogs in general have a lot of negativity, in the same way that movie reviews are often negative. Even if a reviewer likes a movie, he or she will often take some space to point out possible areas of improvement. And many of the most-remembered reviews are slams.

Rather than getting into a discussion of whether blogs, or academic sociology, or movie reviews, should be more positive or negative, let’s get into the more interesting question of Why.

Why is negativity such a standard response? Let me try to answer in Rojas style:

1. Division of labor. Within social science, sociology’s “job” is to confront us with the bad news, to push us to study inconvenient truths. If you want to hear good news, you can go listen to the economists. Similarly, blogs took the “job” of criticizing the mainstream media (and, later, the scientific establishment); it was a niche that needed filling. If you want to be a sociologist or blogger and focus on the good things, that’s fine, but you’ll be atypical. Explanation 1 suggests that sociologists (and bloggers, and movie reviewers) have adapted to their niches in the intellectual ecosystem, and that each field has the choice of continuing to specialize or to broaden by trying to occupy some of the “positivity” space occupied by other institutions.

2. Efficient allocation of resources. Where can we do the most good? Reporting positive news is fine, but we can do more good by focusing on areas of improvement. I think this is somewhat true, but not always. Yes, it’s good to point out where people can do better, but we can also do good by understanding how good things happen. This is related to the division-of-labor idea above, or it could be considered an example of comparative advantage.

3. Status. Sociology doesn’t have the prestige of economics (more generally, social science doesn’t have the prestige of the natural sciences); blogs have only a fraction of the audience of the mass media (and we get paid even less for blogging than they get paid for their writing); and movie reviewers, of course, are nothing but parasites on the movie industry. So maybe we are negative for emotional reasons—to kick back at our social superiors—or for strategic reasons, to justify our existence. Either way, these are actions of insecure people in the middle, trying to tear down the social structure and replace it with a new one where they’re at the top. This is kind of harsh and it can’t fully be true—how, for example, would it explain that even the sociologists who are tenured professors at top universities still (presumably) focus on the bad news, or that even star movie reviewers can be negative—but maybe it’s part of the way that roles and expectations are established and maintained.

4. Urgency. Psychiatrists work with generally-healthy people as well as the severely mentally ill. But caring for the sickest is the most urgent: these are people who are living miserable lives, or who pose danger to themselves and others. Similarly (if on a lesser scale of importance), we as social scientists might feel that progress will continue on its own, while there’s no time to wait to fix serious social ills. Similarly, as a blogger, I might not bother saying much about a news article that was well reported, because the article itself did a good job of sending its message. But it might seem more urgent to correct an error. Again, this is not always good reasoning—it could be that understanding a positive trend and keeping it going is more urgent than alerting people to a problem—but I think this may be one reason for a seeming focus on negativity. As Auden put it,

To-morrow, perhaps the future. The research on fatigue
And the movements of packers; the gradual exploring of all the
Octaves of radiation;
To-morrow the enlarging of consciousness by diet and breathing.

To-morrow the rediscovery of romantic love,
the photographing of ravens; all the fun under
Liberty’s masterful shadow;
To-morrow the hour of the pageant-master and the musician,

The beautiful roar of the chorus under the dome;
To-morrow the exchanging of tips on the breeding of terriers,
The eager election of chairmen
By the sudden forest of hands. But to-day the struggle.

5. Man bites dog. Failures are just more interesting to write about, and to read about, than successes. We’d rather hear the story of “secrets and lies in a Silicon Valley startup,” than hear the boring story of a medical device built by experienced engineers and sold at a reasonable price. Hence the popularity within social science (not just sociology!) of stories of the form, Everything looks like X but not Y; the popularity among bloggers of Emperor’s New Clothes narratives; and the popularity among movie reviewers of, This big movie isn’t all that. You will occasionally get it the other way—This seemingly bad thing is really good—but it’s generally in the nature of contrarian takes to be negative, because they’re reacting to some previous positive message coming from public relations and the news media.

Finally, some potential explanations that I don’t think really work:

Laziness. Maybe it’s less effort to pick out things to complain about than to point out good news. I don’t think so. When it comes to society, as Rojas notes in his post, there are lots of positive trends to point out. Similarly, science is full of interesting papers—open up just about any journal and look for the best, most interesting ideas—and there are lots of good movies too.

Rewards. You get more credit, pay, and glory for being negative than positive. Again, I don’t think so. Sure, there are the occasional examples such as H. L. Mencken, but I think the smoother path to career success is to say positive things. Pauline Kael, for example, had some memorable pans, but I’d say her characteristic stance was enthusiasm. For every Thomas Frank there are three Malcolm Gladwells (or so I say based on my unscientific guess), and it’s the Gladwells who get more of the fame and fortune.

Personality. Sociologists, bloggers, and reviewers are, by and large, malcontents. They grumble about things cos that’s what they do, and whiny people are more likely to gravitate to these activities. OK, maybe so, but this doesn’t really explain why negativity is concentrated in these fields and media rather than others. The “personality” explanation just takes us back to our first explanation, “division of labor.”

And, yes, I see the irony that this post, which is all about why sociologists and bloggers are so negative, has been sparked by a negative remark made by a sociologist on a blog. And I’m sure you will have some negative things to say in the comments. After all, the only people more negative than bloggers are blog commenters!


Surprise-hacking: “the narrative of blindness and illusion sells, and therefore continues to be the central thesis of popular books written by psychologists and cognitive scientists”

Teppo Felin sends along this article with Mia Felin, Joachim Krueger, and Jan Koenderink on “surprise-hacking,” and writes:

We essentially see surprise-hacking as the upstream, theoretical cousin of p-hacking. Though, surprise-hacking can’t be resolved with replication, more data or preregistration. We use perception and priming research to make these points (linking to Kahneman and priming, Simons and Chabris’s famous gorilla study and its interpretation, etc).

We think surprise-hacking implicates theoretical issues that haven’t meaningfully been touched on – at least in the limited literatures that we are aware of (mostly in cog sci, econ, psych). Though, there are probably related literatures out there (which you are very likely to know) – so I’m curious if you are aware of papers in other domains that deal with this or related issues?

I think the point that Felin et al. are making is that results obtained under conditions of surprise might not generalize to normal conditions. The surprise in the experiment is typically thought of as a mechanism for isolating some phenomenon—part of the design of the experiment—but arguably it is one of the conditions of the experiment as well. Thus, the conclusion of a study conducted under surprise should not be, “People show behavior X,” but rather, “People show behavior X under a condition of surprise.”

Regarding Felin’s question to me: I am not aware of any discussion of this issue in the political science literature, but maybe there’s something out there, or perhaps something related? All I can think of right now is experiments on public opinion and voting, where there is some discussion of the relevance of isolated experiments to real-world behavior when people are subject to many influences.

I’ll conclude with a line from Felin et al.’s paper:

The narrative of blindness and illusion sells, and therefore continues to be the central thesis of popular books written by psychologists and cognitive scientists.

I’m reminded of the two modes of reasoning in pop-microeconomics: (1) People are rational and respond to incentives. Behavior that looks irrational is actually completely rational once you think like an economist, or (2) People are irrational and they need economists, with their open minds, to show them how to be rational and efficient.

They get you coming and going, and the common thread is that they know best. The message is that we are all foolish fools and we need the experts’ expertise for life-hacks that will change our lives.

If we step back a bit further, we can associate this with a general approach to social science, or science in general, which is to focus on “puzzles” or anomalies to our existing theories. From a Popperian/Lakatosian perspective, it makes sense to gnaw on puzzles and to study the counterintuitive. The point, though, is that the blindness and illusion are as much a property of the researchers—after all, the point is to investigate phenomena that don’t fit with our scientific models of the world—as of the people being studied. It’s not so much that people are predictably irrational, but that existing scientific theories are wrong in some predictable ways.


“My advisor and I disagree on how we should carry out repeated cross-validation. We would love to have a third expert opinion…”

Youyou Wu writes:

I’m a postdoc studying scientific reproducibility. I have a machine learning question that I desperately need your help with. My advisor and I disagree on how we should carry out repeated cross-validation. We would love to have a third expert opinion…

I’m trying to predict whether a study can be successfully replicated (DV), from the texts in the original published article. Our hypothesis is that language contains useful signals in distinguishing reproducible findings from irreproducible ones. The nuances might be blind to human eyes, but can be detected by machine algorithms.

The protocol is summarized in the following steps, illustrating the flow of cross-validation. We conducted a repeated three-fold cross-validation on the data.

STEP 1) Train a doc2vec model on the training data (2/3 of the data) to convert raw texts into vectors representing language features (this algorithm is non-deterministic; the models and the outputs can differ even with the same inputs and parameters)
STEP 2) Infer vectors using the doc2vec model for both training and test sets
STEP 3) Train a logistic regression using the training set
STEP 4) Apply the logistic regression to the test set, generate a predicted probability of success

Because doc2vec is not deterministic, and we have a small training sample, we came up with two choices of strategies:

(1) All studies were first divided into three subsamples A, B, and C. Steps 1 through 4 were done once with sample A as the test set and a combined sample of B and C as the training set, generating one predicted probability for each study in sample A. To generate probabilities for the entire sample, Steps 1 through 4 were repeated two more times, setting sample B or C as the test set respectively. At this point, we had one predicted probability for each study. Subsequently, the entire sample was shuffled to create a different random three-fold partition, followed by the same three-fold cross-validation. A new probability was generated for each study this time. The whole procedure was iterated 100 times, so each study had 100 different probabilities. We averaged the probabilities and compared the average probabilities with the ground truth to generate a single AUC score.

(2) All studies were first divided into three subsamples A, B, and C. Steps 1 through 4 were first repeated 100 times with sample A as the test set and a combined sample of B and C as the training set, generating 100 predicted probabilities for each study in sample A. As I said, these 100 probabilities are different because doc2vec isn’t deterministic. We took the average of these probabilities and treated that as our final estimate for the studies. To generate average probabilities for the entire sample, each group of 100 runs was repeated two more times, setting sample B or C as the test set respectively. An AUC was calculated upon completion, between the ground truth and the average probabilities. Subsequently, the entire sample was shuffled to create a different random three-fold partition, followed by the same 3×100 runs of modeling, generating a new AUC. The whole procedure was iterated on 100 different shuffles, and an AUC score was calculated each time. We ended up having a distribution of 100 AUC scores.

I personally thought strategy two was better because it separates variation in accuracy due to sampling from the non-determinism of doc2vec. My advisor thought strategy one was better because it’s less computationally intensive, produces better results, and doesn’t have obvious flaws.

My first thought is to move away from the idea of declaring a study as being “successfully replicated.” Better to acknowledge the continuity of the results from any study.

Getting to the details of your question on cross-validation: Jeez, this really is complicated. I keep rereading your email over and over again and getting confused each time. So I’ll throw this one out to the commenters. I hope someone can give a useful suggestion . . .

OK, I do have one idea, and that’s to evaluate your two procedures (1) and (2) using fake-data simulation: Start with a known universe, simulate fake data from that universe, then apply procedures (1) and (2) and see if they give much different answers. Loop the entire procedure and see what happens, comparing your cross-validation results to the underlying truth which in this case is assumed known. Fake-data simulation is the brute-force approach to this problem, and perhaps it’s a useful baseline to help understand your problem.
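
To make the fake-data idea concrete, here is a deliberately stripped-down sketch in Python. Everything specific in it is an assumption for illustration: the doc2vec-plus-logistic-regression pipeline is replaced by a single noisy scoring function, the three-fold structure is collapsed away, and the number of studies, the noise level, and the repetition counts are made up. The point is only the skeleton: simulate a known universe, run both averaging procedures, and see how their answers compare to each other and to the truth.

```python
import random

def auc(labels, scores):
    """Mann-Whitney AUC: chance a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 * (p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(1)
n_studies = 90
truth = [random.random() for _ in range(n_studies)]            # latent replicability
replicated = [1 if random.random() < t else 0 for t in truth]  # observed "ground truth"

def noisy_score(i):
    # stand-in for the whole doc2vec + logistic-regression pipeline:
    # informative about the truth, but different on every run
    return truth[i] + random.gauss(0, 0.5)

def strategy_1(n_shuffles=50):
    # average each study's probabilities over all shuffles, then one AUC
    avg = [sum(noisy_score(i) for _ in range(n_shuffles)) / n_shuffles
           for i in range(n_studies)]
    return auc(replicated, avg)

def strategy_2(n_shuffles=50, n_runs=20):
    # average over model reruns *within* a shuffle; one AUC per shuffle,
    # so run-to-run noise and partition-to-partition noise stay separate
    aucs = []
    for _ in range(n_shuffles):
        avg = [sum(noisy_score(i) for _ in range(n_runs)) / n_runs
               for i in range(n_studies)]
        aucs.append(auc(replicated, avg))
    return aucs

single_auc = strategy_1()   # strategy (1): one number
auc_dist = strategy_2()     # strategy (2): a distribution of AUCs
```

In a real check you would put the actual fold splitting, doc2vec training, and logistic regression inside the scoring step, and you would compare both procedures’ AUC summaries against the AUC computed directly from the known truth.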


A couple of thoughts regarding the hot hand fallacy fallacy

For many years we all believed the hot hand was a fallacy. It turns out we were all wrong. Fine. Such reversals happen.

Anyway, now that we know the score, we can reflect on some of the cognitive biases that led us to stick with the “hot hand fallacy” story for so long.

Jason Collins writes:

Apart from the fact that this statistical bias slipped past everyone’s attention for close to thirty years, I [Collins] find this result extraordinarily interesting for another reason. We have a body of research that suggests that even slight cues in the environment can change our actions. Words associated with old people can slow us down. Images of money can make us selfish. And so on. Yet why haven’t these same researchers been asking why a basketball player would not be influenced by their earlier shots – surely a more salient part of the environment than the word “Florida”? The desire to show one bias allowed them to overlook another.
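
For readers wondering what the bias is: in a short sequence, the proportion of successes immediately following a success is a downward-biased estimate of the true conditional probability, which is the Miller and Sanjurjo point. A few lines of simulation show it; the setup below (a “shooter” with a constant 50% success rate, sequences of length 4) is the textbook illustrative case, not anyone’s real data.

```python
import random

random.seed(7)

def hit_rate_after_hit(seq):
    """Proportion of makes on attempts that immediately follow a make;
    None if the sequence has no attempt following a make."""
    after_hit = [seq[i + 1] for i in range(len(seq) - 1) if seq[i] == 1]
    if not after_hit:
        return None
    return sum(after_hit) / len(after_hit)

n_sequences = 200_000
length = 4      # the classic illustrative case: 4 shots per sequence
p = 0.5         # constant-probability "shooter" -- the null model itself

rates = []
for _ in range(n_sequences):
    seq = [1 if random.random() < p else 0 for _ in range(length)]
    r = hit_rate_after_hit(seq)
    if r is not None:   # condition on having at least one post-hit attempt
        rates.append(r)

# averaging the per-sequence proportions weights every sequence equally,
# no matter how many post-hit attempts it contains -- that's the bias
mean_rate = sum(rates) / len(rates)   # comes out near 0.40, not 0.50
```

So a constant-probability shooter looks “anti-streaky” under this comparison, which means the original analyses were biased against finding a hot hand even when the null model held.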

Also I was thinking a bit more about the hot hand, in particular a flaw in the underlying logic of Gilovich et al. (and also me, before Miller and Sanjurjo convinced me about the hot hand): The null model is that each player j has a probability p_j of making a given shot, and that p_j is constant for the player (considering only shots of some particular difficulty level). But where does p_j come from? Obviously players improve with practice, with game experience, with coaching, etc. So p_j isn’t really a constant. But if “p” varies among players, and “p” varies over the time scale of years or months for individual players, why shouldn’t “p” vary over shorter time scales too? In what sense is “constant probability” a sensible null model at all?

I can see that “constant probability for any given player during a one-year period” is a better model than “p varies wildly from 0.2 to 0.8 for any player during the game.” But that’s a different story. The more I think about the “there is no hot hand” model, the more I don’t like it as any sort of default.

In any case, it’s good to revisit our thinking about these theories in light of new arguments and new evidence.


Oh, I hate it when work is criticized (or, in this case, fails in attempted replications) and then the original researchers don’t even consider the possibility that maybe in their original work they were inadvertently just finding patterns in noise.

I have a sad story for you today.

Jason Collins tells it:

In The (Honest) Truth About Dishonesty, Dan Ariely describes an experiment to determine how much people cheat . . . The question then becomes how to reduce cheating. Ariely describes one idea:

We took a group of 450 participants and split them into two groups. We asked half of them to try to recall the Ten Commandments and then tempted them to cheat on our matrix task. We asked the other half to try to recall ten books they had read in high school before setting them loose on the matrices and the opportunity to cheat. Among the group who recalled the ten books, we saw the typical widespread but moderate cheating. On the other hand, in the group that was asked to recall the Ten Commandments, we observed no cheating whatsoever.

Sounds pretty impressive! But these things all sound impressive when described at some distance from the data.

Anyway, Collins continues:

This experiment has now been subject to a multi-lab replication by Verschuere and friends. The abstract of the paper:

. . . Mazar, Amir, and Ariely (2008; Experiment 1) gave participants an opportunity and incentive to cheat on a problem-solving task. Prior to that task, participants either recalled the 10 Commandments (a moral reminder) or recalled 10 books they had read in high school (a neutral task). Consistent with the self-concept maintenance theory . . . moral reminders reduced cheating. The Mazar et al. (2008) paper is among the most cited papers in deception research, but it has not been replicated directly. This Registered Replication Report describes the aggregated result of 25 direct replications (total n = 5786), all of which followed the same pre-registered protocol. . . .

And what happened? It’s in the graph above (from Verschuere et al., via Collins). The average estimated effect was tiny, it was not conventionally “statistically significant” (that is, the 95% interval included zero), and it “was numerically in the opposite direction of the original study.”

As is typically the case, I’m not gonna stand here and say I think the treatment had no effect. Rather, I’m guessing it has an effect which is sometimes positive and sometimes negative; it will depend on person and situation. There doesn’t seem to be any large and consistent effect, that’s for sure. Which maybe shouldn’t surprise us. After all, if the original finding was truly a surprise, then we should be able to return to our original state of mind, when we did not expect this very small intervention to have such a large and consistent effect.

I promised you a sad story. But, so far, this is just one more story of a hyped claim that didn’t stand up to the rigors of science. And I can’t hold it against the researchers that they hyped it: if the claim had held up, it would’ve been an interesting and perhaps important finding, well worth hyping.

No, the sad part comes next.

Collins reports:

Multi-lab experiments like this are fantastic. There’s little ambiguity about the result.

That said, there is a response by Amir, Mazar and Ariely. Lots of fluff about context. No suggestion of “maybe there’s nothing here”.

You can read the response and judge for yourself. I think Collins’s report is accurate, and that’s what made me sad. These people care enough about this topic to conduct a study, write it up in a research article and then in a book—but they don’t seem to care enough to seriously entertain the possibility they were mistaken. It saddens me. Really, what’s the point of doing all this work if you’re not going to be open to learning?

And there’s no need to think anything done in the first study was unethical at the time. Remember Clarke’s Law.

Another way of putting it is: Ariely’s book is called “The Honest Truth . . .” I assume Ariely was honest when writing this book; that is, he was expressing sincerely-held views. But honesty (and even transparency) are not enough. Honesty and transparency supply the conditions under which we can do good science, but we still need to perform good measurements and study consistent effects. The above-discussed study failed in part because of the old, old problem that they were using a between-person design to study within-person effects; see here and here. (See also this discussion from Thomas Lumley on a related issue.)

P.S. Collins links to the original article by Mazar, Amir, and Ariely. I guess that if I’d read it in 2008 when it appeared, I’d’ve believed all its claims too. A quick scan shows no obvious problems with the data or analyses. But there can be lots of forking paths and unwittingly opportunistic behavior in data processing and analysis; recall the 50 Shades of Gray paper (in which the researchers performed their own replication and learned that their original finding was not real) and its funhouse parody 64 Shades of Gray paper, whose authors appeared to take their data-driven hypothesizing all too seriously. The point is: it can look good, but don’t trust yourself; do the damn replication.

P.P.S. This link also includes some discussions, including this from Scott Rick and George Loewenstein:

In our opinion, the main limitation of Mazar, Amir, and Ariely’s article is not in the perspective it presents but rather in what it leaves out. Although it is important to understand the psychology of rationalization, the other factor that Mazar, Amir, and Ariely recognize but then largely ignore—namely, the motivation to behave dishonestly—is arguably the more important side of the dishonesty equation. . . .

A closer examination of many of the acts of dishonesty in the real world reveals a striking pattern: Many, if not most, appear to be motivated by the desire to avoid (or recoup) losses rather than the simple desire for gain. . . .

The feeling of being in a hole not only originates from nonshareable unethical behavior but also can arise, more prosaically, from overly ambitious goals . . . Academia is a domain in which reference points are particularly likely to be defined in terms of the attainments of others. Academia is becoming increasingly competitive . . . With standards ratcheting upward, there is a kind of “arms race” in which academics at all levels must produce more to achieve the same career gains. . . .

An unfortunate implication of hypermotivation is that as competition within a domain increases, dishonesty also tends to increase in response. Goodstein (1996) feared as much over a decade ago:

. . . What had always previously been a purely intellectual competition has now become an intense competition for scarce resources. This change, which is permanent and irreversible, is likely to have an undesirable effect in the long run on ethical behavior among scientists. Instances of scientific fraud are almost sure to become more common.

Rick and Loewenstein were ahead of their time to be talking about all that, back in 2008. Also this:

The economist Andrei Shleifer (2004) explicitly argues against our perspective in an article titled “Does Competition Destroy Ethical Behavior?” Although he endorses the premise that competitive situations are more likely to elicit unethical behavior, and indeed offers several examples other than those provided here, he argues against a psychological perspective and instead attempts to show that “conduct described as unethical and blamed on ‘greed’ is sometimes a consequence of market competition” . . .

Shleifer (2004) concludes optimistically, arguing that competition will lead to economic growth and that wealth tends to promote high ethical standards. . . .

Wait—Andrei Shleifer—wasn’t he involved in some scandal? Oh yeah:

During the early 1990s, Andrei Shleifer headed a Harvard project under the auspices of the Harvard Institute for International Development (HIID) that invested U.S. government funds in the development of Russia’s economy. Shleifer was also a direct advisor to Anatoly Chubais, then vice-premier of Russia . . . In 1997, the U.S. Agency for International Development (USAID) canceled most of its funding for the Harvard project after investigations showed that top HIID officials Andrei Shleifer and Jonathan Hay had used their positions and insider information to profit from investments in the Russian securities markets. . . . In August 2005, Harvard University, Shleifer and the Department of Justice reached an agreement under which the university paid $26.5 million to settle the five-year-old lawsuit. Shleifer was also responsible for paying $2 million worth of damages, though he did not admit any wrongdoing.

In the above quote, Shleifer refers to “conduct described as unethical” and puts “greed” in scare quotes. No way Shleifer could’ve been motivated by greed, right? After all, he was already rich, and rich people are never greedy, or so I’ve heard.

Anyway, that last bit is off topic; still, it’s interesting to see all these connections. Cheating’s an interesting topic, even though (or especially because) it doesn’t seem that it can be turned on and off using simple behavioral interventions.


Time series of Democratic/Republican vote share in House elections

Yair prepared this graph of average district vote (imputing open seats at 75%/25%; see here for further discussion of this issue) for each House election year since 1976:

Decades of Democratic dominance persisted through 1992; since then the two parties have been about even.

As has been widely reported, a mixture of geographic factors and gerrymandering has given Republicans the edge in House seats in recent years (most notably in 2012, when they retained control even after losing the national vote), but if you look at aggregate votes it’s been a pretty even split.

The above graph also shows that the swing in 2018 was pretty big: not as large as the historic swings in 1994 and 2010, but about the same as the Democratic gains in 2006 and larger than any other swing in the past forty years.

See here and here for more on what happened in 2018.


Prior distributions for covariance matrices

Someone sent me a question regarding the inverse-Wishart prior distribution for a covariance matrix, as it is the default in some software he was using. Inverse-Wishart does not make sense as a prior distribution; it has problems because the shape and scale are tangled. See this paper, “Visualizing Distributions of Covariance Matrices,” by Tomoki Tokuda, Ben Goodrich, Iven Van Mechelen, Francis Tuerlinckx, and myself. Right now I’d use the LKJ family. In Stan there are lots of options. See also our wiki on prior distributions.
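
To see the shape-scale tangling concretely, here is a small pure-Python simulation; the specific choices (2×2 matrices, identity scale, ν = 3, 20,000 draws) are arbitrary, made only for illustration. It draws covariance matrices from an inverse-Wishart prior via the Bartlett decomposition and checks whether draws with large variances also tend to have correlations near ±1.

```python
import math
import random

random.seed(42)

def chi2(df):
    # chi-square draw with integer df: sum of df squared standard normals
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

def inv_wishart_2x2(nu):
    """One draw from inverse-Wishart(nu, I_2): draw W ~ Wishart(nu, I_2)
    via the Bartlett decomposition, then invert W in closed form."""
    a11 = math.sqrt(chi2(nu))
    a22 = math.sqrt(chi2(nu - 1))
    a21 = random.gauss(0, 1)
    w11 = a11 ** 2
    w21 = a21 * a11
    w22 = a21 ** 2 + a22 ** 2
    det = w11 * w22 - w21 ** 2      # equals (a11 * a22)**2, so always > 0
    s11, s22, s12 = w22 / det, w11 / det, -w21 / det
    return s11, s22, s12

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

draws = [inv_wishart_2x2(nu=3) for _ in range(20_000)]
log_var = [math.log(s11) for s11, s22, s12 in draws]
abs_cor = [abs(s12) / math.sqrt(s11 * s22) for s11, s22, s12 in draws]

# under inverse-Wishart, big variances come packaged with extreme correlations
tangle = pearson(log_var, abs_cor)
```

The clearly positive correlation between log-variance and |correlation| is the tangling: the prior cannot express “I expect modest correlations whatever the scale.” The LKJ approach avoids this by giving the scales and the correlation matrix separate priors, e.g. in Stan something like `sigma ~ normal(0, 2.5);` together with `Omega ~ lkj_corr(2);`.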


Should we be concerned about MRP estimates being used in later analyses? Maybe. I recommend checking using fake-data simulation.

Someone sent in a question (see below). I asked if I could post the question and my reply on blog, and the person responded:

Absolutely, but please withhold my name because this is becoming a touchy issue within my department.

The boldface was in the original.

I get this a lot. There seems to be a lot of fear out there when it comes to questioning established procedures.

Anyway, here’s the question that the person sent in:

CDC has recently been using your multilevel estimation with post-stratification method to produce county, city, and census tract-level disease prevalence estimates (see here). The data source is the annual phone-based Behavioral Risk Factor Surveillance System (n=450k). CDC is not transparent about covariates included in the models used to construct the estimates, but as I understand it they are mostly driven by national individual-level associations between sociodemographic factors and disease prevalence. Presumably, the random effects would not influence a unit’s estimated prevalence much if the sample size from that unit is small (as is true for most cities/counties, and for many census tracts the sample size is zero).

I am wondering if you are as troubled as I am by how these estimates are being used. First, websites like County Health Rankings and City Health Dashboard are providing these estimates to the public without any disclaimer that these are not actually random samples of cities/counties/tracts and may not reflect reality. Second, and more problematically, researchers are starting to conduct ecologic studies that analyze the association, for example, between census tract socioeconomic composition and obesity prevalence (see here). It seems quite likely that such a study is actually just identifying the individual-level association between income and obesity that was used to produce the estimates.

I’ve now become involved in a couple of projects that are trying to analyze these estimates so it seems as though their use will increase over time. The only disclaimer that CDC provides is that the estimates shouldn’t be used to evaluate policy.

Are you more confident about the use of these estimates than I am? I am also wondering if CDC should be more explicit in disclosing their limitations to prevent misuse.

My reply:

Wow, N = 450K. That’s quite a survey. (I know my correspondent called it “n,” but when it’s this big, I think the capital letter is warranted.) And here’s the page where they mention Mister P! And they have a web interface.

I’m not quite sure why you say the website provides the estimates “without any disclaimer.” Here’s one of the displays:

It’s not the prettiest graph in the world—I’ll grant you that—but it’s clearly labeled “Model-based estimates” right at the top.

I agree with you, though, in your concern that if these model-based estimates are being used in later analyses, there’s a risk of reification, in which county- or city-level predictors that are used in the model can look automatically like good predictors of the outcomes. I’d guess this would be more of a concern with rare conditions than with something like coronary heart disease, where the sample size will be (unfortunately) so large.

The right thing to do next, I think, is some fake-data simulation to see how much this should be a concern. CDC has already done some checking (from their methodology page, “CDC’s internal and external validation studies confirm the strong consistency between MRP model-based SAEs and direct BRFSS survey estimates at both state and county levels.”) and I guess you could do more.

Overall, I’m positively inclined toward these MRP estimates because I’d guess it’s much better than the alternatives such as raw or weighted local averages or some sort of postprocessed analysis of weighted averages. I think those approaches would have lots more problems.

In any case, it’s cool to see my method being used by people who’ve never met me! Mister P is all grown up.

P.S. My correspondent provides further background:

The CDC generates prevalence estimates for various diseases at the county level (or smaller) by applying MRP to the national Behavioral Risk Factor Surveillance System. Unlike for other diseases, they’ve documented their methods for diabetes. Their model defines 12 population strata per county (2 races x 2 genders x 3 age groups) and incorporates random effects for stratum, county, and state. There are no other variables at any level in the model.

A number of papers use the MRP-derived data to estimate associations between, for example, PM2.5 and diabetes prevalence. Do you think this is a valid approach? Would it be valid if all of the MRP covariates are included in the model?

My response:

1. Regarding the MRP model, it is what it is. Including more demographic factors is better, but adjusting for these 12 cells per county is better than not adjusting, I’d think. One thing I do recommend is to use group-level predictors. In this case, the group is county, and lots of county-level predictors will be available that will be relevant for predicting health outcomes.

2. Regarding the postprocessing using the MRP estimates: Sure, it should be better to fold the two models together, but the two-stage approach (first use MRP to estimate prevalences, then fit another model) could work ok too, with some loss of efficiency. Again, I’d recommend using fake-data simulation to estimate the statistical properties of this approach for the problem at hand.
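As a rough illustration of what such a fake-data check might look like (a toy model of my own, not the CDC's actual MRP setup): simulate county prevalences from a known county-level coefficient, generate uneven survey samples, form partially pooled county estimates, then run the second-stage regression and see how well the known coefficient is recovered.

```python
import numpy as np

rng = np.random.default_rng(1)
n_counties, true_beta = 500, 0.8

# Step 1: fake truth. County prevalence depends on a county-level
# predictor x (e.g., PM2.5) through a known coefficient.
x = rng.normal(size=n_counties)
logit_p = -2.0 + true_beta * x + rng.normal(scale=0.3, size=n_counties)
p_true = 1 / (1 + np.exp(-logit_p))

# Step 2: fake survey, with very uneven sample sizes across counties,
# as in BRFSS.
n = rng.integers(5, 500, size=n_counties)
y = rng.binomial(n, p_true)

# Step 3: first-stage estimates. Here just simple shrinkage toward the
# overall rate, standing in for the full MRP model.
prior_strength = 50
p_overall = y.sum() / n.sum()
p_hat = (y + prior_strength * p_overall) / (n + prior_strength)

# Step 4: second-stage "ecologic" regression of the estimates on x,
# on the logit scale, as in the papers described above.
logit_hat = np.log(p_hat / (1 - p_hat))
beta_hat = np.polyfit(x, logit_hat, 1)[0]
print(true_beta, round(beta_hat, 2))
```

Repeating this over many simulated datasets, and replacing step 3 with the actual first-stage model, would quantify how much the two-stage approach attenuates or inflates the coefficient of interest relative to the truth.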

The post Should we be concerned about MRP estimates being used in later analyses? Maybe. I recommend checking using fake-data simulation. appeared first on Statistical Modeling, Causal Inference, and Social Science.

My footnote about global warming

At the beginning of my article, How to think scientifically about scientists’ proposals for fixing science, which we discussed yesterday, I wrote:

Science is in crisis. Any doubt about this status has surely been dispelled by the loud assurances to the contrary by various authority figures who are deeply invested in the current system . . . When leaders go to that much trouble to insist there is no problem, it’s only natural for outsiders to worry.

And at that point came a footnote, which I want to share with you here:

At this point a savvy critic might point to global-warming denialism and HIV/AIDS denialism as examples where the scientific consensus is to be trusted and where the dissidents are the crazies and the hacks. Without commenting on the specifics of these fields, I will just point out that the research leaders in those areas are not declaring a lack of crisis—far from it!—nor are they shilling for their “patterns of discovery.” Rather, the leaders in these fields have been raising the alarm for decades and have been actively pointing out inconsistencies in their theories and gaps in their understanding. Thus, I do not think that my recommendation to watch out when the experts tell you to calm down, implies blanket support for dissidents in all areas of science. One’s attitude toward dissidents should depend a bit on the openness to inquiry of the establishments from which they are dissenting.

The post My footnote about global warming appeared first on Statistical Modeling, Causal Inference, and Social Science.