Discussion with Nassim Taleb about sexism and racism in the Declaration of Independence

Nassim Taleb points to this post from congressmember Ayanna Pressley linking to an opinion piece by Matthew Rozsa. Rozsa’s article has the title, “Fourth of July’s ugly truth exposed: The Declaration of Independence is sexist, racist, prejudiced,” with the subtitle, “How we can embrace the underlying spirit of the Declaration of Independence — and also learn from its shortcomings.”

Here’s what Nassim wrote in his post:

This is the very definition of anachronistic bigoteering, a violation of the code of political expression, in Principia Politica.
Soon we will ban every document written between the emergence of writing in Sumer and the Obama presidency as tainted with “prejudice”.

And here’s what I wrote in response to Nassim:

I’m confused by your reaction. I followed the link, and the article by Rozsa didn’t seem anachronistic at all. It seemed very historically grounded, explaining various aspects of the Declaration of Independence based on historical context.

Also I don’t see why your reaction is “Soon we will ban every document…”. Rozsa never suggested banning the Declaration at all! His article is subtitled, “How we can embrace the underlying spirit of the Declaration of Independence — and also learn from its shortcomings.” That seems fair enough: saying that the document is sexist, racist, and prejudiced doesn’t make it worthless; it’s just a product of its time.

Here’s the final paragraph of Rozsa’s article:

Is any of this intended to suggest that we should not take pride in the Declaration of Independence? Not even remotely: It was — and continues to be — one of the most eloquent and morally moving political documents ever penned. That said, we must also remember that our Founding Fathers were not the living gods that many believe them to be. They were fallible human beings, and some of their flaws had terrible consequences for people who were not fortunate enough to be born into privileged groups. When we celebrate the Declaration of Independence, we should embrace its underlying spirit — as well as the courage of the men who were willing to risk “our lives, our fortunes and our sacred honor” — and simultaneously learn from its shortcomings. This alone can make the spirit of 1776 relevant to the conditions of 2019 — or any other year, for that matter.

This doesn’t seem anything close to the “ban everything” attitude that you wrote about.

Nassim then replied:

No the article itself doesn’t call for banning. If my tweet implied that, it is miscommunication. The article does use accusatory terms: “The Declaration… is sexist, racist, and prejudiced.” My response is not to the article per se but to the general disease of anachronistic bigoteering (that I cover in Principia Politica), which does indeed lead to censorship.

It’s not fair to say that the document is sexist, racist, prejudiced, or to flow back “isms” in time with the negative connotation they convey. That is precisely my point. Moral values were different at the time; they progress just like knowledge progresses. Using “isms” is no different from blaming the ancients for not understanding germs and calling them “obscurantists.” The very accusation is equivalent to saying that moral values don’t evolve!

It is OK to say: there was inequality of sexes; using “sexism” (with its negative connotation from today’s meaning) is not OK.

And indeed numerous authors are being censored, removed from the curriculum because of anachronistic bigoteering. In hindsight, everything in the past will be tainted.

I guess we’re just drawing the line at a different place. I think Rozsa’s framing is fair: the Declaration is racist, sexist, and prejudiced, but it’s also an eloquent and morally moving document. But I do think it’s unfair for you to call Rozsa’s article anachronistic, as to me it seems very careful and historically aware, the opposite of anachronistic. It also seems unfair to me to connect Rozsa, who’s talking about flaws in a document that he thinks we should “take pride in,” with other people who are censoring things. Critical discussion is not in any way a form of censorship.

I sent the above to Nassim, who wrote:

I totally agree that we should be critical of the ancients—so long as we do not engage in hindsight games and values via modern accusatory language. I for myself have been waging a war against Plato and his legacy.

The danger of censorship is real (just witness the calls for the removal of statues, texts from the curricula, and the trending bowdlerization of the discourse). And the fact that you yourself wrote “the Declaration is racist, sexist, and prejudiced,” with the “isms” and the accusation of “prejudice,” scares me quite a bit.

Accusing every single ancient of “racism” (which you practically can) trivializes the attitude of modern racists and, by cheapening the currency, hurts their victims. Because someone racist in 2019 is racist.

A war against Plato, huh? You’re in good company. Karl Popper famously started wars with Plato, Marx, and Freud. None of these targets were around to fight back, but that’s ok. Typically in a dispute we have little hope of convincing the other person anyway, and all these people left enough written material to serve in their defense.

Regarding the Declaration being racist, sexist, and prejudiced: Yeah, not much of a surprise given that it was written by a dude who bought and sold slaves. But I’m with Rozsa that the document should be understood in historical context. Not banned or censored or bowdlerized.

Alison Mattek on physics and psychology, philosophy, models, explanations, and formalization

Alison Mattek writes:

I saw your recent blog post on falsifiable claims. For the past couple of years I have been developing a theoretical framework that highlights the importance of unfalsifiable claims in science. I try to also make a few unfalsifiable claims regarding psychological variables.

Here is Mattek’s paper, “Expanding psychological theory using system analogies.” It reminds me a bit of the writings of Paul Meehl.

What’s published in the journal isn’t what the researchers actually did.

David Allison points us to these two letters:

Alternating Assignment was Incorrectly Labeled as Randomization, by Bridget Hannon, J. Michael Oakes, and David Allison, in the Journal of Alzheimer’s Disease.

Change in study randomization allocation needs to be included in statistical analysis: comment on ‘Randomized controlled trial of weight loss versus usual care on telomere length in women with breast cancer: the lifestyle, exercise, and nutrition (LEAN) study,’ by Stephanie Dickinson, Lilian Golzarri-Arroyo, Andrew Brown, Bryan McComb, Chanaka Kahathuduwa, and David Allison, in Breast Cancer Research and Treatment.

It can be surprisingly difficult for researchers to simply say exactly what they did. Part of this might be a desire to get credit for design features such as random assignment that were too difficult to actually implement; part of it could be sloppiness/laziness; but part of it could just be that, when you write, it’s so easy to drift into conventional patterns. Designs are supposed to use random assignment, so you label your design as randomized, even if it’s not. The above examples are nothing like pizzagate, but they’re part of the larger problem that the scientific literature can’t be trusted. It’s not just that you can’t trust the conclusions; it’s also that papers make claims that can’t possibly be supported by the data in them, and that papers don’t state what the researchers actually did.

As always, I’m not saying these researchers are bad people. Honesty and transparency are not enuf. If you’re a scientist, and you write up your study, and you don’t describe it accurately, we—the scientific community, the public, the consumers of your work—are screwed, even if you’re a wonderful, honorable person. You’ve introduced buggy software into the world, and the published corrections, if any, are likely to never catch up.

P.S. Hannon, Oakes, and Allison explain why it matters that the design described as a “randomized controlled trial” wasn’t actually that:

By sequentially enrolling participants using alternating assignment, the researchers and enrolling physicians in this study were able to know to which group the next participant would be assigned, and there is no allocation concealment. . . .

The allocation method employed by Ito et al. allows the research team to determine in which group a participant would be assigned, and thus could (unintentionally) manipulate the enrollment. . . .

Alternating assignment, or similarly using patient chart numbers, days of the week, date of birth, etc., are nonrandom methods of group allocation, and should not be used in place of randomly assigning participants . . .

There are a number of disciplines (i.e., public health, community interventions, etc.) which commonly employ nonrandomized intervention evaluation studies, and these can be conducted with rigor. It is crucial for researchers conducting these nonrandomized trials to report procedures accurately.
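To make the allocation-concealment point concrete, here's a toy simulation (my own sketch, not anything from the letter): under alternating assignment, anyone who knows the previous participant's group knows the next one with certainty, whereas under simple randomization a guess based on the previous assignment is no better than a coin flip.

```
# Toy illustration (not from the Hannon/Oakes/Allison letter): how predictable
# is the next group assignment under alternating vs. simple random assignment?
import random

def alternating_assignment(n):
    # participant i goes to "treatment" if i is even, "control" if odd
    return ["treatment" if i % 2 == 0 else "control" for i in range(n)]

def random_assignment(n, seed=0):
    rng = random.Random(seed)
    return [rng.choice(["treatment", "control"]) for _ in range(n)]

def predictability(assignments):
    # fraction of assignments an enrolling physician could guess correctly,
    # knowing only the previous participant's group (guess: the other group)
    correct = 0
    for prev, nxt in zip(assignments, assignments[1:]):
        guess = "control" if prev == "treatment" else "treatment"
        correct += guess == nxt
    return correct / (len(assignments) - 1)

n = 10_000
print(predictability(alternating_assignment(n)))  # 1.0: the next assignment is always known
print(predictability(random_assignment(n)))       # ~0.5: no better than a coin flip
```

Both schemes give roughly balanced groups; the problem the letter points to is the predictability, that is, the lack of allocation concealment.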

Gendered languages and women’s workforce participation rates

Rajesh Venkatachalapathy writes:

I recently came across a world bank document claiming that gendered languages reduce women’s labor force participation rates. It is summarized in the following press release: Gendered Languages May Play a Role in Limiting Women’s Opportunities, New Research Finds.

This sounds a lot like the piranha problem, if there is any effect at all.

I [Venkatachalapathy] am disturbed by claims of large effects in their study. Their work seems to rely conceptually on the Sapir-Whorf hypothesis in linguistics, which is also quite controversial on its own. I am curious to know what your take is on this report.

He continues:

The cognitive science behind Sapir-Whorf, and the related field of embodied cognition in general, is quite controversial; it appeals to so many people, yet has very weak evidence (see for example, the recent book by McWhorter). This paper seems to magnify this to say something so strong about macroeconomic labor market demographic indicators. I cannot avoid comparisons with Pinker’s hypothesis in his most recent book that enlightenment thought and the secular humanistic principles derived from it have been one of the primary drivers of the civilizing process of the Norbert Elias kind or the Pinker kind.

I am not claiming that such macro-level claims can never be justified. For example, I just began reading your academic colleague, economist Suresh Naidu’s recent paper on how democratization in countries causes economic growth. From the looks of it, they seem to have worked hard at establishing their main hypothesis. Maybe their [Naidu or his collaborators] approach might provide us with additional insight on whether the causal claims of the paper on gendered language and workforce participation are reasonable and defensible with existing data, and with their [the paper’s] data analysis approach. I just find it difficult to imagine how a psychologically weak effect can suddenly become magnified when scaled to the level of large-scale societies.

After having trained hard to be skeptical of all causal claims over the years, I see what I feel is an epidemic of causal claims popping up in the literature and I find it hard to believe them all, especially given the fact that progress in philosophical causality and causal inference has been only incremental.

My response: I agree that such claims from observational data in cross-country and cross-cultural comparisons can be artifactual, and languages are correlated with all sorts of things. I don’t know enough about the topic to say more.

“The most mysterious star in the galaxy”

Charles Margossian writes:

The reading for tomorrow’s class reminded me of a project I worked on as an undergraduate. It was the Planet Hunters initiative. The project shows light-curves to participants and asks them to find transit signals (i.e. evidence of a transiting planet). The idea was to rely on human pattern recognition capabilities to find planets missed by NASA’s algorithms—and it worked! The first publication I was involved in was on the discovery of such a planet.

But even better: users found a star with a very strange light-curve, which had been dismissed as a false-positive signal by the algorithm. Upon inspection it turned out that… we had no idea what was going on. A paper, falsifying a bunch of hypotheses, was published. It was cool to see a popular paper about us not knowing. The star was called Tabby’s star (after the astronomer who investigated it), and deemed “the most mysterious star in the galaxy”.

So there—an example of using graphs to do research and, what’s more, make it accessible to the public.

What does it take to repeat them?

Olimpiu Urcan writes:

Making mistakes is human, but it takes a superhuman dose of ego and ignorance to repeat them after you’ve been publicly admonished about them.

Not superhuman at all, unfortunately. We see it all the time. All. The. Time.

I’m reminded of the very first time I contacted newspaper columnist David Brooks to point out one of his published errors. I honestly thought he’d issue a correction. But no, he just dodged it. Dude couldn’t handle the idea that he might have ever been wrong.

Similarly with those people who publish all those goofy research claims. Very rarely do they seem to be able to admit they made a mistake. I’m not talking about fraud or scientific misconduct here, just admitting an honest mistake of the sort that can happen to any of us. Nope. On the rare occasion when a scientist does admit a mistake, it’s cause for celebration.

So, no. Unfortunately I disagree with Urcan that repeating mistakes is anything superhuman. Repeating mistakes is standard operating practice, and it goes right along with never wanting to accept that an error was made in the first place.

This bit from Urcan I do agree with, though:

For plagiarists, scammers and utter incompetents to thrive, they seek enablers with the same desperation and urgency leeches seek hemoglobin banks.

Well put. And these enablers are all over the place. Some people even seem to make a career of it. I can see why they do it. If you help a scammer, he might help you in return. And you get to feel like a nice person, too. As long as you don’t think too hard about the people wasting their time reading the scammer’s products.

Blindfold play and sleepless nights

In Edward Winter’s Chess Explorations there is the following delightful quote from the memoirs of chess player William Winter:

Blindfold play I have never attempted seriously. I once played six, but spent so many sleepless nights trying to drive the positions out of my head that I gave it up.

I love that. We think of the difficulty as being in the remembering, but maybe it is the forgetting that is the challenge. I’m reminded of a lecture I saw by Richard Feynman at Bell Labs: He was talking about the theoretical challenges of quantum computing, and he identified the crucial entropy-producing step as that of zeroing the machine, i.e. forgetting.

Update on keeping Mechanical Turk responses trustworthy

This topic has come up before . . . Now there’s a new paper by Douglas Ahler, Carolyn Roush, and Gaurav Sood, who write:

Amazon’s Mechanical Turk has rejuvenated the social sciences, dramatically reducing the cost and inconvenience of collecting original data. Recently, however, researchers have raised concerns about the presence of “non-respondents” (bots) or non-serious respondents on the platform. Spurred by these concerns, we fielded an original survey on MTurk to measure response quality. While we find no evidence of a “bot epidemic,” we do find that a significant portion of survey respondents engaged in suspicious behavior. About 20% of respondents either circumvented location requirements or took the survey multiple times. In addition, at least 5-7% of participants likely engaged in “trolling” or satisficing. Altogether, we find about a quarter of data collected on MTurk is potentially untrustworthy. Expectedly, we find response quality impacts experimental treatments. On average, low quality responses attenuate treatment effects by approximately 9%. We conclude by providing recommendations for collecting data on MTurk.

And here are the promised recommendations (I’ll sketch a couple of these checks in code after the list):

• Use geolocation filters on survey platforms like Qualtrics to enforce any geographic restrictions.

• Make use of tools on survey platforms to retrieve IP addresses. Run each IP through Know Your IP to identify blacklisted IPs and multiple responses originating from the same IP.

• Include questions to detect trolling and satisficing, but do not copy and paste from a standard canon as that makes “gaming the survey” easier.

• Increase the time between Human intelligence task (HIT) completion and auto-approval so that you can assess your data for untrustworthy responses before approving or rejecting the HIT.

• Rather than withhold payments, a better policy may be to incentivize workers by giving them a bonus when their responses pass quality filters.

• Be mindful of compensation rates. While unusually stingy wages will lead to slow data collection times and potentially less effort by Workers, unusually high wages may give rise to adverse selection—especially because HITs are shared on Turkopticon, etc. soon after posting. . . Social scientists who conduct research on MTurk should stay apprised of the current “fair wage” on MTurk and adhere accordingly.

• Use Worker qualifications on MTurk and filter to include only Workers who have a high percentage of approved HITs into your sample.
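To make a couple of these concrete, here's a minimal post-hoc screening sketch. It's my own illustration, not code from the paper; the file name, column names, and thresholds are all hypothetical and would need to be adapted to your own survey export.

```
# Hypothetical post-hoc screen for an MTurk survey export (my own sketch, not
# from Ahler, Roush, and Sood). Assumes a CSV with columns worker_id,
# ip_address, start_time, end_time, plus the survey items themselves.
import pandas as pd

df = pd.read_csv("mturk_responses.csv", parse_dates=["start_time", "end_time"])

# Flag multiple responses from the same IP (second and later submissions)
df = df.sort_values("start_time")
df["duplicate_ip"] = df.duplicated(subset="ip_address", keep="first")

# Flag implausibly fast completions as possible satisficing
# (the two-minute threshold is arbitrary; calibrate it on your own pilot data)
df["duration_min"] = (df["end_time"] - df["start_time"]).dt.total_seconds() / 60
df["too_fast"] = df["duration_min"] < 2

# Flag straight-lining: identical answers across a battery of Likert items
likert_cols = [c for c in df.columns if c.startswith("q")]  # adjust to your survey
df["straight_liner"] = df[likert_cols].nunique(axis=1) == 1

flagged = df[df[["duplicate_ip", "too_fast", "straight_liner"]].any(axis=1)]
print(f"{len(flagged)} of {len(df)} responses flagged for review")
```

None of this replaces the design-side advice above (geolocation filters, longer auto-approval windows, bonuses rather than withheld payments); it's just the sort of cheap check you can run before approving HITs.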

They also say they do not think that the problem is limited to MTurk.

I haven’t tried to evaluate all these claims myself, but I thought I’d share it all with those of you who are using this tool in your research. (Or maybe some of you are MTurk bots; who knows what will be the effect of posting this material here.)

P.S. Sood adds:

From my end, “random” error is mostly a non-issue in this context. People don’t use M-Turk to produce generalizable estimates—hardly anyone post-stratifies, for instance. Most people use it to say they did something. I suppose it is a good way to ‘fail fast.’ (The downside is that most failures probably don’t see the light of day.) And if people wanted to buy stat. sig., bulking up on n is easily and cheaply done — it is the raison d’etre of MTurk in some ways.

So what is the point of the article? Twofold, perhaps. First, it is good to parcel out measurement error where we can. And the second point is about how we build a system where the long-term prognosis is not simply noise. And what stuck out for me from the data was just the sheer scale of plausibly cheeky behavior. I did not anticipate that.

Endless citations to already-retracted articles

Ken Cor and Gaurav Sood write:

Many claims in a scientific article rest on research done by others. But when the claims are based on flawed research, scientific articles potentially spread misinformation. To shed light on how often scientists base their claims on problematic research, we exploit data on cases where problems with research are broadly publicized. Using data from over 3,000 retracted articles and over 74,000 citations to these articles, we find that at least 31.2% of the citations to retracted articles happen a year after they have been retracted. And that 91.4% of the post-retraction citations are approving—note no concern with the cited article. We augment the analysis with data from an article published in Nature Neuroscience highlighting a serious statistical error in articles published in prominent journals. Data suggest that problematic research was approvingly cited more frequently after the problem was publicized [emphasis added]. Our results have implications for the design of scholarship discovery systems and scientific practice more generally.

I think that by “31.2%” and “91.4%” they mean 30% and 90% . . . but, setting aside this brief lapse in taste or numeracy, their message is important.

P.S. In case you’re wondering why I’d round those numbers: I just don’t think those last digits are conveying any real information. To put it another way, in any sort of replication, I’d expect to see numbers that differ by at least a few percentage points. Reporting as 30% and 90% seems to me to capture what they found without adding meaningless precision.

Gigerenzer: “The Bias Bias in Behavioral Economics,” including discussion of political implications

Gerd Gigerenzer writes:

Behavioral economics began with the intention of eliminating the psychological blind spot in rational choice theory and ended up portraying psychology as the study of irrationality. In its portrayal, people have systematic cognitive biases that are not only as persistent as visual illusions but also costly in real life—meaning that governmental paternalism is called upon to steer people with the help of “nudges.” These biases have since attained the status of truisms. In contrast, I show that such a view of human nature is tainted by a “bias bias,” the tendency to spot biases even when there are none. This may occur by failing to notice when small sample statistics differ from large sample statistics, mistaking people’s random error for systematic error, or confusing intelligent inferences with logical errors. Unknown to most economists, much of psychological research reveals a different portrayal, where people appear to have largely fine-tuned intuitions about chance, frequency, and framing. A systematic review of the literature shows little evidence that the alleged biases are potentially costly in terms of less health, wealth, or happiness. Getting rid of the bias bias is a precondition for psychology to play a positive role in economics.

Like others, Gigerenzer draws the connection to visual illusions, but with a twist:

By way of suggestion, articles and books introduce biases together with images of visual illusions, implying that biases (often called “cognitive illusions”) are equally stable and inevitable. If our cognitive system makes such big blunders like our visual system, what can you expect from everyday and business decisions? Yet this analogy is misleading, and in two respects.

First, visual illusions are not a sign of irrationality, but a byproduct of an intelligent brain that makes “unconscious inferences”—a term coined by Hermann von Helmholtz—from two-dimensional retinal images to a three-dimensional world. . . .

Second, the analogy with visual illusions suggests that people cannot learn, specifically that education in statistical reasoning is of little efficacy (Bond, 2009). This is incorrect . . .

It’s an interesting paper. Gigerenzer goes through a series of classic examples of cognitive errors, including the use of base rates in conditional probability, perceptions of patterns in short sequences, the hot hand, bias in estimates of risks, systematic errors in almanac questions, the Lake Wobegon effect, and framing effects.
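For readers who haven't seen the base-rate example, here's the standard toy calculation (my numbers, not Gigerenzer's). With a rare condition, even a reasonably accurate test produces mostly false positives; Gigerenzer's long-running argument is that people handle this much better when the same information is presented as natural frequencies rather than as conditional probabilities.

```
# Standard base-rate toy example (my numbers, not Gigerenzer's):
# P(disease | positive test) via Bayes' rule.
prevalence = 0.01           # 1% of people have the condition
sensitivity = 0.90          # P(positive | disease)
false_positive_rate = 0.09  # P(positive | no disease)

p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(round(p_disease_given_positive, 3))  # about 0.092: most positives are false positives
```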

I’m a sucker for this sort of thing. It might be that at some points Gigerenzer is overstating his case, but he makes a lot of good points.

Some big themes

In his article, Gigerenzer raises three other issues that I’ve been thinking about a lot lately:

1. Overcertainty in the reception and presentation of scientific results.

2. Claims that people are stupid.

3. The political implications of claims that people are stupid.

Overcertainty and the problem of trust

Gigerenzer writes:

The irrationality argument exists in many versions (e.g. Conley, 2013; Kahneman, 2011). Not only has it come to define behavioral economics but it also has defined how most economists view psychology: Psychology is about biases, and psychology has nothing to say about reasonable behavior.

Few economists appear to be aware that the bias message is not representative of psychology or cognitive science in general. For instance, loss aversion is often presented as a truism; in contrast, a review of the literature concluded that the “evidence does not support that losses, on balance, tend to be any more impactful than gains” (Gal and Rucker, 2018). Research outside the heuristics-and-biases program that does not confirm this message—including most of the psychological research described in this article—is rarely cited in the behavioral economics literature (Gigerenzer, 2015).

(We discussed Gal and Rucker (2018) here.)

More generally, this makes me think of the problem of trust that Kaiser Fung and I noted in the Freakonomics franchise. There’s so much published research out there, indeed so much publicized research, that it’s hard to know where to start, so a natural strategy for sifting through and understanding it all is using networks of trust. You trust your friends and colleagues, they trust their friends and colleagues, and so on. But you can see how this can lead to economists getting a distorted view of the content of psychology and cognitive science.

Claims that people are stupid

The best of the heuristics and biases research is fascinating, important stuff that has changed my life and gives us, ultimately, a deeper respect for ourselves as reasoning beings. But, as Gigerenzer points out, this same research is often misinterpreted as suggesting that people are easily-manipulable (or easily-nudged) fools, and this fits in with lots of junk science claims of the same sort: pizzagate-style claims that the amount you eat can be manipulated by the size of your dining tray, goofy poli-sci claims that a woman’s vote depends on the time of the month, air rage, himmicanes, shark attacks, ages-ending-in-9, and all the rest. This is an attitude which I can understand might be popular among certain marketers, political consultants, and editors of the Proceedings of the National Academy of Sciences, but I don’t buy it, partly because of zillions of errors in the published studies in question and also because of the piranha principle. Again, what’s important here is not just the claim that people make mistakes, but that they can be consistently manipulated using what would seem to be irrelevant stimuli.

Political implications

As usual, let me emphasize that if these claims were true—if it were really possible to massively and predictably change people’s attitudes on immigration by flashing a subliminal smiley face on a computer screen—then we’d want to know it.

If the claims don’t pan out, then they’re not so interesting, except inasmuch as: (a) it’s interesting that smart people believed these things, and (b) we care if resources are thrown at these ideas. For (b), I’m not just talking about NSF funds etc., I’m also talking about policy money (remember, pizzagate dude got appointed to a U.S. government position at one point to implement his ideas) and just a general approach toward policymaking, things like nudging without persuasion, nudges that violate the Golden Rule, and of course nudges that don’t work.

There’s also a way in which a focus on individual irrationality can be used to discredit or shift blame onto the public. For example, Gigerenzer writes:

Nicotine addiction and obesity have been attributed to people’s myopia and probability-blindness, not to the actions of the food and tobacco industry. Similarly, an article by the Deutsche Bank Research “Homo economicus – or more like Homer Simpson?” attributed the financial crisis to a list of 17 cognitive biases rather than the reckless practices and excessive fragility of banks and the financial system (Schneider, 2010).

Indeed, social scientists used to talk about the purported irrationality of voting (for our counter-argument, see here). If voters are irrational, then we shouldn’t take their votes seriously.

I prefer Gigerenzer’s framing:

The alternative to paternalism is to invest in citizens so that they can reach their own goals rather than be herded like sheep.