### Did she really live 122 years?

Even more famous than “the Japanese dude who won the hot dog eating contest” is “the French lady who lived to be 122 years old.”

But did she really?

Paul Campos points us to this post, where he writes:

Here’s a statistical series, laying out various points along the 100 longest known durations of a particular event, of which there are billions of known examples. The series begins with the 100th longest known case:

100th: 114 years 93 days

90th: 114 years 125 days

80th: 114 years 182 days

70th: 114 years 208 days

60th: 114 years 246 days

50th: 114 years 290 days

40th: 115 years 19 days

30th: 115 years 158 days

20th: 115 years 319 days

10th: 116 years 347 days

9th: 117 years 27 days

8th: 117 years 81 days

7th: 117 years 137 days

6th: 117 years 181 days

5th: 117 years 230 days

4th: 117 years 248 days

3rd: 117 years 260 days

Based on this series, what would you expect the second-longest and the longest known durations of the event to be?

These are the maximum verified — or as we’ll see “verified” — life spans achieved by human beings, at least since it began to be possible to measure this with some loosely acceptable level of scientific accuracy . . .

Given the mortality rates observed between ages 114 and 117 in the series above, it would be somewhat surprising if anybody had actually reached the age of 118. Thus it’s very surprising to learn that #2 on the list, an American woman named Sarah Knauss, lived to be 119 years and 97 days. That seems like an extreme statistical outlier, and it makes me wonder if Knauss’s age at death was recorded correctly (I know nothing about how her age was verified).

But the facts regarding the #1 person on the list — a French woman named Jeanne Calment who was definitely born in February of 1875, and was determined to have died in August of 1997 by what was supposedly all sorts of unimpeachable documentary evidence, after reaching the astounding age of 122 years, 164 days — are more than surprising. . . .

A Russian mathematician named Nikolay Zak has just looked into the matter, and concluded that, despite the purportedly overwhelming evidence that made it certain beyond a reasonable doubt that Calment reached such a remarkable age, it’s actually quite likely, per his argument, that Jeanne Calment died in the 1930s, and the woman who for more than 20 years researchers all around the world considered to be the oldest person whose age had been “conclusively” documented was actually her daughter, Yvonne. . . .

I followed the link and read Zak’s article, and . . . I have no idea.

The big picture is that, after age 110, the probability of dying is about 50% per year. For reasons we’ve discussed earlier, I don’t think we should take this constant hazard rate too seriously. But if we go with that, and we start with 100 people reaching a recorded age of 114, we’d expect about 50 to reach 115, 25 to reach 116, 12 to reach 117, 6 to reach 118, 3 to reach 119, etc. . . . so 122 is not at all out of the question. So I don’t really buy Campos’s statistical argument, which all seems to turn on there being a lot of people who reached 117 but not 118, which in turn is just a series of random chances that can just happen.
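As a quick check on that arithmetic, here's a simulation sketch under the constant 50%-per-year hazard assumption discussed above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the constant-hazard arithmetic above: 100 people reach 114,
# then each subsequent year is survived with probability 1/2.
n_sims = 10_000
extra = rng.geometric(0.5, size=(n_sims, 100)) - 1  # extra years lived past 114
max_ages = 114 + extra.max(axis=1)

# Counts halve each year: ~50 reach 115, ~25 reach 116, ..., so the
# chance that the oldest of the 100 makes it to 122 is
# 1 - (1 - 2**-8)**100, about 32%.
print("P(record age >= 122):", (max_ages >= 122).mean())
```

So under this crude model, a 122-year-old record-holder among 100 documented 114-year-olds is closer to a coin flip than a miracle.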

Although I have nothing to add to the specific question of Jeanne or Yvonne Calment, I do have some general thoughts on this story:

– It’s stunning to me how these paradigm shifts come up, where something that everybody believes to be true is questioned. I’ve been vaguely following discussions about the maximum human lifespan (as in the link just above), and the example of Calment comes up all the time, and I’d never heard anyone suggest her story might be fake. According to Zak, there had been some questioning, but it didn’t go far enough for me to have heard about it.

Every once in a while we hear about these exciting re-thinkings of the world. Sometimes these seem to have turned out to be right (for example, the story about the asteroid collision that indirectly killed the dinosaurs, or, since we’re on the topic, the story that modern birds are the dinosaurs’ descendants). Other times the new ideas seem to have been dead ends (for example, the claim that certain discrepancies in sex ratios could be explained by hepatitis). As Joseph Delaney discusses in the context of the latter example, sometimes an explanation can be too convincing, in some way. The challenge is to value paradigm-busting ideas without falling in love with them.

– The Calment example is a great illustration of Bayesian inference. Bayesian reasoning should lead us to be skeptical of Calment’s claimed age. Indeed, as Zak notes, Bayesian reasoning should lead us to be skeptical of any claim on the tail of any distribution. Those 116-year-olds and 117-year-olds on Campos’s list above: we should be skeptical of each of them too. It’s just simple probabilistic reasoning: there’s some baseline probability that anyone’s claimed age will be fake, and if the distribution of fake ages has wider tails than the distribution of real ages, then an extreme claimed age is some evidence of an error. The flip side is that there must be some extreme ages out there that we haven’t heard about.
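To make that probabilistic reasoning concrete, here is a toy calculation; the 1% base rate and the two tail distributions below are invented numbers for illustration, not estimates:

```python
# Toy numbers, not estimates: suppose 1% of "validated" records are
# errors, genuine survival past 110 halves each year, and erroneous
# records have a wider tail, halving only every three years.
p_fake = 0.01

def lik_real(age):  # P(claimed age | record is genuine)
    return 0.5 ** (age - 110)

def lik_fake(age):  # P(claimed age | record is an error): wider tail
    return 0.5 ** ((age - 110) / 3)

def p_error(age):   # posterior probability the record is an error
    num = p_fake * lik_fake(age)
    return num / (num + (1 - p_fake) * lik_real(age))

# The posterior probability of error climbs from about 6% at age 114
# to about 72% at age 122.
for age in [114, 117, 120, 122]:
    print(age, round(p_error(age), 2))
```

The exact numbers depend entirely on the made-up inputs, but the qualitative point survives: the more extreme the claimed age, the more the wider-tailed error distribution dominates.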

– The above discussion also leads to a sort of moral hazard of Bayesian inference: If we question the extreme reported ages without correspondingly researching other ages, we’ll be shrinking our distribution. As Phil and I discuss in our paper, All maps of parameters are misleading, there’s no easy solution to this problem, but we at least should recognize it.

Campos replied:

I hadn’t considered that the clustering at 117 is probably just random, but of course that makes sense. Calment does seem like a massive outlier, and as you say, from a Bayesian perspective the fact that she’s such an outlier makes the potential holes in the validation of her age more probable than otherwise. What I don’t understand about the inheritance fraud theory is that Jeanne’s husband lived until 1942, eight years after Jeanne’s hypothesized death. It would be unusual, I think, for French inheritance law not to give a complete exemption to a surviving spouse for any inheritance tax liability (that’s the case in the legal systems I know something about), but I don’t know anything about French inheritance law.

The post Did she really live 122 years? appeared first on Statistical Modeling, Causal Inference, and Social Science.

### Objective Bayes conference in June

Christian Robert points us to this Objective Bayes Methodology Conference in Warwick, England in June. I’m not a big fan of the term “objective Bayes” (see my paper with Christian Hennig, Beyond subjective and objective in statistics), but the conference itself looks interesting, and there are still a few weeks left for people to submit posters.


### “Dissolving the Fermi Paradox”

Jonathan Falk writes:

A quick search seems to imply that you haven’t discussed the Fermi equation for a while.

This looks to me to be in the realm of Miller and Sanjurjo: a simple probabilistic explanation sitting right under everyone’s nose. Comment?

“This” is an article, Dissolving the Fermi Paradox, by Anders Sandberg, Eric Drexler, and Toby Ord, which begins:

The Fermi paradox is the conflict between an expectation of a high ex ante probability of intelligent life elsewhere in the universe and the apparently lifeless universe we in fact observe. The expectation that the universe should be teeming with intelligent life is linked to models like the Drake equation, which suggest that even if the probability of intelligent life developing at a given site is small, the sheer multitude of possible sites should nonetheless yield a large number of potentially observable civilizations. We show that this conflict arises from the use of Drake-like equations, which implicitly assume certainty regarding highly uncertain parameters. . . . When the model is recast to represent realistic distributions of uncertainty, we find a substantial ex ante probability of there being no other intelligent life in our observable universe . . . This result dissolves the Fermi paradox, and in doing so removes any need to invoke speculative mechanisms by which civilizations would inevitably fail to have observable effects upon the universe.
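The paper's basic move can be sketched with a toy Monte Carlo. The parameter ranges below are invented for illustration, not the distributions Sandberg et al. actually use:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def log_uniform(lo, hi):
    # Draw with equal weight on every order of magnitude in [lo, hi].
    return 10 ** rng.uniform(np.log10(lo), np.log10(hi), n)

# Invented ranges for the seven Drake factors; the abiogenesis term f_l
# carries the many-order-of-magnitude uncertainty.
R_star = log_uniform(1, 100)      # stars formed per year
f_p    = log_uniform(0.1, 1)      # fraction of stars with planets
n_e    = log_uniform(0.1, 10)     # habitable planets per system
f_l    = log_uniform(1e-30, 1)    # P(life arises on a habitable planet)
f_i    = log_uniform(1e-10, 1)    # P(life develops intelligence)
f_c    = log_uniform(1e-2, 1)     # P(detectable technology)
L      = log_uniform(1e2, 1e8)    # years a civilization is detectable

# Multiplying point estimates hides the spread; the full distribution
# of N puts substantial mass on "no other detectable civilizations."
N = R_star * f_p * n_e * f_l * f_i * f_c * L
print("P(N < 1):", (N < 1).mean())
```

With these (made-up) ranges the probability that N < 1 comes out large even though optimistic point estimates of the same factors would suggest a crowded galaxy, which is the paper's "dissolution" in miniature.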

I solicited thoughts from astronomer David Hogg, who wrote:

I have only skimmed it, but it seems reasonable. Life certainly could be rare, and technological life could be exceedingly rare. Some of the terms do have many-order-of-magnitude uncertainties.

That said, we now know that a large fraction of stars host planets and many host planets similar to the Earth, so the uncertainties on planet-occurrence terms in any Drake-like equation are now much lower than order-of-magnitude.

And Hogg forwarded the question to another astronomer, Jason Wright, who wrote:

The original questioner’s question (Thomas Basbøll’s submission from December) is addressed explicitly here.

In short, only the duration of transmission matters in steady-state, which is the final L term in Drake’s famous equation. Start time does not matter.

Regarding Andrew’s predicate “given that we haven’t heard any such signals so far” in the OP: despite the high profile of SETI, almost no actual searching has occurred because the field is essentially unfunded (until Yuri Milner’s recent support). Jill Tarter analogizes updating our priors based on the searching done to date to concluding that there must not be very many fish in the ocean after inspecting the contents of a single drinking glass dipped in it (that’s a rough OOM, but it’s pretty close). And that’s just narrowband radio searches; other kinds of searches are far, far less complete.

And Andrew is not wrong that the amount of popular discussion of SETI has gone way down since the ’90s. A good account of the rise and fall of government funding for SETI is Garber (1999).

I have what I think is a complete list of NASA and NSF funding since the (final) cancellation of NASA’s SETI work in 1993, and it sums to just over \$2.5M (not per year—total). True, Barney Oliver and Paul Allen contributed many millions more, but most of this went to develop hardware and pay engineers to build the (still incomplete and barely operating) Allen Telescope Array; it did not train students or fund much in the way of actual searches.

So you haven’t heard much about SETI because there’s not much to say. Instead, most of the literature is people in their spare time endlessly rearranging, recalculating, reinventing, modifying, and critiquing the Drake Equation, or offering yet another “solution” to the Fermi Paradox in the absence of data.

The central problem is that for all of the astrobiological terms in the Drake Equation we have a sample size of 1 (Earth), and since that one is us, we run into “anthropic principle” issues whenever we try to use it to estimate those terms.

The recent paper by Sandberg calculates reasonable posterior distributions on N in the Drake Equation, and indeed shows that they are so wide that N=0 is not excluded, but the latter point has been well appreciated since the equation was written down, so this “dissolution” to the Fermi Paradox (“maybe spacefaring life is just really rare”) is hardly novel. It was the thesis of the influential book Rare Earth and the argument used by Congress as a justification for blocking essentially all funding to the field for the past 25 years.

Actually, I would say that an equally valid takeaway from the Sandberg paper is that very large values of N are possible, so we should definitely be looking for them!

So make of that what you will.

P.S. I posted this in July 2018. The search for extraterrestrial intelligence is one topic where I don’t think much is lost in our 6-month blog delay.

The post “Dissolving the Fermi Paradox” appeared first on Statistical Modeling, Causal Inference, and Social Science.

### Back by popular demand . . . The Greatest Seminar Speaker contest!

Regular blog readers will remember our seminar speaker competition from a few years ago.

Here was our bracket, back in 2015:

And here were the 64 contestants:

– Philosophers:
Plato (seeded 1 in group)
Alan Turing (seeded 2)
Aristotle (3)
Friedrich Nietzsche (4)
Thomas Hobbes
Jean-Jacques Rousseau
Bertrand Russell
Karl Popper

– Religious Leaders:
Mohandas Gandhi (1)
Martin Luther King (2)
Henry David Thoreau (3)
Mother Teresa (4)
Al Sharpton
Phyllis Schlafly
Yoko Ono
Bono

– Authors:
William Shakespeare (1)
Miguel de Cervantes (2)
James Joyce (3)
Mark Twain (4)
Jane Austen
John Updike
Raymond Carver
Leo Tolstoy

– Artists:
Leonardo da Vinci (1)
Rembrandt van Rijn (2)
Vincent van Gogh (3)
Marcel Duchamp (4)
Grandma Moses
Barbara Kruger
The guy who did Piss Christ

– Founders of Religions:
Jesus (1)
Buddha (3)
Abraham (4)
L. Ron Hubbard
Mary Baker Eddy
Sigmund Freud
Karl Marx

– Cult Figures:
John Waters (1)
Philip K. Dick (2)
Ed Wood (3)
Judy Garland (4)
Sun Myung Moon
Charles Manson
Joan Crawford
Stanley Kubrick

– Comedians:
Richard Pryor (1)
George Carlin (2)
Chris Rock (3)
Larry David (4)
Alan Bennett
Stewart Lee
Ed McMahon
Henny Youngman

– Modern French Intellectuals:
Albert Camus (1)
Simone de Beauvoir (2)
Bernard-Henri Lévy (3)
Claude Levi-Strauss (4)
Raymond Aron
Jacques Derrida
Jean Baudrillard
Bruno Latour

We did single elimination, one match per day, alternating with the regular blog posts. See here and here for the first two contests, here for an intermediate round, and here for the conclusion.

2019 edition

Who would be the ultimate seminar speaker? I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

Our new list includes eight current or historical figures from each of the following eight categories:
– Wits
– Creative eaters
– Magicians
– Mathematicians
– TV hosts
– People from New Jersey
– GOATs
– People whose names end in f

All these categories seem to be possible choices to reach the sort of general-interest intellectual community that was implied by the [notoriously hyped] announcement of Bruno Latour’s visit to Columbia a few years ago.

The rules

I’ll post one matchup each day at noon, starting sometime next week or so, once we have the brackets prepared.

Once each pairing is up, all of you can feel free (indeed, are encouraged) to comment. I’ll announce the results when posting the next day’s matchup.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

As with our previous contest four years ago, we’re continuing the regular flow of statistical modeling, causal inference, and social science posts. They’ll alternate with these matchup postings.

### Robin Pemantle’s updated bag of tricks for math teaching!

Here it is! He’s got the following two documents:

– Tips for Active Learning in the College Setting

– Tips for Active Learning in Teacher Prep or in the K-12 Setting

This is great stuff (see my earlier review here).

Every mathematician and math teacher in the universe should read this. So, if any of you happen to be well connected to the math world, please pass this along.

### What to do when you read a paper and it’s full of errors and the author won’t share the data or be open about the analysis?

Someone writes:

I would like to ask you for advice regarding obtaining data for reanalysis purposes from an author who has multiple papers with statistical errors and doesn’t want to share the data.

Recently, I reviewed a paper in which some of the reported statistics were mathematically impossible. Since the first author of that paper had previously written another paper with one of my collaborators, I checked their paper as well and also found multiple errors (GRIM inconsistencies, impossible degrees of freedom, inappropriate statistical tests, etc.). I asked my collaborator about it, and she followed up with the first author, who had done the analysis and agreed to write an erratum.

Independently, I checked three further papers from that author, and all of them contained numbers of errors comparable to what was found in Wansink’s case. At that stage I contacted the first author of these papers, asking him for the data for reanalysis purposes. As the email went unanswered, after two weeks I followed up, this time mentioning that I had found a number of errors in these papers and including his lab’s contact email address. This time I received a response swiftly and was told that these papers were peer-reviewed, so if there were any errors they would have been caught (sic!), that for privacy reasons the data could not be shared with me, and I was asked to send the list of errors I had found. In my response I sent the list of errors, emphasized the importance of independent reanalysis, and pointed out that the data come from lab experiments, so any personally identifiable information can be removed, as it is not needed for reanalysis. After three weeks of waiting, and another email sent in the meantime, the author wrote that he was busy but had found time to check the analysis of one of the papers. In his response, he said that some of the mathematically impossible DFs were wrongly copied numbers, while the inconsistent statistics were due to selecting the wrong cells in the Excel file, which supposedly doesn’t change much. Moreover, he blamed the reviewers for not catching these mistypes (sic!) and said that he found the errors only after I contacted him. The problem is that this is the same paper my collaborator said they had already checked, so he must have been aware of these problems even before my initial email (I didn’t mention that I know that collaborator).

So here is my dilemma about how to proceed. Considering that there are multiple errors, of multiple types, across multiple papers, it is really hard to trust anything else reported in them. The author clearly does not intend to share the data with me, so I cannot verify that the data exist at all. If they don’t, then, since I have sent him the list of errors, he could reverse-engineer what tools I used and come up with numbers that will pass the tests that can be done based solely on the reported statistics.

As you may have more experience dealing with such situations, I thought I would ask you for advice on how to proceed. Would you suggest contacting the publishers involved, going public, or something else?
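For readers unfamiliar with the GRIM test mentioned above: it checks whether a reported mean is arithmetically possible given an integer-valued scale and the sample size. A minimal sketch (the published procedure handles rounding conventions more carefully):

```python
def grim_consistent(mean, n, decimals=2):
    """GRIM test sketch: with n integer-valued responses, the true mean
    must be a multiple of 1/n. Returns True if the reported mean, given
    its rounding, could have come from such data."""
    tol = 0.5 / 10 ** decimals        # rounding slack on the reported mean
    nearest = round(mean * n) / n     # closest achievable mean
    return abs(nearest - mean) <= tol

# With n = 25 integer responses, a reported mean of 3.48 is achievable
# (87/25), but 3.47 is not: no multiple of 1/25 rounds to it.
print(grim_consistent(3.48, 25))  # True
print(grim_consistent(3.47, 25))  # False
```

The appeal of checks like this is precisely the correspondent's point: they can be run from the reported statistics alone, with no access to the raw data.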

I hate to say it, but your best option here might be to give up. The kind of people who lie and cheat about their published work may also play dirty in other ways. So is it really worth it to tangle with these people? I have no idea about your particular case and am just speaking on general principles here.

You could try contacting the journal editor. Some journal editors really don’t like to find out that they’ve published erroneous work; others would prefer to sweep any such problems under the rug, either because they have personal connections to the offenders or just because they don’t want to deal with cheaters, as this is unpleasant.

Remember: journal editing is a volunteer job, and people sign up for it because they want to publish exciting new work, or maybe because they enjoy the power trip, or maybe out of a sense of duty—but, in any case, they typically aren’t in it for the controversy. So, if you do get a journal editor who can help on this, great, but don’t be surprised if the editors slink away from the problem, for example by putting the burden in your lap by saying that your only option is to submit your critique in the form of an article for the journal, which can then be sent to the author of the original paper for review, and then rejected on the grounds that it’s not important enough to publish.

Maybe you could get Retraction Watch to write something on this dude?

Also is the paper listed on PubPeer? If so, you could comment there.

### “Principles of posterior visualization”

What better way to start the new year than with a discussion of statistical graphics.

Mikhail Shubin has this great post from a few years ago on Bayesian visualization. He lists the following principles:

Principle 1: Uncertainty should be visualized

Principle 2: Visualization of variability ≠ Visualization of uncertainty

Principle 3: Equal probability = Equal ink

Principle 4: Do not overemphasize the point estimate

Principle 5: Certain estimates should be emphasized over uncertain

And this caution:

These principles (like any visualization principles) are contextual, and should be used (or not used) with the goals of the visualization in mind.

And this is not just empty talk. Shubin demonstrates all these points with clear graphs.

Interesting how this complements our methods for visualization in Bayesian workflow.


### Authority figures in psychology spread more happy talk, still don’t get the point that much of the published, celebrated, and publicized work in their field is no good (Part 2)

Part 1 was here.

And here’s Part 2. Jordan Anaya reports:

I [Anaya] was annoyed to see that it mentions “a handful” of unreliable findings, and points the finger at fraud as the cause. But then I was shocked to see the 85% number for the Many Labs project.

I’m not that familiar with the project, and I know there is debate on how to calculate a successful replication, but they got that number from none other than the “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%” people, as Sanjay Srivastava discusses here.

Schimmack identifies the above screenshot as being from Myers and Twenge (2018); I assume it’s this book, which has the following blurb:

Connecting Social Psychology to the world around us. Social Psychology introduces students to the science of us: our thoughts, feelings, and behaviors in a changing world. Students learn to think critically about everyday behaviors and gain an appreciation for the world around us, regardless of background or major.

But according to Schimmack, there’s “no mention of a replication failure in the entire textbook.” That’s fine—it’s not necessarily the job of an intro textbook to talk about ideas that didn’t work out—but then why mention replications in the first place? And why try to minimize it by talking about “a handful of unreliable findings”? A handful, huh? Who talks like that? This is a “Politics and the English Language” situation, where sloppy language serves sloppy thinking and bad practice.

Also, to connect replication failures to “fraud” is just horrible, as it’s consistent with two wrong messages: (a) that to point out a failed replication is to accuse someone of fraud, and (b) that, conversely, honest researchers can’t have replication failures. As I’ve written a few zillion times, honesty and transparency are not enuf. As I wrote here, it’s a mistake to focus on “p-hacking” and bad behavior rather than the larger problem of researchers expecting routine discovery.

So, the blurb for the textbook says that students learn to think critically about everyday behaviors—but they won’t learn to think critically about published research in the field of psychology.

Just to be clear: I’m not saying the authors of this textbook are bad people. My guess is they just want to believe the best about their field of research, and enough confused people have squirted enough ink into the water to confuse them into thinking that the number of unreliable findings really might be just “a handful,” that 85% of experiments in that study replicated, that the replication rate in psychology is statistically indistinguishable from 100%, that elections are determined by shark attacks and college football games, that single women were 20 percentage points more likely to support Barack Obama during certain times of the month, that elderly-priming words make you walk slower, that Cornell students have ESP, etc etc etc. There are lots of confused people out there, not sure where to turn, so it makes sense that some textbook writers will go for the most comforting possible story. I get it. They’re not trying to mislead the next generation of students; they’re just doing their best.

There are no bad guys here.

Let’s just hope 2019 goes a little better.

A good start would be for the authors of this book to send a public note to Uli Schimmack thanking him for pointing out their error, and then replacing that paragraph with something more accurate in their next printing. They could also write a short article for Perspectives on Psychological Science on how they got confused on this point, as this could be instructive for other teachers of psychology. They don’t have to do this. They can do whatever they want. But this is my suggestion for how they could get 2019 off to a good start, in one small way.

### Combining apparently contradictory evidence

I want to write a more formal article about this, but in the meantime here’s a placeholder.

The topic is the combination of apparently contradictory evidence.

Let’s start with a simple example: you have some ratings on a 1-10 scale. These could be, for example, research proposals being rated by a funding committee, or, umm, I dunno, gymnasts being rated by Olympic judges. Suppose there are 3 judges doing the ratings, and consider two gymnasts: one receives ratings of 8, 8, 8; the other is rated 6, 8, 10. Or, forget about ratings, just consider students taking multiple exams in a class. Consider two students: Amy, whose three test scores are 80, 80, 80; and Beth, who had scores 80, 100, 60. (I’ve purposely scrambled the order of those last three so that we don’t have to think about trends. Forget about time trends; that’s not my point here.)

How to compare those two students? A naive reader of test scores will say that Amy is consistent while Beth is flaky; or you might even say that you think Beth is better as she has a higher potential. But if you have some experience with psychometrics, you’ll be wary of overinterpreting results from three exam scores. Inference about an average from N=3 is tough; inference about variance from N=3 is close to impossible. Long story short: from a psychometrics perspective, there’s very little you can say about the relative consistency of Amy and Beth’s test-taking based on just three scores.
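To put numbers on how little three scores tell you about variance, here's a quick classical calculation for Beth's scores (assuming normally distributed scores; the interval is the standard chi-squared one):

```python
import math

# Beth's three scores: the sample sd is 20, but what does that tell us?
scores = [80.0, 100.0, 60.0]
n = len(scores)
mean = sum(scores) / n
s2 = sum((x - mean) ** 2 for x in scores) / (n - 1)  # sample variance = 400

# Classical 95% interval for the population sd: (n-1)*s^2/sigma^2 is
# chi-squared with n-1 = 2 df, and a chi-squared with 2 df is just an
# exponential, with quantile function -2*log(1 - p).
chi2_hi = -2 * math.log(1 - 0.975)  # upper quantile, ~7.38
chi2_lo = -2 * math.log(1 - 0.025)  # lower quantile, ~0.051
lower = math.sqrt((n - 1) * s2 / chi2_hi)
upper = math.sqrt((n - 1) * s2 / chi2_lo)
# The interval runs from about 10 to about 126.
print(f"sample sd = {math.sqrt(s2):.0f}, 95% CI for sigma: ({lower:.0f}, {upper:.0f})")
```

A true sd anywhere from roughly half Beth's observed spread to six times it is consistent with her three scores, which is what "close to impossible" means in practice here.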

Academic researchers will recognize this problem when considering reviews of their own papers that they’ve submitted to journals. When you send in a paper, you’ll typically get a few reviews, and these reviews can differ dramatically in their messages.

Here’s a hilarious example supplied to me by Wolfgang Gaissmaier and Julian Marewski, from reviews of their 2011 article, “Forecasting elections with mere recognition from small, lousy samples: A comparison of collective recognition, wisdom of crowds, and representative polls.”

Here are some positive reviewer comments:

– This is a very interesting piece of work that raises a number of important questions related to public opinion. The major finding — that for elections with large numbers of parties, small non-probability samples looking only at party name recognition do as well as medium-sized probability samples looking at voter intent — is stunning.

– There is a lot to like about this short paper… I’m surprised by the strength of the results… If these results are correct (and I have no real reason to suspect otherwise), then the authors are more than justified in their praise of recognition-based forecasts. This could be an extremely useful forecasting technique not just for the multi-party European elections discussed by the authors, but also in relatively low-salience American local elections.

– This is a concise, high-quality paper that demonstrates that the predictive power of (collective) recognition extends to the important domain of political elections.

And now the fun stuff. The negative comments:

– This is probably the strangest manuscript that I have ever been asked to review… Even if the argument is correct, I’m not sure that it tells us anything useful. The fact that recognition can be used to predict the winners of tennis tournaments and soccer matches is unsurprising – people are more likely to recognize the better players/teams, and the better players/teams usually win. It’s like saying that a football team wins 90% (or whatever) of the games in which it leads going into the fourth quarter. So what?

– To be frank, this is an exercise in nonsense. Twofold nonsense. For one thing, to forecast election outcomes based on whether or not voters recognize the parties/candidates makes no sense… Two, why should we pay any attention to unrepresentative samples, which is what the authors use in this analysis? They call them, even in the title, “lousy.” Self-deprecating humor? Or are the authors laughing at a gullible audience?

So, their paper is either “a very interesting piece of work” whose main finding is “stunning”—or it is “an exercise in nonsense” aimed at “a gullible audience.”

The post Combining apparently contradictory evidence appeared first on Statistical Modeling, Causal Inference, and Social Science.

### “Check yourself before you wreck yourself: Assessing discrete choice models through predictive simulations”

Timothy Brathwaite sends along this wonderfully-titled article (also here, and here’s the replication code), which begins:

Typically, discrete choice modelers develop ever-more advanced models and estimation methods. Compared to the impressive progress in model development and estimation, model-checking techniques have lagged behind. Often, choice modelers use only crude methods to assess how well an estimated model represents reality. Such methods usually stop at checking parameter signs, model elasticities, and ratios of model coefficients. In this paper, I [Brathwaite] greatly expand the discrete choice modelers’ assessment toolkit by introducing model checking procedures based on graphical displays of predictive simulations. . . . a general and ‘semi-automatic’ algorithm for checking discrete choice models via predictive simulations. . . .

He frames model checking in terms of “underfitting,” a connection I’ve never seen before but which makes sense. To the extent that there are features in your data that are not captured in your model—more precisely, features that don’t show up, even in many different posterior predictive simulations from your fitted model—then, yes, the model is underfitting the data. Good point.
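Here is a minimal example of that general idea — not Brathwaite's procedure for discrete choice models, just a generic posterior predictive check on made-up count data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data with excess zeros, deliberately mis-modeled as plain Poisson.
y = np.concatenate([np.zeros(40, dtype=int), rng.poisson(5.0, 60)])

# Conjugate Bayes: Gamma(1, 1) prior on the Poisson rate lambda.
post_shape, post_rate = 1 + y.sum(), 1 + len(y)

# Posterior predictive simulations: redraw lambda, then a replicated
# dataset, and record a test statistic (here, the number of zeros).
T_rep = np.empty(2000)
for i in range(2000):
    lam = rng.gamma(post_shape, 1 / post_rate)
    T_rep[i] = (rng.poisson(lam, len(y)) == 0).sum()

# The observed zero count sits far outside the replications: a feature
# of the data that the fitted model never reproduces, i.e. underfitting.
T_obs = (y == 0).sum()
print("observed zeros:", T_obs, "  ppp-value:", (T_rep >= T_obs).mean())
```

The graphical version — plotting the observed statistic against a histogram of the replicated ones — is exactly the kind of display Brathwaite's paper builds its checking toolkit around.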