Philip Roth (4) vs. DJ Jazzy Jeff; Jim Thorpe advances

For yesterday’s battle (Jim Thorpe vs. John Oliver), I’ll have to go with Thorpe. We got a couple of arguments in Oliver’s favor—we’d get to hear him say “Whot?”, and he’s English—but for Thorpe we heard a lot more, including his uniqueness as the greatest athlete of all time, and that we could save money on the helmet if that were required. We also got the following bad reason: “the chance to hear him say, ‘I’ve been asked to advise those of you who are following this talk on social media, whatever that means, to use “octothorpe talktothorpe.”’” Even that bad reason ain’t so bad; also, it’s got three levels of quotation nesting, which counts for something right there. What iced it for Thorpe was this comment from Tom: “Seeing as he could do everything better than everyone else, just by giving it a go, he would surely give an incredible seminar.”

And for our next contest, it’s the Bard of Newark vs. a man who’s in this contest only because it was hard for me to think of eight people whose names end in f, and whose entire fame comes from the decades-old phrase, “Fresh Prince and DJ Jazzy Jeff.” So whaddya want: riffs on Anne Frank and suburban rabbis, or some classic 80s beats? I dunno. I think Roth would be much more entertaining when question time comes along, but he can’t scratch.

Does anyone know these people? Do they exist or are they spooks?

The full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

The post Philip Roth (4) vs. DJ Jazzy Jeff; Jim Thorpe advances appeared first on Statistical Modeling, Causal Inference, and Social Science.

Did she really live 122 years?

Even more famous than “the Japanese dude who won the hot dog eating contest” is “the French lady who lived to be 122 years old.”

But did she really?

Paul Campos points us to this post, where he writes:

Here’s a statistical series, laying out various points along the 100 longest known durations of a particular event, of which there are billions of known examples. The series begins with the 100th longest known case:

100th: 114 years 93 days

90th: 114 years 125 days

80th: 114 years 182 days

70th: 114 years 208 days

60th: 114 years 246 days

50th: 114 years 290 days

40th: 115 years 19 days

30th: 115 years 158 days

20th: 115 years 319 days

10th: 116 years 347 days

9th: 117 years 27 days

8th: 117 years 81 days

7th: 117 years 137 days

6th: 117 years 181 days

5th: 117 years 230 days

4th: 117 years 248 days

3rd: 117 years 260 days

Based on this series, what would you expect the second-longest and the longest known durations of the event to be?

These are the maximum verified — or as we’ll see “verified” — life spans achieved by human beings, at least since it began to be possible to measure this with some loosely acceptable level of scientific accuracy . . .

Given the mortality rates observed between ages 114 and 117 in the series above, it would be somewhat surprising if anybody had actually reached the age of 118. Thus it’s very surprising to learn that #2 on the list, an American woman named Sarah Knauss, lived to be 119 years and 97 days. That seems like an extreme statistical outlier, and it makes me wonder if Knauss’s age at death was recorded correctly (I know nothing about how her age was verified).

But the facts regarding the #1 person on the list — a French woman named Jeanne Calment who was definitely born in February of 1875, and was determined to have died in August of 1997 by what was supposedly all sorts of unimpeachable documentary evidence, after reaching the astounding age of 122 years, 164 days — are more than surprising. . . .

A Russian mathematician named Nikolay Zak has just looked into the matter, and concluded that, despite the purportedly overwhelming evidence that made it certain beyond a reasonable doubt that Calment reached such a remarkable age, it’s actually quite likely, per his argument, that Jeanne Calment died in the 1930s, and the woman who for more than 20 years researchers all around the world considered to be the oldest person whose age had been “conclusively” documented was actually her daughter, Yvonne. . . .

I followed the link and read Zak’s article, and . . . I have no idea.

The big picture is that, after age 110, the probability of dying is about 50% per year. For reasons we’ve discussed earlier, I don’t think we should take this constant hazard rate too seriously. But if we go with that, and we start with 100 people reaching a recorded age of 114, we’d expect about 50 to reach 115, 25 to reach 116, 12 to reach 117, 6 to reach 118, 3 to reach 119, etc. . . . so 122 is not at all out of the question. So I don’t really buy Campos’s statistical argument, which seems to turn entirely on there being a lot of people who reached 117 but not 118, and that is just the sort of run that a series of random chances can produce.
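That back-of-the-envelope halving argument is easy to check in a few lines. This is a sketch under the constant-hazard assumption above; the 50%-per-year figure is the one quoted in the text, not an estimate from actual life tables:

```python
import random

random.seed(1)

# Constant-hazard assumption from the text: past age 110, each person
# survives any given year with probability 1/2.
P_SURVIVE = 0.5

# Expected counts, starting from 100 people with a recorded age of 114:
# they halve each year, so ~50 reach 115, ~25 reach 116, and so on.
expected = {age: 100 * P_SURVIVE ** (age - 114) for age in range(114, 123)}

# Reaching 122 from 114 means surviving 8 more years: p = 0.5**8 = 1/256,
# so the expected number of 122-year-olds is about 0.39, and the chance
# that at least one of the 100 makes it is about 0.32.
p_any_reaches_122 = 1 - (1 - P_SURVIVE ** 8) ** 100

# A quick simulation of the maximum age observed in one such cohort:
def simulate_max_age(n_start=100, start_age=114):
    best = start_age
    for _ in range(n_start):
        age = start_age
        while random.random() < P_SURVIVE:
            age += 1
        best = max(best, age)
    return best

print(round(expected[122], 2), round(p_any_reaches_122, 2), simulate_max_age())
```

So under the constant-hazard story, a 122-year maximum in a cohort of a hundred 114-year-olds is unlikely but far from impossible, which is the point of the paragraph above.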

Although I have nothing to add to the specific question of Jeanne or Yvonne Calment, I do have some general thoughts on this story:

– It’s stunning to me how these paradigm shifts come up, where something that everybody believes to be true is questioned. I’ve been vaguely following discussions about the maximum human lifespan (as in the link just above), and the example of Calment comes up all the time, yet I’d never heard anyone suggest her story might be fake. According to Zak, there had been some questioning, but it didn’t go far enough for me to have heard about it.

Every once in a while we hear about these exciting re-thinkings of the world. Some of them seem to have turned out to be right (for example, the story about the asteroid collision that indirectly killed the dinosaurs, or, since we’re on the topic, the story that modern birds are the dinosaurs’ descendants). Other new ideas seem to have been dead ends (for example, the claim that certain discrepancies in sex ratios could be explained by hepatitis). As Joseph Delaney discusses in the context of the latter example, sometimes an explanation can be too convincing, in some way. The challenge is to value paradigm-busting ideas without falling in love with them.

– The Calment example is a great illustration of Bayesian inference. Bayesian reasoning should lead us to be skeptical of Calment’s claimed age. Indeed, as Zak notes, Bayesian reasoning should lead us to be skeptical of any claim on the tail of any distribution. Those 116-year-olds and 117-year-olds on Campos’s list above: we should be skeptical of each of them too. It’s just simple probabilistic reasoning: there’s some baseline probability that anyone’s claimed age will be fake, and if the distribution of fake ages has wider tails than the distribution of real ages, then an extreme claimed age is some evidence of an error. The flip side is that there must be some extreme ages out there that we haven’t heard about.
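To make the “wider tails” point concrete, here is a toy Bayes calculation. Every number in it (the 1% baseline error rate, the two tail shapes) is invented for illustration; the only claim is the qualitative one, that the posterior probability of an erroneous record grows with the claimed age:

```python
# Toy numbers: suppose 1% of validated supercentenarian records are wrong
# (e.g., an identity switch), and that erroneous records have a fatter
# age tail than real ones. All inputs are invented for illustration.
P_FAKE = 0.01

def tail_real(age):
    # P(reported age >= age | record is real): halves per year past 114,
    # matching the rough constant-hazard story discussed above.
    return 0.5 ** (age - 114)

def tail_fake(age):
    # Erroneous records overstate age more easily: a fatter tail that
    # falls by only 20% per year.
    return 0.8 ** (age - 114)

def posterior_fake(age):
    """Posterior probability the record is erroneous, given the claimed age."""
    num = P_FAKE * tail_fake(age)
    return num / (num + (1 - P_FAKE) * tail_real(age))

for age in (115, 117, 119, 122):
    print(age, round(posterior_fake(age), 3))
```

With these made-up inputs, the posterior probability of an error rises from under 2% for a claimed 115 to roughly 30% for a claimed 122: simple probabilistic reasoning, as the paragraph above says.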

– The above discussion also points to a sort of moral hazard of Bayesian inference: if we question the extreme reported ages without correspondingly researching the other ages, we’ll be artificially shrinking our distribution. As Phil and I discuss in our paper, All maps of parameter estimates are misleading, there’s no easy solution to this problem, but we should at least recognize it.

P.S. Campos adds:

I hadn’t considered that the clustering at 117 is probably just random, but of course that makes sense. Calment does seem like a massive outlier, and as you say from a Bayesian perspective the fact that she’s such an outlier makes the potential holes in the validation of her age more probable than otherwise. What I don’t understand about the inheritance fraud theory is that Jeanne’s husband lived until 1942, eight years after Jeanne’s hypothesized death. It would be unusual, I think, for French inheritance law not to give a complete exemption to a surviving spouse for any inheritance tax liability (that’s the case in the legal systems I know something about), but I don’t know anything about French inheritance law.


Objective Bayes conference in June

Christian Robert points us to this Objective Bayes Methodology Conference in Warwick, England in June. I’m not a big fan of the term “objective Bayes” (see my paper with Christian Hennig, Beyond subjective and objective in statistics), but the conference itself looks interesting, and there are still a few weeks left for people to submit posters.


“Dissolving the Fermi Paradox”

Jonathan Falk writes:

A quick search seems to imply that you haven’t discussed the Fermi equation for a while.

This looks to me to be in the realm of Miller and Sanjurjo: a simple probabilistic explanation sitting right under everyone’s nose. Comment?

“This” is an article, Dissolving the Fermi Paradox, by Anders Sandberg, Eric Drexler, and Toby Ord, which begins:

The Fermi paradox is the conflict between an expectation of a high ex ante probability of intelligent life elsewhere in the universe and the apparently lifeless universe we in fact observe. The expectation that the universe should be teeming with intelligent life is linked to models like the Drake equation, which suggest that even if the probability of intelligent life developing at a given site is small, the sheer multitude of possible sites should nonetheless yield a large number of potentially observable civilizations. We show that this conflict arises from the use of Drake-like equations, which implicitly assume certainty regarding highly uncertain parameters. . . . When the model is recast to represent realistic distributions of uncertainty, we find a substantial ex ante probability of there being no other intelligent life in our observable universe . . . This result dissolves the Fermi paradox, and in doing so removes any need to invoke speculative mechanisms by which civilizations would inevitably fail to have observable effects upon the universe.
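The paper’s central move can be imitated in a few lines: instead of multiplying point estimates in the Drake equation, draw each factor from a wide distribution and look at the resulting distribution of N. The ranges below are placeholders chosen for illustration, not the distributions Sandberg et al. actually use:

```python
import random
from math import log10

random.seed(1)

def log_uniform(lo, hi):
    """Draw from a distribution uniform in log10 between lo and hi."""
    return 10 ** random.uniform(log10(lo), log10(hi))

def draw_N():
    # Illustrative ranges only; the true uncertainties are debated.
    R  = log_uniform(1, 100)     # star formation rate (stars/year)
    fp = log_uniform(0.1, 1)     # fraction of stars with planets
    ne = log_uniform(0.1, 10)    # habitable planets per such star
    fl = log_uniform(1e-30, 1)   # fraction developing life (huge uncertainty)
    fi = log_uniform(1e-10, 1)   # fraction developing intelligence
    fc = log_uniform(1e-2, 1)    # fraction producing detectable signals
    L  = log_uniform(1e2, 1e8)   # years signals remain detectable
    return R * fp * ne * fl * fi * fc * L

draws = [draw_N() for _ in range(100_000)]
frac_lonely = sum(d < 1 for d in draws) / len(draws)
print(f"fraction of draws with N < 1: {frac_lonely:.2f}")
```

Even though no single factor is set to an extreme value, the log-scale uncertainty puts a large share of the probability mass on a nearly empty galaxy, which is the paper’s “dissolution” in miniature.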

I solicited thoughts from astronomer David Hogg, who wrote:

I have only skimmed it, but it seems reasonable. Life certainly could be rare, and technological life could be exceedingly rare. Some of the terms do have many-order-of-magnitude uncertainties.

That said, we now know that a large fraction of stars host planets and many host planets similar to the Earth, so the uncertainties on planet-occurrence terms in any Drake-like equation are now much lower than order-of-magnitude.

And Hogg forwarded the question to another astronomer, Jason Wright, who wrote:

The original questioner’s question (Thomas Basbøll’s submission from December) is addressed explicitly here.

In short, only the duration of transmission matters in steady-state, which is the final L term in Drake’s famous equation. Start time does not matter.

Regarding Andrew’s predicate “given that we haven’t heard any such signals so far” in the OP: despite the high profile of SETI, almost no actual searching has occurred because the field is essentially unfunded (until Yuri Milner’s recent support). Jill Tarter analogizes updating our priors based on the searching done to date to concluding that there must not be very many fish in the ocean after inspecting the contents of a single drinking glass dipped in it (that’s a rough order-of-magnitude comparison, but it’s pretty close). And that’s just searches for narrowband radio signals; other kinds of searches are far, far less complete.

And Andrew is not wrong that the amount of popular discussion of SETI has gone way down since the ’90s. A good account of the rise and fall of government funding for SETI is Garber (1999).

I have what I think is a complete list of NASA and NSF funding since the (final) cancellation of NASA’s SETI work in 1993, and it sums to just over $2.5M (not per year—total). True, Barney Oliver and Paul Allen contributed many millions more, but most of this went to develop hardware and pay engineers to build the (still incomplete and barely operating) Allen Telescope Array; it did not train students or fund much in the way of actual searches.

So you haven’t heard much about SETI because there’s not much to say. Instead, most of the literature is people in their spare time endlessly rearranging, recalculating, reinventing, modifying, and critiquing the Drake Equation, or offering yet another “solution” to the Fermi Paradox in the absence of data.

The central problem is that for all of the astrobiological terms in the Drake Equation we have a sample size of 1 (Earth), and since that one is us, we run into “anthropic principle” issues whenever we try to use it to estimate those terms.

The recent paper by Sandberg et al. calculates reasonable posterior distributions on N in the Drake Equation, and indeed shows that they are so wide that N=0 is not excluded. But the latter point has been well appreciated since the equation was written down, so this “dissolution” of the Fermi Paradox (“maybe spacefaring life is just really rare”) is hardly novel. It was the thesis of the influential book Rare Earth and the argument used by Congress as a justification for blocking essentially all funding to the field for the past 25 years.

Actually, I would say that an equally valid takeaway from the Sandberg paper is that very large values of N are possible, so we should definitely be looking for them!

So make of that what you will.

P.S. I posted this in July 2018. The search for extraterrestrial intelligence is one topic where I don’t think much is lost in our 6-month blog delay.


Back by popular demand . . . The Greatest Seminar Speaker contest!

Regular blog readers will remember our seminar speaker competition from a few years ago.

Here was our bracket, back in 2015:


And here were the 64 contestants:

– Philosophers:
Plato (seeded 1 in group)
Alan Turing (seeded 2)
Aristotle (3)
Friedrich Nietzsche (4)
Thomas Hobbes
Jean-Jacques Rousseau
Bertrand Russell
Karl Popper

– Religious Leaders:
Mohandas Gandhi (1)
Martin Luther King (2)
Henry David Thoreau (3)
Mother Teresa (4)
Al Sharpton
Phyllis Schlafly
Yoko Ono

– Authors:
William Shakespeare (1)
Miguel de Cervantes (2)
James Joyce (3)
Mark Twain (4)
Jane Austen
John Updike
Raymond Carver
Leo Tolstoy

– Artists:
Leonardo da Vinci (1)
Rembrandt van Rijn (2)
Vincent van Gogh (3)
Marcel Duchamp (4)
Thomas Kinkade
Grandma Moses
Barbara Kruger
The guy who did Piss Christ

– Founders of Religions:
Jesus (1)
Mohammad (2)
Buddha (3)
Abraham (4)
L. Ron Hubbard
Mary Baker Eddy
Sigmund Freud
Karl Marx

– Cult Figures:
John Waters (1)
Philip K. Dick (2)
Ed Wood (3)
Judy Garland (4)
Sun Myung Moon
Charles Manson
Joan Crawford
Stanley Kubrick

– Comedians:
Richard Pryor (1)
George Carlin (2)
Chris Rock (3)
Larry David (4)
Alan Bennett
Stewart Lee
Ed McMahon
Henny Youngman

– Modern French Intellectuals:
Albert Camus (1)
Simone de Beauvoir (2)
Bernard-Henri Lévy (3)
Claude Lévi-Strauss (4)
Raymond Aron
Jacques Derrida
Jean Baudrillard
Bruno Latour

We did single elimination, one match per day, alternating with the regular blog posts. See here and here for the first two contests, here for an intermediate round, and here for the conclusion.

2019 edition

Who would be the ultimate seminar speaker? I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

Our new list includes eight current or historical figures from each of the following eight categories:
– Wits
– Creative eaters
– Magicians
– Mathematicians
– TV hosts
– GOATs
– People from New Jersey
– People whose names end in f

All these categories seem to be possible choices to reach the sort of general-interest intellectual community that was implied by the [notoriously hyped] announcement of Bruno Latour’s visit to Columbia a few years ago.

The rules

I’ll post one matchup each day at noon, starting sometime next week or so, once we have the brackets prepared.

Once each pairing is up, all of you can feel free (indeed, are encouraged) to comment. I’ll announce the results when posting the next day’s matchup.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

As with our previous contest four years ago, we’re continuing the regular flow of statistical modeling, causal inference, and social science posts. They’ll alternate with these matchup postings.


Robin Pemantle’s updated bag of tricks for math teaching!

Here it is! He’s got the following two documents:

– Tips for Active Learning in the College Setting

– Tips for Active Learning in Teacher Prep or in the K-12 Setting

This is great stuff (see my earlier review here).

Every mathematician and math teacher in the universe should read this. So, if any of you happen to be well connected to the math world, please pass this along.


What to do when you read a paper and it’s full of errors and the author won’t share the data or be open about the analysis?

Someone writes:

I would like to ask you for advice about obtaining data, for reanalysis purposes, from an author who has multiple papers with statistical errors and doesn’t want to share the data.

Recently, I reviewed a paper in which some of the reported statistics were mathematically impossible. As the first author of that paper had written another paper in the past with one of my collaborators, I checked that paper too and also found multiple errors (GRIM, DF, inappropriate statistical tests, etc.). I asked my collaborator about it, and she followed up with the first author, who had done the analysis and agreed to write an erratum.

Independently, I checked a further three papers from that author, and all of them contained errors, comparable in sheer number to what was found in Wansink’s case. At that stage I contacted the first author of these papers, asking about the data for reanalysis purposes. As the email went unanswered, after two weeks I followed up, this time mentioning that I had found a number of errors in these papers and including his lab’s contact email address. This time I received a swift response and was told that these papers were peer-reviewed, so if there were any errors they would have been caught (sic!); that for privacy reasons the data could not be shared with me; and that I should send a list of the errors I had found. In my response I sent the list of errors, emphasized the importance of independent reanalysis, and pointed out that the data come from lab experiments, so any personally identifiable information can be removed, as it is not needed for reanalysis. After three weeks of waiting, and another email sent in the meantime, the author wrote that he was busy but had found time to check the analysis of one of the papers. In his response, he said that some of the mathematically impossible DFs were wrongly copied numbers, while the inconsistent statistics were due to the wrong cells having been selected in the Excel file, which supposedly doesn’t change much. Moreover, he blamed the reviewers for not catching these mistypes (sic!) and said that he found the errors only after I contacted him. The problem is that this is the same paper whose results my collaborator said they had already checked, so he must have been aware of these problems even before my initial email (I didn’t mention that I know that collaborator).

So here is my dilemma about how to proceed. Considering that there are multiple errors, of multiple types, across multiple papers, it is really hard to trust anything else reported in them. The author clearly does not intend to share the data with me, so I cannot verify whether the data exist at all. If they don’t, then, as I have sent him the list of errors, he could reverse-engineer what tools I used and come up with numbers that will pass the tests that can be done based solely on the reported statistics.

As you may have more experience dealing with such situations, I thought I might ask you for advice on how to proceed. Would you suggest contacting the involved publishers, going public, or something else?

My reply:

I hate to say it, but your best option here might be to give up. The kind of people who lie and cheat about their published work may also play dirty in other ways. So is it really worth it to tangle with these people? I have no idea about your particular case and am just speaking on general principles here.

You could try contacting the journal editor. Some journal editors really don’t like to find out that they’ve published erroneous work; others would prefer to sweep any such problems under the rug, either because they have personal connections to the offenders or just because they don’t want to deal with cheaters, as this is unpleasant.

Remember: journal editing is a volunteer job, and people sign up for it because they want to publish exciting new work, or maybe because they enjoy the power trip, or maybe out of a sense of duty—but, in any case, they typically aren’t in it for the controversy. So, if you do get a journal editor who can help on this, great, but don’t be surprised if the editors slink away from the problem, for example by putting the burden in your lap by saying that your only option is to submit your critique in the form of an article for the journal, which can then be sent to the author of the original paper for review, and then rejected on the grounds that it’s not important enough to publish.

Maybe you could get Retraction Watch to write something on this dude?

Also, is the paper listed on PubPeer? If so, you could comment there.
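As an aside for readers who haven’t seen it, the GRIM test mentioned in the letter is nothing more than checking whether a reported mean is arithmetically reachable given the sample size. A minimal version for integer-valued data (the numbers in the example are made up):

```python
def grim_consistent(mean, n, decimals=2):
    """Return True if `mean`, reported to `decimals` places, could arise
    as (integer sum) / n -- valid for integer-valued items such as
    Likert responses or counts."""
    # Any achievable sum must be an integer near mean * n; check the
    # nearest candidates and see whether one rounds back to the mean.
    target = round(mean * n)
    return any(
        s >= 0 and round(s / n, decimals) == round(mean, decimals)
        for s in (target - 1, target, target + 1)
    )

# With n = 28 integer responses, a reported mean of 5.18 is achievable
# (145/28 = 5.1786), but a reported mean of 5.19 is impossible: no
# integer sum divided by 28 rounds to 5.19.
print(grim_consistent(5.18, 28), grim_consistent(5.19, 28))  # True False
```

Checks of this kind, applied to a published table, are what let a reader flag “mathematically impossible” statistics without any access to the raw data.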


“Principles of posterior visualization”

What better way to start the new year than with a discussion of statistical graphics?

Mikhail Shubin has this great post from a few years ago on Bayesian visualization. He lists the following principles:

Principle 1: Uncertainty should be visualized

Principle 2: Visualization of variability ≠ Visualization of uncertainty

Principle 3: Equal probability = Equal ink

Principle 4: Do not overemphasize the point estimate

Principle 5: Certain estimates should be emphasized over uncertain

And this caution:

These principles (as any visualization principles) are contextual, and should be used (or not used) with the goals of this visualization in mind.

And this is not just empty talk. Shubin demonstrates all these points with clear graphs.
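One of these principles translates directly into code: “equal probability = equal ink” can be implemented, for instance, by binning posterior draws at quantiles, so that every plotted bin carries the same probability mass. This is a sketch with simulated draws, not Shubin’s own example:

```python
import random

random.seed(42)

# Stand-in "posterior draws" from a skewed distribution.
draws = sorted(random.lognormvariate(0, 1) for _ in range(10_000))

def quantile_bin_edges(sorted_draws, n_bins=10):
    """Bin edges placed at quantiles: every bin holds the same number of
    draws, so giving each bin the same area on the page gives equal
    probability equal ink."""
    n = len(sorted_draws)
    inner = [sorted_draws[k * n // n_bins] for k in range(1, n_bins)]
    return [sorted_draws[0]] + inner + [sorted_draws[-1]]

edges = quantile_bin_edges(draws)
counts = [sum(lo <= d < hi for d in draws) for lo, hi in zip(edges, edges[1:])]
# Every bin holds ~1000 of the 10,000 draws (the single largest draw sits
# on the final edge and is excluded by the half-open intervals).
print(counts)
```

Contrast this with equal-width bins, which would give the long right tail of a skewed posterior far more horizontal space than its probability mass deserves.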

Interesting how this complements our methods for visualization in Bayesian workflow.
