Melanie Mitchell says, “As someone who has worked in A.I. for decades, I’ve witnessed the failure of similar predictions of imminent human-level A.I., and I’m certain these latest forecasts will fall short as well.”

Melanie Mitchell’s piece, Artificial Intelligence Hits the Barrier of Meaning (NY Times behind limited paywall), is spot-on regarding the hype surrounding the current A.I. boom. It’s soon to come out in book length from FSG, so I suspect I’ll hear about it again in the New Yorker.

Like Professor Mitchell, I started my Ph.D. at the tail end of the first A.I. revolution. Remember, the one based on rule-based expert systems? I went to Edinburgh to study linguistics and natural language processing because it was strong in A.I., computer science theory, linguistics, and cognitive science.

On which natural language tasks can computers outperform or match humans? Search is good, because computers are fast and it’s a task at which humans aren’t so hot. That includes things like speech-based call routing in heterogeneous call centers (something I worked on at Bell Labs).

Then there’s spell checking. That’s fantastic. It leverages simple statistics about word frequency and typos/brainos and is way better than most humans at spelling. It uses the same algorithms that are used for speech recognition and RNA-seq alignment to the genome. These all sprang out of Claude Shannon’s 1948 paper, “A Mathematical Theory of Communication”, which has over 100K citations. It introduced, among other things, n-gram language models at the character and word level (still used for speech recognition and classification today with different estimators). As far as I know that paper contained the first posterior predictive checks: generating examples from the trained language models and comparing them to real language. David MacKay’s info theory book (the only ML book I actually like) is a great introduction to this material and even BDA3 added a spell-checking example. But it’s hardly A.I. in the big “I” sense of “A.I.”.
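
To make the spell-checking point concrete, here is a minimal sketch of a frequency-based corrector in Python. The toy corpus, the `WORD_COUNTS` table, and the function names `edits1` and `correct` are all made up for illustration. A real system would estimate counts from a large corpus and would model which typos are likely, rather than treating every single-character edit as equally plausible.

```python
from collections import Counter
import string

# Toy corpus standing in for real word-frequency data (an assumption for illustration).
CORPUS = "the spelling of the word is the thing that a spelling checker checks".split()
WORD_COUNTS = Counter(CORPUS)


def edits1(word):
    """Return all strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Pick the most frequent known word among the word itself and its one-edit neighbors."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word  # nothing better is known, so leave the input alone
    # Highest count wins; ties are broken alphabetically so the result is deterministic.
    return max(candidates, key=lambda w: (WORD_COUNTS[w], w))


print(correct("speling"))
print(correct("teh"))
```

The only "statistics" here are raw word counts, which is roughly the point of the paragraph above: simple frequency information plus a model of typos goes a long way, and none of it looks much like intelligence.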

Speech recognition has made tremendous strides (I worked on it at Bell Labs in the late 90s then at SpeechWorks in the early 00s), but its performance is still so far short of human levels as to make the difference more qualitative than quantitative, a point Mitchell makes in her essay. It would no more fool you into thinking it was human than an animatronic Disney character bolted to the floor. Unlike games like chess or go, it’s going to be hard to do better than people at language, but it would certainly be possible. But it would be hard to do that the same way they built, say, Deep Blue, the IBM chess-playing hardware that evaluated so many gazillions of board positions per turn with very clever heuristics to prune search. That didn’t play chess like a human. If the better language were like that, humans wouldn’t understand it. IBM Watson (natural language Jeopardy playing computer) was closer to behaving like humans with its chain of associative reasoning; to me, that’s the closest we’ve gotten to something I’d call “A.I.”. It’s a shame IBM’s oversold it since then.

Human-level general purpose A.I. is going to be an incredibly tough nut to crack. I don’t see any reason it’s an insurmountable goal. It’s not going to happen in a decade without a major breakthrough. Better classifiers just aren’t enough. People are very clever, insanely good at subtle chains of associative reasoning (though not so great at logic) and learning from limited examples (Andrew’s sister Susan Gelman, a professor at Michigan, studies concept learning by example). We’re also very contextually aware and focused, which allows us to go deep, but can cause us to miss the forest for the trees.

The post Melanie Mitchell says, “As someone who has worked in A.I. for decades, I’ve witnessed the failure of similar predictions of imminent human-level A.I., and I’m certain these latest forecasts will fall short as well.” appeared first on Statistical Modeling, Causal Inference, and Social Science.

Postdocs and Research fellows for combining probabilistic programming, simulators and interactive AI

Here’s a great opportunity for those interested in probabilistic programming and workflows for Bayesian data analysis:

We (including me, Aki) are looking for outstanding postdoctoral researchers and research fellows to work on an exciting new project at the crossroads of probabilistic programming, simulator-based inference and user interfaces. You will have an opportunity to work with top research groups in the Finnish Center for Artificial Intelligence, at both Aalto University and the University of Helsinki, and to cooperate with several industry partners.

The topics for which we are recruiting are

  • Machine learning for simulator-based inference
  • Intelligent user interfaces and techniques for interacting with AI
  • Interactive workflow support for probabilistic programming based modeling

Find the full descriptions here

Cornell prof (but not the pizzagate guy!) has one quick trick to getting 1700 peer reviewed publications on your CV

From the university webpage:

Robert J. Sternberg is Professor of Human Development in the College of Human Ecology at Cornell University. . . . Sternberg is the author of over 1700 refereed publications. . . .

How did he compile over 1700 refereed publications? Nick Brown tells the story:

I [Brown] was recently contacted by Brendan O’Connor, a graduate student at the University of Leicester, who had noticed that some of the text in Dr. Sternberg’s many articles and chapters appeared to be almost identical. . . .

Exhibit 1 . . . this 2010 article by Dr. Sternberg was basically a mashup of this article of his from the same year and this book chapter of his from 2002. One of the very few meaningful differences in the chunks that were recycled between the two 2010 articles is that the term “school psychology” is used in the mashup article to replace “cognitive education” from the other; this may perhaps not be unrelated to the fact that the former was published in School Psychology International (SPI) and the latter in the Journal of Cognitive Education and Psychology (JCEP). If you want to see just how much of the SPI article was recycled from the other two sources, have a look at this. Yellow highlighted text is copied verbatim from the 2002 chapter, green from the JCEP article. You can see that about 95% of the text is in one or the other colour . . .

Brown remarks:

Curiously, despite Dr. Sternberg’s considerable appetite for self-citation (there are 26 citations of his own chapters or articles, plus 1 of a chapter in a book that he edited, in the JCEP article; 25 plus 5 in the SPI article), neither of the 2010 articles cites the other, even as “in press” or “manuscript under review”; nor does either of them cite the 2002 book chapter. If previously published work is so good that you want to copy big chunks from it, why would you not also cite it?

Hmmmmm . . . I have an idea! Sternberg wants to increase his citation count. So he cites himself all the time. But he doesn’t want people to know that he publishes essentially the same paper over and over again. So in those cases, he doesn’t cite himself. Cute, huh?

Brown continues:

Exhibit 2

Inspired by Brendan’s discovery, I [Brown] decided to see if I could find any more examples. I downloaded Dr. Sternberg’s CV and selected a couple of articles at random, then spent a few minutes googling some sentences that looked like the kind of generic observations that an author in search of making “efficient” use of his time might want to re-use. On about the third attempt, after less than ten minutes of looking, I found a pair of articles, from 2003 and 2004, by Dr. Sternberg and Dr. Elena Grigorenko, with considerable overlaps in their text. About 60% of the text in the later article (which is about the general school student population) has been recycled from the earlier one (which is about gifted children) . . .

Neither of these articles cites the other, even as “in press” or “manuscript in preparation”.

And there’s more:

Exhibit 3

I [Brown] wondered whether some of the text that was shared between the above pair of articles might have been used in other publications as well. It didn’t take long(*) to find Dr. Sternberg’s contribution (chapter 6) to this 2012 book, in which the vast majority of the text (around 85%, I estimate) has been assembled almost entirely from previous publications: chapter 11 of this 1990 book by Dr. Sternberg (blue), this 1998 chapter by Dr. Janet Davidson and Dr. Sternberg (green), the above-mentioned 2003 article by Dr. Sternberg and Dr. Grigorenko (yellow), and chapter 10 of this 2010 book by Dr. Sternberg, Dr. Linda Jarvin, and Dr. Grigorenko (pink). . . .

Once again, despite the fact that this chapter cites 59 of Dr. Sternberg’s own publications and another 10 chapters by other people in books that he (co-)edited, none of those citations are to the four works that were the source of all the highlighted text in the above illustration.

Now, sometimes one finds book chapters that are based on previous work. In such cases, it is the usual practice to include a note to that effect. And indeed, two chapters (numbered 26 and 27) in that 2012 book edited by Dr. Dawn Flanagan and Dr. Patti Harrison, contain an acknowledgement along the lines of “This chapter is adapted from [reference]. Copyright 20xx by [publisher]. Adapted by permission”. But there is no such disclosure in chapter 6.

Exhibit 4

It appears that Dr. Sternberg has assembled a chapter almost entirely from previous work on more than one occasion. Here’s a recent example of a chapter made principally from his earlier publications. . . .

This chapter cites 50 of Dr. Sternberg’s own publications and another 7 chapters by others in books that he (co-)edited. . . .

However, none of the citations of that book indicate that any of the text taken from it is being correctly quoted, with quote marks (or appropriate indentation) and a page number. The four other books from which the highlighted text was taken were not cited. No disclosure that this chapter has been adapted from previously published material appears in the chapter, or anywhere else in the 2017 book . . .

In the context of a long and thoughtful discussion, James Heathers supplies the rules from the American Psychological Association code of ethics:

And here’s Cornell’s policy:

OK, that’s the policy for Cornell students. Apparently not the policy for faculty.

One more thing

Bobbie Spellman, former editor of the journal Perspectives on Psychological Science, is confident “beyond a reasonable doubt” that Sternberg was not telling the truth when he said that “all papers in Perspectives go out for peer review, including his own introductions and discussions.” Unless, as Spellman puts it, “you believe that ‘peer review’ means asking some folks to read it and then deciding whether or not to take their advice before you approve publication of it.”

So, there you have it. The man is obsessed with citing his own work—except on the occasions when he does a cut-and-paste job, in which case he is suddenly shy about mentioning his other publications. And, as editor, he reportedly says he sends out everything for peer review, but then doesn’t.

P.S. From his (very long) C.V.:

Sternberg, R. J. (2015). Epilogue: Why is ethical behavior challenging? A model of ethical reasoning. In R. J. Sternberg & S. T. Fiske (Eds.), Ethical challenges in the behavioral and brain sciences: Case studies and commentaries (pp. 218-226). New York: Cambridge University Press.

This guy should join up with Bruno Frey and Brad Bushman: the 3 of them would form a very productive Department of Cut and Paste. Department chair? Ed Wegman, of course.

“We are reluctant to engage in post hoc speculation about this unexpected result, but it does not clearly support our hypothesis”

Brendan Nyhan and Thomas Zeitzoff write:

The results do not provide clear support for the lack-of-control hypothesis. Self-reported feelings of low and high control are positively associated with conspiracy belief in observational data (model 1; p<.05 and p<.01, respectively). We are reluctant to engage in post hoc speculation about this unexpected result, but it does not clearly support our hypothesis. Moreover, our experimental treatment effect estimate for our low-control manipulation is null relative to both the high-control condition (the preregistered hypothesis test) as well as the baseline condition (a RQ) in both the combined (table 2) and individual item results (table B7). Finally, we find no evidence that the association with self-reported feelings of control in model 1 of table 2 or the effect of the control treatments in model 2 are moderated by anti-Western or anti-Jewish attitudes (results available on request). Our expectations are thus not supported.

It is good to see researchers openly express their uncertainty and be clear about the limitations of their data.

“Simulations are not scalable but theory is scalable”

Eren Metin Elçi writes:

I just watched this video on the value of theory in applied fields (like statistics); it really resonated with my previous research experiences in statistical physics and on the interplay between randomised perfect sampling algorithms and Markov Chain mixing as well as my current perspective on the status quo of deep learning. . . .

So essentially in this post I give more evidence for [the] statements “simulations are not scalable but theory is scalable” and “theory scales” from different disciplines. . . .

The theory of finite size scaling in statistical physics: I devoted quite a significant amount of my PhD and post-doc research to finite size scaling, where I applied and checked the theory of finite size scaling for critical phenomena. In a nutshell, the theory of finite size scaling allows us to study the behaviour and infer properties of physical systems in thermodynamic limits (close to phase transitions) through simulating (sequences of) finite model systems. This is required, since our current computational methods are far from being, and probably will never be, able to simulate real physical systems. . . .

Here comes a question I have been thinking about for a while . . . is there a (universal) theory that can quantify how deep learning models behave on larger problem instances, based on results from sequences of smaller problem instances? As an example, how do we have to adapt a, say, convolutional neural network architecture and its hyperparameters to sequences of larger (unexplored) problem instances (e.g. increasing the resolution of colour fundus images for the diagnosis of diabetic retinopathy, see “Convolutional Neural Networks for Diabetic Retinopathy” [4]) in order to guarantee a fixed precision over the whole sequence of problem instances without the need of ad-hoc and manual adjustments to the architecture and hyperparameters for each new problem instance? A very early approach to a finite size scaling analysis of neural networks (admittedly for a rather simple “architecture”) can be found here [5]. An analogue to this, which just crossed my mind, is the study of Markov chain mixing times . . .

It’s so wonderful to learn about these examples where my work is inspiring young researchers to look at problems in new ways!
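
To give a flavor of the finite-size-scaling idea in the email above, here is a minimal sketch in Python. It assumes the textbook form in which a pseudo-critical point measured at system size L approaches its infinite-size limit as a power law, T_c(L) = T_c(∞) + a L^(-1/ν), with the exponent ν treated as known. The numbers are synthetic, generated from that formula rather than taken from any simulation, and all the names (`fit_line`, `TRUE_TC`, and so on) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Assumed (made-up) values used only to generate synthetic "measurements".
TRUE_TC, TRUE_A, NU = 2.269, 1.5, 1.0

sizes = [8, 16, 32, 64, 128]
# Pseudo-critical point at each finite size, from the assumed scaling form.
tc_finite = [TRUE_TC + TRUE_A * L ** (-1.0 / NU) for L in sizes]

# With nu known, T_c(L) is linear in x = L^(-1/nu); the intercept at x = 0
# is the extrapolation to infinite system size.
xs = [L ** (-1.0 / NU) for L in sizes]
tc_infinite, amplitude = fit_line(xs, tc_finite)

print(f"extrapolated T_c: {tc_infinite:.4f}")
```

The theory supplies the functional form, so a handful of small systems is enough to say something about a system too large to simulate. In practice the exponent would also have to be estimated and the finite-size values would be noisy, which this sketch ignores.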

Facial feedback: “These findings suggest that minute differences in the experimental protocol might lead to theoretically meaningful changes in the outcomes.”

Fritz Strack points us to this article, “When Both the Original Study and Its Failed Replication Are Correct: Feeling Observed Eliminates the Facial-Feedback Effect,” by Tom Noah, Yaacov Schul, and Ruth Mayo, who write:

According to the facial-feedback hypothesis, the facial activity associated with particular emotional expressions can influence people’s affective experiences. Recently, a replication attempt of this effect in 17 laboratories around the world failed to find any support for the effect. We hypothesize that the reason for the failure of replication is that the replication protocol deviated from that of the original experiment in a critical factor. In all of the replication studies, participants were alerted that they would be monitored by a video camera, whereas the participants in the original study were not monitored, observed, or recorded. . . . we replicated the facial-feedback experiment in 2 conditions: one with a video-camera and one without it. The results revealed a significant facial-feedback effect in the absence of a camera, which was eliminated in the camera’s presence. These findings suggest that minute differences in the experimental protocol might lead to theoretically meaningful changes in the outcomes.

We’ve discussed the failed replications of facial feedback before, so it seemed worth following up with this new paper that provides an explanation for the failed replication that preserves the original effect.

Here are my thoughts.

1. The experiments in this new paper are preregistered. I haven’t looked at the preregistration plan, but even if not every step was followed exactly, preregistration does seem like a good step.

2. The main finding is that facial feedback worked in the no-camera condition but not in the camera condition:

3. As you can almost see in the graph, the difference between these results is not itself statistically significant—not at the conventional p=0.05 level for a two-sided test. The result has a p-value of 0.102, which the authors describe as “marginally significant in the expected direction . . . . p=.051, one-tailed . . .” Whatever. It is what it is.

4. The authors are playing a dangerous game when it comes to statistical power. From one direction, I’m concerned that the studies are way too noisy: it says that their sample size was chosen “based on an estimate of the effect size of Experiment 1 by Strack et al. (1988),” but for the usual reasons we can expect that to be a huge overestimate of effect size, hence the real study has nothing like 80% power. From the other direction, the authors use low power to explain away non-statistically-significant results (“Although the test . . . was greatly underpowered, the preregistered analysis concerning the interaction . . . was marginally significant . . .”).

5. I’m concerned that the study is too noisy, and I’d prefer a within-person experiment.

6. In their discussion section, the authors write:

Psychology is a cumulative science. As such, no single study can provide the ultimate, final word on any hypothesis or phenomenon. As researchers, we should strive to replicate and/or explicate, and any one study should be considered one step in a long path. In this spirit, let us discuss several possible ways to explain the role that the presence of a camera can have on the facial-feedback effect.

That’s all reasonable. I think the authors should also consider the hypothesis that what they’re seeing is more noise. Their theory could be correct, but another possibility is that they’re chasing another dead end. This sort of thing can happen when you stare really hard at noisy data.

7. The authors write, “These findings suggest that minute differences in the experimental protocol might lead to theoretically meaningful changes in the outcomes.” I have no idea, but if this is true, it would definitely be good to know.

8. The treatments are for people to hold a pen in their lips or their teeth in some specified ways. It’s not clear to me why any effects of these treatments (assuming the effects end up being reproducible) should be attributed to facial feedback rather than some other aspect of the treatment such as priming or implicit association. I’m not saying there isn’t facial feedback going on; I just have no idea. I agree with the authors that their results are consistent with the facial-feedback model.
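
The arithmetic behind points 3 and 4 above can be sketched in a few lines of Python. The function below uses a normal approximation for a two-sided test. The standardized effect sizes passed in (the true effect divided by its standard error) are hypothetical numbers chosen for illustration, not estimates from the paper, and the function names are made up.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sided(effect_in_se, alpha=0.05):
    """Power of a two-sided z-test when the true effect is effect_in_se standard errors."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability that the estimate lands beyond either critical value.
    upper = 1 - STD_NORMAL.cdf(z_crit - effect_in_se)
    lower = STD_NORMAL.cdf(-z_crit - effect_in_se)
    return upper + lower


def one_tailed_from_two_sided(p_two_sided):
    """One-tailed p-value when the estimate is in the hypothesized direction."""
    return p_two_sided / 2


# A design with 80% power assumes a true effect about 2.8 standard errors from zero.
print(round(power_two_sided(2.8), 2))
# If the true effect is half or a quarter of that (hypothetical values), power drops sharply.
print(round(power_two_sided(1.4), 2))
print(round(power_two_sided(0.7), 2))
# The reported two-sided p = 0.102 corresponds to the quoted one-tailed p = 0.051.
print(one_tailed_from_two_sided(0.102))
```

A sample size that delivers 80% power for a published, probably overestimated effect can deliver far less for the true one.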

P.S. Strack also points us to this further discussion by E. J. Wagenmakers and Quentin Gronau, which I largely find reasonable, but I disagree with their statement regarding “the urgent need to preregister one’s hypotheses carefully and comprehensively, and then religiously stick to the plan.” Preregistration is fine, and I agree with their statement that generating fake data is a good way to test it out (one can also preregister using alternative data sets, as here), but I hardly see it as “urgent.” It’s just one part of the picture.

“2010: What happened?” in light of 2018

Back in November 2010 I wrote an article that I still like, attempting to answer the question: “How could the voters have swung so much in two years? And, why didn’t Obama give Americans a better sense of his long-term economic plan in 2009, back when he still had a political mandate?”

My focus was on the economic slump at the time: how it happened, what were the Obama team’s strategies for boosting the economy, and in particular why the Democrats didn’t do more to prime the pump in 2009-2010, when they controlled the presidency and both houses of congress and had every motivation to get the economy moving again.

As I wrote elsewhere, I suspect that, back when Obama was elected in 2008 in the midst of an economic crisis, lots of people thought it was 1932 all over again, but it was really 1930:

Obama’s decisive victory echoed Roosevelt’s in 1932. But history doesn’t really repeat itself. . . With his latest plan of a spending freeze, Obama is being labeled by many liberals as the second coming of Herbert Hoover—another well-meaning technocrat who can’t put together a political coalition to do anything to stop the slide. Conservatives, too, may have switched from thinking of Obama as a scary realigning Roosevelt to viewing him as a Hoover from their own perspective—as a well-meaning fellow who took a stock market crash and made it worse through a series of ill-timed government interventions.

My take on all this in 2010 was that, when they came into office, the Obama team was expecting a recovery in any case (as in this notorious graph) and, if anything, were concerned about reheating the economy too quickly.

My perspective on this is a mix of liberal and conservative perspectives: liberal, or Keynesian, in that I’m accepting the idea that government spending can stimulate the economy and do useful things; conservative in that I’m accepting the idea that there’s some underlying business cycle or reality that governments will find it difficult to avoid. “I was astonished to see the recession in Baghdad, for I had an appointment with him tonight in Samarra.”

I have no deep understanding of macroeconomics, though, so you can think of my musings here as representing a political perspective on economic policy—a perspective that is relevant, given that I’m talking about the actions of politicians.

In any case, a big story of the 2010 election was a feeling that Obama and the Democrats were floundering on the economy, which added some force to the expected “party balancing” in which the out-party gains in congress in the off-year election.

That was then, this is now

Now on to 2018, where the big story is, and has been, the expected swing toward the Democrats (party balancing plus the unpopularity of the president), but where the second biggest story is that, yes, Trump and his party are unpopular, but not as unpopular as he was a couple months ago. And a big part of that story is the booming economy, and a big part of that story is the large and increasing budget deficit, which defies Keynesian and traditional conservative prescriptions (you’re supposed to run a surplus, not a deficit, in boom times).

From that perspective, I wonder if the Republicans’ current pro-cyclical fiscal policy, so different from traditional conservative recommendations, is consistent with a larger pattern in the last two years in which the Republican leadership feels that it’s living on borrowed time. The Democrats received more votes in the last presidential election and are expected to outpoll the Republicans in the upcoming congressional elections too, so the Republicans may well feel more pressure to get better economic performance now, both to keep themselves in power by keeping the balls in the air as long as possible, and because if they’re gonna lose power, they want to grab what they can when they can still do it.

In contrast, the Democratic leadership in 2008 expected to be in charge for a long time, so (a) they were in no hurry to implement policies that they could do at their leisure, and (b) they just didn’t want to screw things up and lose their permanent majority.

Different perspectives and expectations lead to different strategies.
