Cornell prof (but not the pizzagate guy!) has one quick trick to getting 1700 peer reviewed publications on your CV

From the university webpage:

Robert J. Sternberg is Professor of Human Development in the College of Human Ecology at Cornell University. . . . Sternberg is the author of over 1700 refereed publications. . . .

How did he compile over 1700 refereed publications? Nick Brown tells the story:

I [Brown] was recently contacted by Brendan O’Connor, a graduate student at the University of Leicester, who had noticed that some of the text in Dr. Sternberg’s many articles and chapters appeared to be almost identical. . . .

Exhibit 1 . . . this 2010 article by Dr. Sternberg was basically a mashup of this article of his from the same year and this book chapter of his from 2002. One of the very few meaningful differences in the chunks that were recycled between the two 2010 articles is that the term “school psychology” is used in the mashup article to replace “cognitive education” from the other; this may perhaps not be unrelated to the fact that the former was published in School Psychology International (SPI) and the latter in the Journal of Cognitive Education and Psychology (JCEP). If you want to see just how much of the SPI article was recycled from the other two sources, have a look at this. Yellow highlighted text is copied verbatim from the 2002 chapter, green from the JCEP article. You can see that about 95% of the text is in one or the other colour . . .

Brown remarks:

Curiously, despite Dr. Sternberg’s considerable appetite for self-citation (there are 26 citations of his own chapters or articles, plus 1 of a chapter in a book that he edited, in the JCEP article; 25 plus 5 in the SPI article), neither of the 2010 articles cites the other, even as “in press” or “manuscript under review”; nor does either of them cite the 2002 book chapter. If previously published work is so good that you want to copy big chunks from it, why would you not also cite it?

Hmmmmm . . . I have an idea! Sternberg wants to increase his citation count. So he cites himself all the time. But he doesn’t want people to know that he publishes essentially the same paper over and over again. So in those cases, he doesn’t cite himself. Cute, huh?

Brown continues:

Exhibit 2

Inspired by Brendan’s discovery, I [Brown] decided to see if I could find any more examples. I downloaded Dr. Sternberg’s CV and selected a couple of articles at random, then spent a few minutes googling some sentences that looked like the kind of generic observations that an author in search of making “efficient” use of his time might want to re-use. On about the third attempt, after less than ten minutes of looking, I found a pair of articles, from 2003 and 2004, by Dr. Sternberg and Dr. Elena Grigorenko, with considerable overlaps in their text. About 60% of the text in the later article (which is about the general school student population) has been recycled from the earlier one (which is about gifted children) . . .

Neither of these articles cites the other, even as “in press” or “manuscript in preparation”.

And there’s more:

Exhibit 3

I [Brown] wondered whether some of the text that was shared between the above pair of articles might have been used in other publications as well. It didn’t take long(*) to find Dr. Sternberg’s contribution (chapter 6) to this 2012 book, in which the vast majority of the text (around 85%, I estimate) has been assembled almost entirely from previous publications: chapter 11 of this 1990 book by Dr. Sternberg (blue), this 1998 chapter by Dr. Janet Davidson and Dr. Sternberg (green), the above-mentioned 2003 article by Dr. Sternberg and Dr. Grigorenko (yellow), and chapter 10 of this 2010 book by Dr. Sternberg, Dr. Linda Jarvin, and Dr. Grigorenko (pink). . . .

Once again, despite the fact that this chapter cites 59 of Dr. Sternberg’s own publications and another 10 chapters by other people in books that he (co-)edited, none of those citations are to the four works that were the source of all the highlighted text in the above illustration.

Now, sometimes one finds book chapters that are based on previous work. In such cases, it is the usual practice to include a note to that effect. And indeed, two chapters (numbered 26 and 27) in that 2012 book edited by Dr. Dawn Flanagan and Dr. Patti Harrison contain an acknowledgement along the lines of “This chapter is adapted from [source]. Copyright 20xx by [copyright holder]. Adapted by permission”. But there is no such disclosure in chapter 6.

Exhibit 4

It appears that Dr. Sternberg has assembled a chapter almost entirely from previous work on more than one occasion. Here’s a recent example of a chapter made principally from his earlier publications. . . .

This chapter cites 50 of Dr. Sternberg’s own publications and another 7 chapters by others in books that he (co-)edited. . . .

However, none of the citations of that book indicate that any of the text taken from it is being correctly quoted, with quote marks (or appropriate indentation) and a page number. The four other books from which the highlighted text was taken were not cited. No disclosure that this chapter has been adapted from previously published material appears in the chapter, or anywhere else in the 2017 book . . .

In the context of a long and thoughtful discussion, James Heathers supplies the rules from the American Psychological Association code of ethics:

And here’s Cornell’s policy:

OK, that’s the policy for Cornell students. Apparently not the policy for faculty.

One more thing

Bobbie Spellman, former editor of the journal Perspectives on Psychological Science, is confident “beyond a reasonable doubt” that Sternberg was not telling the truth when he said that “all papers in Perspectives go out for peer review, including his own introductions and discussions.” Unless, as Spellman puts it, “you believe that ‘peer review’ means asking some folks to read it and then deciding whether or not to take their advice before you approve publication of it.”

So, there you have it. The man is obsessed with citing his own work—except on the occasions when he does a cut-and-paste job, in which case he is suddenly shy about mentioning his other publications. And, as editor, he reportedly says he sends out everything for peer review, but then doesn’t.

P.S. From his (very long) C.V.:

Sternberg, R. J. (2015). Epilogue: Why is ethical behavior challenging? A model of ethical reasoning. In R. J. Sternberg & S. T. Fiske (Eds.), Ethical challenges in the behavioral and brain sciences: Case studies and commentaries (pp. 218-226). New York: Cambridge University Press.

This guy should join up with Bruno Frey and Brad Bushman: the 3 of them would form a very productive Department of Cut and Paste. Department chair? Ed Wegman, of course.


“We are reluctant to engage in post hoc speculation about this unexpected result, but it does not clearly support our hypothesis”

Brendan Nyhan and Thomas Zeitzoff write:

The results do not provide clear support for the lack-of-control hypothesis. Self-reported feelings of low and high control are positively associated with conspiracy belief in observational data (model 1; p<.05 and p<.01, respectively). We are reluctant to engage in post hoc speculation about this unexpected result, but it does not clearly support our hypothesis. Moreover, our experimental treatment effect estimate for our low-control manipulation is null relative to both the high-control condition (the preregistered hypothesis test) as well as the baseline condition (a RQ) in both the combined (table 2) and individual item results (table B7). Finally, we find no evidence that the association with self-reported feelings of control in model 1 of table 2 or the effect of the control treatments in model 2 are moderated by anti-Western or anti-Jewish attitudes (results available on request). Our expectations are thus not supported.

It is good to see researchers openly express their uncertainty and be clear about the limitations of their data.


“Simulations are not scalable but theory is scalable”

Eren Metin Elçi writes:

I just watched this video on the value of theory in applied fields (like statistics); it really resonated with my previous research experiences in statistical physics and on the interplay between randomised perfect sampling algorithms and Markov Chain mixing, as well as my current perspective on the status quo of deep learning. . . .

So essentially in this post I give more evidence for [the] statements “simulations are not scalable but theory is scalable” and “theory scales” from different disciplines. . . .

The theory of finite size scaling in statistical physics: I devoted quite a significant amount of my PhD and post-doc research to finite size scaling, where I applied and checked the theory of finite size scaling for critical phenomena. In a nutshell, the theory of finite size scaling allows us to study the behaviour and infer properties of physical systems in the thermodynamic limit (close to phase transitions) through simulating sequences of finite model systems. This is required, since our current computational methods are far from being, and probably will never be, able to simulate real physical systems. . . .

Here comes a question I have been thinking about for a while . . . is there a (universal) theory that can quantify how deep learning models behave on larger problem instances, based on results from sequences of smaller problem instances? As an example, how do we have to adapt a, say, convolutional neural network architecture and its hyperparameters to sequences of larger (unexplored) problem instances (e.g. increasing the resolution of colour fundus images for the diagnosis of diabetic retinopathy, see “Convolutional Neural Networks for Diabetic Retinopathy” [4]) in order to guarantee a fixed precision over the whole sequence of problem instances, without the need for ad-hoc and manual adjustments to the architecture and hyperparameters for each new problem instance? A very early approach of a finite size scaling analysis of neural networks (admittedly for a rather simple “architecture”) can be found here [5]. An analogue to this, which just crossed my mind, is the study of Markov chain mixing times . . .
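To make the finite-size-scaling idea concrete, here is a toy extrapolation in Python. The observable, exponents, and numbers are all invented, and the snippet only sketches the fitting step (estimating an infinite-size limit from a sequence of finite sizes); it is not taken from Elçi’s work:

# Toy finite-size-scaling extrapolation with made-up data: fit the standard
# leading-order form m(L) = m_inf + a * L**(-b) to "measurements" at a
# sequence of finite system sizes L, then read off the L -> infinity value.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)
L = np.array([8, 16, 32, 64, 128, 256], dtype=float)
m_obs = 0.7 + 1.5 * L**-1.0 + rng.normal(0, 0.002, L.size)  # fake simulation output

def fss(L, m_inf, a, b):
    return m_inf + a * L**-b

params, _ = curve_fit(fss, L, m_obs, p0=[0.5, 1.0, 1.0])
print(f"Extrapolated L -> infinity value: {params[0]:.3f}")  # lands near the true 0.7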

It’s so wonderful to learn about these examples where my work is inspiring young researchers to look at problems in new ways!


Facial feedback: “These findings suggest that minute differences in the experimental protocol might lead to theoretically meaningful changes in the outcomes.”

Fritz Strack points us to this article, “When Both the Original Study and Its Failed Replication Are Correct: Feeling Observed Eliminates the Facial-Feedback Effect,” by Tom Noah, Yaacov Schul, and Ruth Mayo, who write:

According to the facial-feedback hypothesis, the facial activity associated with particular emotional expressions can influence people’s affective experiences. Recently, a replication attempt of this effect in 17 laboratories around the world failed to find any support for the effect. We hypothesize that the reason for the failure of replication is that the replication protocol deviated from that of the original experiment in a critical factor. In all of the replication studies, participants were alerted that they would be monitored by a video camera, whereas the participants in the original study were not monitored, observed, or recorded. . . . we replicated the facial-feedback experiment in 2 conditions: one with a video-camera and one without it. The results revealed a significant facial-feedback effect in the absence of a camera, which was eliminated in the camera’s presence. These findings suggest that minute differences in the experimental protocol might lead to theoretically meaningful changes in the outcomes.

We’ve discussed the failed replications of facial feedback before, so it seemed worth following up with this new paper that provides an explanation for the failed replication that preserves the original effect.

Here are my thoughts.

1. The experiments in this new paper are preregistered. I haven’t looked at the preregistration plan, but even if not every step was followed exactly, preregistration does seem like a good step.

2. The main finding is that the facial feedback worked in the no-camera condition but not in the camera condition:

3. As you can almost see in the graph, the difference between these results is not itself statistically significant—not at the conventional p=0.05 level for a two-sided test. The result has a p-value of 0.102, which the authors describe as “marginally significant in the expected direction . . . . p=.051, one-tailed . . .” (the one-tailed value being just half the two-tailed one). Whatever. It is what it is.

4. The authors are playing a dangerous game when it comes to statistical power. From one direction, I’m concerned that the studies are way too noisy: it says that their sample size was chosen “based on an estimate of the effect size of Experiment 1 by Strack et al. (1988),” but for the usual reasons we can expect that to be a huge overestimate of effect size, hence the real study has nothing like 80% power. From the other direction, the authors use low power to explain away non-statistically-significant results (“Although the test . . . was greatly underpowered, the preregistered analysis concerning the interaction . . . was marginally significant . . .”).
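To illustrate that first concern with entirely made-up numbers (these are not the authors’ design parameters): if a two-group study is sized for 80% power at a published effect of d = 0.5, but the true effect is only half that, the realized power is closer to 30%. A quick simulation sketch:

# Illustrative simulation, not the actual study: design powered at 80% for
# d = 0.5, true effect assumed to be d = 0.25, two-sided alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_per_group = 64      # roughly 80% power for d = 0.5
true_d = 0.25         # hypothetical "real" effect, half the published estimate
n_sims = 5_000

rejections = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(true_d, 1.0, n_per_group)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        rejections += 1

print(f"Realized power at d = {true_d}: {rejections / n_sims:.2f}")  # about 0.29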

5. I’m concerned that the study is too noisy, and I’d prefer a within-person experiment.

6. In their discussion section, the authors write:

Psychology is a cumulative science. As such, no single study can provide the ultimate, final word on any hypothesis or phenomenon. As researchers, we should strive to replicate and/or explicate, and any one study should be considered one step in a long path. In this spirit, let us discuss several possible ways to explain the role that the presence of a camera can have on the facial-feedback effect.

That’s all reasonable. I think the authors should also consider the hypothesis that what they’re seeing is more noise. Their theory could be correct, but another possibility is that they’re chasing another dead end. This sort of thing can happen when you stare really hard at noisy data.

7. The authors write, “These findings suggest that minute differences in the experimental protocol might lead to theoretically meaningful changes in the outcomes.” I have no idea, but if this is true, it would definitely be good to know.

8. The treatments are for people to hold a pen in their lips or their teeth in some specified ways. It’s not clear to me why any effects of these treatments (assuming the effects end up being reproducible) should be attributed to facial feedback rather than some other aspect of the treatment such as priming or implicit association. I’m not saying there isn’t facial feedback going on; I just have no idea. I agree with the authors that their results are consistent with the facial-feedback model.

P.S. Strack also points us to this further discussion by E. J. Wagenmakers and Quentin Gronau, which I largely find reasonable, but I disagree with their statement regarding “the urgent need to preregister one’s hypotheses carefully and comprehensively, and then religiously stick to the plan.” Preregistration is fine, and I agree with their statement that generating fake data is a good way to test it out (one can also preregister using alternative data sets, as here), but I hardly see it as “urgent.” It’s just one part of the picture.


“2010: What happened?” in light of 2018

Back in November 2010 I wrote an article that I still like, attempting to answer the question: “How could the voters have swung so much in two years? And, why didn’t Obama give Americans a better sense of his long-term economic plan in 2009, back when he still had a political mandate?”

My focus was on the economic slump at the time: how it happened, what were the Obama team’s strategies for boosting the economy, and in particular why the Democrats didn’t do more to prime the pump in 2009-2010, when they controlled the presidency and both houses of Congress and had every motivation to get the economy moving again.

As I wrote elsewhere, I suspect that, back when Obama was elected in 2008 in the midst of an economic crisis, lots of people thought it was 1932 all over again, but it was really 1930:

Obama’s decisive victory echoed Roosevelt’s in 1932. But history doesn’t really repeat itself. . . With his latest plan of a spending freeze, Obama is being labeled by many liberals as the second coming of Herbert Hoover—another well-meaning technocrat who can’t put together a political coalition to do anything to stop the slide. Conservatives, too, may have switched from thinking of Obama as a scary realigning Roosevelt to viewing him as a Hoover from their own perspective—as a well-meaning fellow who took a stock market crash and made it worse through a series of ill-timed government interventions.

My take on all this in 2010 was that, when they came into office, the Obama team was expecting a recovery in any case (as in this notorious graph) and, if anything, were concerned about reheating the economy too quickly.

My perspective on this is a mix of liberal and conservative perspectives: liberal, or Keynesian, in that I’m accepting the idea that government spending can stimulate the economy and do useful things; conservative in that I’m accepting the idea that there’s some underlying business cycle or reality that governments will find it difficult to avoid. “I was astonished to see the recession in Baghdad, for I had an appointment with him tonight in Samarra.”

I have no deep understanding of macroeconomics, though, so you can think of my musings here as representing a political perspective on economic policy—a perspective that is relevant, given that I’m talking about the actions of politicians.

In any case, a big story of the 2010 election was a feeling that Obama and the Democrats were floundering on the economy, which added some force to the expected “party balancing” in which the out-party gains in congress in the off-year election.

That was then, this is now

Now on to 2018, where the big story is, and has been, the expected swing toward the Democrats (party balancing plus the unpopularity of the president), but where the second biggest story is that, yes, Trump and his party are unpopular, but not as unpopular as he was a couple months ago. And a big part of that story is the booming economy, and a big part of that story is the large and increasing budget deficit, which defies Keynesian and traditional conservative prescriptions (you’re supposed to run a surplus, not a deficit, in boom times).

From that perspective, I wonder if the Republicans’ current pro-cyclical fiscal policy, so different from traditional conservative recommendations, is consistent with a larger pattern in the last two years in which the Republican leadership feels that it’s living on borrowed time. The Democrats received more votes in the last presidential election and are expected to outpoll the Republicans in the upcoming congressional elections too, so the Republicans may well feel more pressure to get better economic performance now, both to keep themselves in power by keeping the balls in the air as long as possible, and because if they’re gonna lose power, they want to grab what they can when they can still do it.

In contrast, the Democratic leadership in 2008 expected to be in charge for a long time, so (a) they were in no hurry to implement policies that they could do at their leisure, and (b) they just didn’t want to screw things up and lose their permanent majority.

Different perspectives and expectations lead to different strategies.


MRP (or RPP) with non-census variables

It seems to be Mister P week here on the blog . . .

A question came in: someone was doing MRP on a political survey and wanted to adjust for political ideology, a variable for which they can’t get poststratification data.

Here’s what I recommended:

If a survey selects on a non-census variable such as political ideology, or if you simply wish to adjust for it because of potential nonresponse bias, my recommendation is to do MRP on all these variables.

It goes like this: suppose y is your outcome of interest, X are the census variables, and z is the additional variable, in this example ideology. The idea is to do MRP by fitting a multilevel regression model on y given (X, z) and then poststratifying based on the distribution of (X, z) in the population. The challenge is that you don’t have (X, z) in the population; you only have X. So to create the poststratification distribution of (X, z), you do the following: first, take the poststratification distribution of X (known from the census); second, estimate the population distribution of z given X (most simply by fitting a multilevel regression of z given X from your survey data, but you can also use auxiliary information if available).
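Here is a minimal sketch of that two-model recipe in Python, with hypothetical variables (an age group as the census variable X, ideology as z) and plain logistic regressions standing in for the multilevel models. It is meant to show the bookkeeping of expanding the poststratification table, not a full MRP fit:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy survey: census variable "age" (4 groups), non-census variable "ideology" (3 levels)
n = 2000
survey = pd.DataFrame({"age": rng.integers(0, 4, n)})
survey["ideology"] = np.clip(survey["age"] // 2 + rng.integers(0, 2, n), 0, 2)
logit = -1.0 + 0.5 * survey["age"] + 0.8 * survey["ideology"]
survey["y"] = rng.random(n) < 1 / (1 + np.exp(-logit))

# Census poststratification table: counts for the census variable only
census = pd.DataFrame({"age": [0, 1, 2, 3], "N": [30_000, 28_000, 25_000, 17_000]})

# Step 1: model the outcome given (X, z) from the survey
m_y = LogisticRegression().fit(survey[["age", "ideology"]], survey["y"])

# Step 2: model z given X from the survey
m_z = LogisticRegression().fit(survey[["age"]], survey["ideology"])

# Step 3: expand the census table to (X, z) cells using the estimated P(z | X)
p_z = m_z.predict_proba(census[["age"]])            # one row per age group, one column per ideology level
cells = census.loc[census.index.repeat(p_z.shape[1])].copy()
cells["ideology"] = np.tile(m_z.classes_, len(census))
cells["N"] = cells["N"].to_numpy() * p_z.ravel()

# Step 4: poststratify the outcome predictions over the expanded (X, z) table
cells["p_y"] = m_y.predict_proba(cells[["age", "ideology"]])[:, 1]
print(f"Poststratified estimate of Pr(y = 1): {np.average(cells['p_y'], weights=cells['N']):.3f}")

In a real analysis you would replace both logistic regressions with multilevel models (fit in Stan, say) and carry the posterior uncertainty through the poststratification step rather than plugging in point predictions.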

Yu-Sung and I did this a few years ago in our analysis of public opinion for school vouchers, where one of our key poststratification variables was religion, which we really needed to include for our analysis but which is not on the census. To poststratify, we first modeled religion given demographics—we had several religious categories, and I think we fit a series of logistic regressions. We used these estimated conditional distributions to fill out the poststrat table and then went from there. We never wrote this up as a general method, though.


Debate about genetics and school performance

Jag Bhalla points us to this article, “Differences in exam performance between pupils attending selective and non-selective schools mirror the genetic differences between them,” by Emily Smith-Woolley, Jean-Baptiste Pingault, Saskia Selzam, Kaili Rimfeld, Eva Krapohl, Sophie von Stumm, Kathryn Asbury, Philip Dale, Toby Young, Rebecca Allen, Yulia Kovas, and Robert Plomin, along with this response by Eric Turkheimer.

Smith-Woolley et al. find an association of test scores with genetic variables that are also associated with socioeconomic status, and conclude that “genetic and exam differences between school types are primarily due to the heritable characteristics involved in pupil admission.” From the other direction, Turkheimer says, “if the authors think their data support the hypothesis that socioeconomic educational differences are simply the result of pre-existing genetic differences among the students assigned to different schools, that is their right. But . . . the data they report here do nothing to actually make the case in one direction or the other.”

It’s hard for me to evaluate this debate given my lack of background in genetics (Bhalla shares some thoughts here, but I can’t really evaluate these either), but I thought I’d share it with you.


Can we do better than using averaged measurements?

Angus Reynolds writes:

Recently a PhD student at my University came to me for some feedback on a paper he is writing about the state of research methods in the Fear Extinction field. Basically you give someone an electric shock repeatedly while they stare at neutral stimuli and then you see what happens when you start showing them the stimuli and don’t shock them anymore. Power will always be a concern here because of the ethical problems.

Most of his paper is commenting on the complete lack of constancy between and within labs in how they analyse data: plenty of garden-of-forking-paths issues, plus concerns about Type 1, Type 2, and Type S and M errors.

One thing I’ve been pushing him to talk about more is improved measurement.

Currently fear is measured in part by taking skin conductance measurements continuously and then summarising an 8 second or so window between trials into averages, which are then split into blocks and ANOVA’d.

I’ve commented that they must be losing information if they are summarising a continuous (and potentially noisy) measurement over time to 1 value. It seems to me that the variability within that 8 second window would be very important as well. So why not just model the continuous data?

Given that the field could be at least two steps away from where it needs to be (immature data, immature methods), I’ve suggested that he just start by making graphs of the complete data that he would like to be able to model one day and not to really bother with p-value style analyses.

In terms of developing the skills necessary to move forward: would you even bother trying to create models of the fear extinction process using the simplified, averaged data that most researchers use or would it be better to get people accustomed to seeing the continuous data first and then developing more complex models for that later?

My reply:

I actually don’t think it’s so horrible to average the data in this way. Yes, it should be better to model the data directly, and, yes, there has to be some information being lost by the averaging, but empirical variation is itself very variable, so it’s not like you can expect to see lots of additional information by comparing groups based on their observed variances.

I agree 100% with your suggestion of graphing the complete data. Regarding measurement, I think the key is for it to be connected to theory where possible. Also from the above description it sounds like the research is using within-person comparisons, which I generally recommend.
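As a toy illustration of what “graph the complete data” might look like here, the sketch below simulates a handful of 8-second skin-conductance traces and overlays the per-window averages that are usually the only thing carried into the ANOVA. All numbers are invented:

# Hypothetical per-trial traces vs. their window averages; purely illustrative.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
t = np.linspace(0, 8, 200)                    # 8-second post-stimulus window
fig, ax = plt.subplots(figsize=(7, 4))

for trial in range(12):
    peak = rng.uniform(1, 3)                  # response latency, seconds
    amp = rng.lognormal(mean=0.0, sigma=0.5)  # response amplitude
    trace = amp * np.exp(-(t - peak) ** 2 / 2.0) + rng.normal(0, 0.05, t.size)
    ax.plot(t, trace, color="steelblue", alpha=0.4, lw=1)           # the complete data
    ax.hlines(trace.mean(), 0, 8, color="tomato", alpha=0.4, lw=1)  # the usual one-number summary

ax.set_xlabel("time since stimulus (s)")
ax.set_ylabel("skin conductance (arbitrary units)")
ax.set_title("Per-trial traces (blue) vs. their window averages (red)")
plt.tight_layout()
plt.show()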
