Watkins, R., Leigh, D., & Gelman, S. A. (2019, November 26). Parsing Science – Extraordinary Claims, Ordinary Evidence. figshare. https://doi.org/10.6084/m9.figshare.11295614
Ryan Watkins: This is Parsing Science: the unpublished stories behind the world’s most compelling science as told by the researchers themselves. I’m Ryan Watkins.
Doug Leigh: And I’m Doug Leigh. Today, in episode 63 of Parsing Science, we’re joined by Susan Gelman from the University of Michigan’s Department of Psychology. She’ll talk with us about her research into how the use of bold and broad language in scientific papers can make lay readers more likely to believe that those studies’ findings are more important and generalizable than ones which make nuanced claims. Here’s Susan Gelman.
Gelman: Hi, I’m Susan Gelman. I’m originally from the suburbs of Philadelphia. I went to college at Oberlin, Ohio where I studied psychology and classical Greek. Then I went to Stanford for my PhD. My PhD was in psychology and I also studied linguistics. And I was very fortunate that my first job after getting my degree was at the University of Michigan, which was my dream job. And I’ve been here ever since. But, I remember, you know, walking across campus, and one of my colleagues who was much older than me said, “Oh, you know, it’s so funny when I first came here I thought I’ll be here for a couple of years and then move back to California. And I’ve been here for 40 years.” And I remembered thinking “Well, that’s not going to be me.” Yet, here I am.
Leigh: One of the first academic compendiums on the use of generic language was published in 1995. Called The Generic Book, it asserts that there are two distinct phenomena we refer to as being generic. The first is in reference to kind, such as of a genus of plant or animal. For example, the statement “Potatoes were first introduced into Ireland in the 17th century” doesn’t refer to any particular type of potato, but rather to the general class of potatoes. The second way we consider something generic is through statements that report regularities that summarize groups of particular episodes or facts. For example, “John smokes a cigar after dinner” doesn’t necessarily imply that John always smokes a cigar after dinner, but rather reports a habit that generalizes over typical events. After discussing The Generic Book a bit, we asked Susan why we use generic language in everyday life, as well as how doing so can be either helpful to communication and understanding, or counterproductive to it.
Generic language in everyday life
Gelman: We use generics all the time. You know, you say “dogs are four-legged,” “the lion is a ferocious beast.” You know, “an oak tree has acorns.” We’re using them constantly. So, parents are using them with children, children are using them when they talk to other people, we use them in conversation with one another. And you could have the sense that, “Well, that’s just ordinary, you know? We make generalizations all the time, so that just seems sort of simple. Nothing much to it.” But they’re complicated. And there’s not any one thing that they mean.
So, there are certain examples that linguists like to return to. Like, we say “birds lay eggs,” but actually only mature female birds lay eggs. So that means at least 50% of birds are not laying eggs. We can say “birds lay eggs,” but we can’t say “birds are female.” That doesn’t sound like a good generalization. So why is it that we can say “birds lay eggs” but not “birds are female”? Or we can say “sharks attack swimmers.” But if you think about it very few sharks do that. [There] must be a tiny proportion of the sharks that exist that attack swimmers, but somehow it seems reasonable to make a generalization like that. And yet other rare properties don’t make good generalizations. Like, you wouldn’t say “people earn PhDs.” It just doesn’t seem like a very good generalization, even though it’s probably more likely that a person earns a PhD than that a shark attacks a swimmer. So, there’s something about which properties are sort of worthy of a generic that people have speculated about, but no one’s really quite solved that puzzle.
Why we use generic language
Watkins: Susan’s interest is in cognitive development, particularly with how we come to understand categories through the use of language. So, Doug and I were interested in learning how and why we sometimes use generic rather than specific language, as well as how this applies to the depiction of scientific findings.
Gelman: My own research is focused on how kids form categories, and I’ve been especially interested in how kids sort of over-do categories. You know, treat categories too seriously; think that category boundaries are stricter than they are. There are ways in which kids think of categories as having this “fixed reality” that goes beyond the messy complexity of the world. You know, kids think about gender, for example, in these very strict ways. People call it “essentialism.” Like, kids think that boys are a certain way, and girls are a certain way, and boys and girls have a different essence. And from an acquisition standpoint – like how kids learn generics – that’s puzzling too. Because we usually think that it’s going to be easiest for kids to learn a word if you can show them directly what the word refers to. So, if I want to teach a baby the word “rattle” I might pick up a rattle, and show it to them, and point to it and say, “This is a rattle.” But you can’t pick up and point to categories, because they’re not there in the world. They’re abstractions.
You know, the category of “dogs” is not something I can just point to. I can point to individual dogs, but I can’t ever show you the category of dogs. And yet somehow kids pick this up. They figure out the meanings of generics. So, it seems to be something that humans are very good at reasoning about. And every language that’s been studied has a way of expressing these generic concepts. But every language does it differently, and so kids have to figure that out, too. So, I became interested in this idea that as scientists we can’t completely escape the way our own minds work. This sort of essentialist bias that we see in children is also one that people have documented in adults. And it was one that I saw in myself. And it was one that I thought may be in other aspects of how adults reason, including scientists.
Impetus for the study
Leigh: In their article, Susan and her team discussed the results of several studies they carried out into the use of generic language in published research papers. The first of these analyzed authors’ use of generics in the published titles, research highlights, and abstracts in 1,149 psychological articles published across 11 journals in 2015 and 2016. We were curious what led her to explore this topic, as well as what she predicted they might find.
Gelman: Our initial idea was, “Okay we’ll sample different publications in psychology, and then we’ll see how people evaluate write-ups that do or don’t contain these kinds of generalizations using generic language.” The first study was this text analysis of articles published in psychology. And I will say we chose psychology because, first of all, it’s our field so we’re interested in it. But also, you know, psychology is a field that depends very heavily on sampling. So, it seemed like the right place to start. Because this is where you see that sort of tension between, on the one hand, wanting to make general claims that will be important to the world, but on the other hand having to always choose samples that are not your whole population.
And we were also very interested in units of different sizes, and how that might affect how people write. And this was based partly on intuition, I guess I would say. So, we thought that if you had all the space in the world to talk about your findings you could talk about limitations of your data. You could talk about the complexities of your data. You could talk about exceptions. But if you’re forced to kind of boil everything down to just a few words, you may need to gloss over variation. But then there has been this move in recent years to have not only an abstract – which, you know, abstracts are short; they’re usually anywhere from, like, 120 to 250 words – so you take, you know, all the work you did in this paper and you condense it down to that. But then also there’s been this move to have “research highlights,” and this is where you boil down your abstract to even less. Much shorter; it would be three to five bullet points, and then each bullet point is very condensed. A typical kind of constraint would be, like, no more than 85 characters including spaces. You know, I’ve written these things and you have to just, like, throw out words wherever possible just to get an idea down to that short bullet point. And then titles, of course, are the ultimate condensation of a scientific project into just one line. So, we thought that maybe there would be more generics the shorter the element would be.
Bare, Framed, and Hedged generics
Watkins: Susan and her colleagues define “generics” as being general, timeless claims regarding categories or abstract or idealized concepts. They classified the language used in published research studies as being non-generic, or as using one or more types of generic language, which they coded as a “bare,” “framed,” or “hedged” generic claim. We asked Susan to provide examples of each of these, beginning with bare generic sentences.
Gelman: Okay, here’s one: “Adolescent earthquake survivors show increased prefrontal cortex activation to masked earthquake images as adults.” So, it’s just this unqualified claim about adolescent earthquake survivors. Like, they’re a category and so kind of as a group this is what they do. But the framed and the hedged were both ones that were linking the generic claim to the study that was done. So, for example, “The present study found that adolescent earthquake survivors,” etc. Or “We show that infants make inferences about” blah, blah, blah. So, it’s still making that claim, but it’s saying that it follows from the study that was done.
But the weakest were the hedged generics, where instead of saying, you know, “The present study found,” or “The present study shows,” it would be “These results suggest that leaders emerge because,” blah, blah, blah. Or, “Sleep appears to selectively affect the brain,” et cetera. So, if they said, like, “it suggests,” “it seems to,” “it may” … those were ones that we took as hedged. They’re still talking about the category. They’re still putting it in present tense, as if it’s a timeless phenomenon. But it’s putting a little bit of uncertainty around that claim.
You know, that’s great, you’re being careful. But I don’t think that’s being careful enough, because what follows that hedge is still a generic claim that tells you nothing about the variability in the sample. That doesn’t situate it within a time and place. It’s saying, “I’m gonna hedge a little bit.” But the bottom line is this broad claim about the universe that, you know … you have no reason to sort of question whether this is the right category to generalize over; you’re not talking about the variation. It has that same problem, if you will, that even the bare generic has. It’s maybe just, like, you know, it’s a little more careful.
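Editor's illustration: the three labels Susan describes can be sketched as a toy cue-phrase lookup. This is only a sketch, on the assumption that a sentence has already been judged to be a generic claim. The function name `label_generic` and the two cue lists are hypothetical, built solely from the example phrases quoted above, and are not the coding procedure the research team actually used.

```python
import re

# Hypothetical cue lists, drawn only from the phrases quoted in the
# conversation ("suggest", "appears to", "seems to", "may"; "The present
# study found/shows", "We show"). They are not the study's codebook.
HEDGING_CUES = re.compile(r"\b(suggests?|appears? to|seems? to|may)\b")
FRAMING_CUES = re.compile(r"\b(the present study (found|shows?)|we show)\b")


def label_generic(sentence: str) -> str:
    """Label an already-generic sentence as 'hedged', 'framed', or 'bare'."""
    text = sentence.lower()
    # Hedging is checked first: a hedge marks the claim as hedged even when
    # the sentence also ties the claim to the study or its results.
    if HEDGING_CUES.search(text):
        return "hedged"
    # Framing links the generic claim to the study that was done.
    if FRAMING_CUES.search(text):
        return "framed"
    # No cue at all: an unqualified claim about the category.
    return "bare"
```

Calling `label_generic` on the earthquake-survivor title returns "bare", while prefixing it with "The present study found that" returns "framed". A lookup like this cannot decide whether a sentence is generic in the first place, which is the harder judgment.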
The frequency of generics in scientific publications
Leigh: So just how common was the use of generic language in published scientific research? Make your guess now, and you’ll find out the answer after this short break.
ad: SciencePods
Leigh: Here again is Susan Gelman.
Gelman: So, we looked at eleven journals. We had five subtypes of psychology, and two journals for each. And they all had to be ones that included not just abstracts, but also, like, either highlights or some other smaller unit … a significance statement, something like that. Plus PNAS, the Proceedings of the National Academy of Sciences, because it’s a very high-impact journal that gets read a lot, and we thought that would be good to include. And we had a little bit more than a thousand articles. And we took everything that was a research finding, and then we coded it for how they talked about the research finding. And, specifically, was it with generic language or not with generic language? So, we had about 14,000 elements that got coded.
And I initially said that we were hoping that we could see kind of the variation in terms of when people did and didn’t use generics. But our first surprise was that 89% of the articles had at least one generic in reporting research findings. They would make these broad claims about a category as a whole – not situate it in terms of a particular set of findings from the study – but just a broad claim about the way the world is. And that, to us, was a very high level of generics. We also did find that the smaller the unit the more generics there were. So, titles were the most, then highlights, then abstracts. Although, I will say, most of the titles were not codable. So, we couldn’t code it if it wasn’t a sentence and most titles are not sentences. But of the titles that were sentences the vast majority of them were generic.
Generic doozies
Watkins: Susan and her team’s datasets are available online. We’ve included a link to them at parsingscience.org/e63. Before our conversation with her, Doug and I had fun reading through the generic language they found in academic research. So, we were curious to learn if she had any particular favorites among the nearly 15,000 fragments that they coded.
Gelman: I guess one of my favorites was when they would say “the brain,” like there’s one brain and it is the human brain. That was not unusual, actually. But okay, here are some of the categories that people wrote. These are well-regarded scientists. These were all, by the way, highly regarded. We weren’t by any means scraping the bottom of the barrel. We went for as high impact as we could that met our constraints in terms of having, you know, the highlights and so forth and the different subfields. So, these are well regarded scientists writing about important findings. The papers were peer reviewed.
You know, this is good work, and just the most general claims about: “people,” “women,” “children,” “adults,” “people with schizophrenia,” “self-promoters,” “early bilinguals,” “the brain” – okay there’s my favorite one – “the human orbital frontal cortex,” “statistical learning.” Yeah, so a lot of them were about sort of psychological constructs, not necessarily groups of people. “Mortality salience,” “parental warmth,” “social exclusion,” “zero-sum beliefs,” “emotion regulation,” “effortful control,” “human decision-making.” That was just, like, a small sample.
And if you think about it, like, you write a paper [and] you have, you know, your sample whatever size it may be. But I think, on average, these were on the order of, you know, at most hundreds of participants. And then your conclusions are about “people.” See, that’s the beauty of generics: you don’t have to say “all.” And, in fact, you have plausible deniability. Because if someone says “Well, all people?,” you know, “blah, blah, blah.” You can say, you know, “I didn’t say all people. I just said ‘people.’” So, you know, they’re remarkably resistant to counter evidence. It would almost be better if people said “all.” Because then you have a counter-example, and then it’s like, “See? That’s not true.” So, generics are not really falsifiable. Now, of course, I will say, here I am criticizing this way of writing, but one of my own papers was in the dataset, and sure enough it had a generic in it. You know, that was … chastening.
Non-scientists’ reading of generic scientific claims
Leigh: While their first study highlighted the pervasiveness of generic language in published research articles, Susan and her team conducted three other studies to determine if and how generic versus non-generic language influences non-scientists’ interpretation of the importance and universality of researchers’ claims.
Gelman: [In] study two, we wanted to know what people thought about these sentences, and in particular did it affect how they think about a research summary if it was stated in generic language? So, we took titles from each of the different domains; the different subfields of psychology. We took a subset of the titles, but a lot of them are really sort of difficult to understand for a layperson. So, we calculated the reading level, and then we simply manipulated the form in which these sentences were put. So, some of them were bare generics, but then we also manipulated them so they would have either a framed generic, or a hedged generic, or a non-generic.
I will say these were very subtle manipulations. So, like, you could take the exact same sentence and … Well, in fact, what we did was if, you know, the exact same sentence in present tense could be generic if we just made it past tense: that made it non-generic. And then we asked people to rate them in terms of how important they are, how generalizable the findings are, what they thought the sample size would be, and how much they thought the finding would generalize to people from diverse backgrounds. And, yeah, it was a tiny effect. I mean it was consistent. There was something that people were picking up on if a research finding was stated generically. It was oh-so-slightly but consistently more important in people’s judgments.
Experimenting with broader linguistic cues
Watkins: The second study showed that lay readers deemed scientific communications as being more important if genericized claims were made about their findings, rather than if they used non-generic language. But the size of this effect was rather small, so Susan and her team carried out additional experiments to investigate the use of a broader range of linguistic cues, as she describes next.
Gelman: There’s something about that generic statement that just tweaks it a bit to indicate to these lay readers that there’s something more important going on. So, we thought, often if somebody wants to really indicate that something’s non-generic, they’re not going to simply change a verb from present tense to past tense and leave it at that. But they’ll really, like, give you more indications that they don’t want you to generalize this too far. So we systematically manipulated across these four studies – they’re lumped under study three, but there were actually four experiments – to see what would happen if people got not just this very subtle change from is-to-was, or, you know, whatever the verb tense might be, but other indicators as well.
One of the indicators that we varied was changing the subject of the sentence, so instead of it being, for example, “people” it would be “some people.” That, like, explicitly marks that we’re not talking about everybody and, likewise, one of our indicators was qualifying a non-generic. So, for example, “under certain circumstances.” So again, showing that this is not a universal claim: it’s under certain circumstances this is the case. Which, by the way, so many findings in psychology would accurately be characterized as “under certain circumstances.” But when you put in multiple indicators – you know, like past tense plus “some,” or past tense plus that qualifier, or, you know, all three as well – then we got bigger effects. Then people were even more likely to judge the generic as more important than the non-generic claims.
But, I mean, there’s sort of a worrisome side of that to me, which is someone might think they’re being responsible by framing something as non-generic by putting it in the past. So if you, you know, said “People with schizophrenia performed worse on this task,” say. You might think “Well, that’s good. I’m not making a generic claim that’s universalizing over time and space, because I’ve put it in the past tense.” But our data suggests that people reading that are going to treat that almost the same as a generic. So, you have to be more explicit about it to really get people’s attention that you’re not saying that this is broadly true.
Why we favor generic claims over nuance
Leigh: Listeners might recall from episode 47 that despite headlines such as, “More screentime for teens linked to ADHD symptoms” our guest Amy Orben found that screen time may be no worse for kids than eating potatoes. She arrived at this conclusion by analyzing the millions of ways that researchers could have possibly chosen to carry out their statistical analyses which, of course, is a very complex undertaking. So, Ryan and I were interested in hearing why Susan thinks that we might be prone to favoring bold and broad depictions of science over more nuanced ones.
Gelman: I think it is hard for people to represent variation and to hold that in mind. And I don’t know if it has to be that way, but it seems to be a principle of cognitive psychology and how we represent concepts. So, you know, people have different models of exactly how it is that we are storing and representing all the complex variation that we come across, but despite these different models it’s been well documented that, you know, you hear a word, it calls up to mind sort of a prototype for that category. When you hear “bird,” generally speaking, it calls to mind, you know, sort of a typical bird. You know, maybe a robin or something of that sort. And so, people are aware of the variation if you have them stop and think and reflect on it, but in the moment they’re simply not thinking of that. So, if you ask them to come up with a sentence with the word “bird” in it, they’ll come up with sentences like, “I saw ten birds outside my window this morning.” And then if you try to replace the word bird with non-typical birds, like penguins or ostriches, it just sounds ridiculous. Like, no one has those examples in mind, or they wouldn’t have generated the kinds of sentences that they do. They kind of only work for the prototype, because that’s kind of what gets called to mind. And so, I think it probably takes more effort to think about that variation. And I think we’re also not always very good about thinking about things, like the importance of sample size, you know, how do you pick a representative sample? All of these sampling issues that are so important in the social sciences that we learn, you know, in graduate school are not ways that we just typically think about samples when we go about our daily lives.
Suggestions for non-generic writing
Watkins: We finished our conversation with Susan by asking if her research has changed her own scientific writing, as well as what suggestions she might have for us to be better readers and writers ourselves.
Gelman: Since doing this set of studies I am much more conscious of the language that I use. I mean, it’s funny because I’ve been studying generics now for 20 years, but I didn’t think about generics in my own writing until, you know, Maureen and I sat down and kind of mapped out some of the issues. And Maureen, and Jasmine, and Graciela, and I figured out, you know, how to study these. So, my consciousness has certainly been raised, but I don’t feel like I have the answer fully because I’m not sure that it’s realistic to say that we shouldn’t generalize, right? I mean, we do the science because we want to be able to make predictions about the future and to discover general truths. Maybe it’s not going to be something that’s universally true of people for all time, but certainly we don’t want it to only be true of the hundred people in this experiment. It’s appropriate to generalize, but it may be … maybe these sorts of generalizations are more appropriate when reviewing a body of research.
You know, another practice that we talked about in the paper and we think is potentially very fruitful is called “constraints on generality.” So, this was not our own work; this was Simons, Shoda & Lindsay. They wrote a paper in 2017 called “Constraints on Generality” and it was a proposal for what people should add to their empirical papers, and it actually really nicely dovetails with sort of where we landed based on our work. Which is to add a section at the end of every paper where you lay out, “Here’s how I think these data will generalize. Here’s where I think maybe they won’t.” Just to be explicit about the assumptions. It’s an opportunity to talk about the ways in which the work is constrained, how the generality is constrained. So, for universal generalization not to be the default, which I think it is currently.
Links to article, bonus audio and other materials
Leigh: That was Susan Gelman, discussing her article “Generic language in scientific communication” which she published with Jasmine DeJesus, Maureen Callanan, and Graciela Solis on September 10th, 2019 in the Proceedings of the National Academy of Sciences. You’ll find a link to their paper at parsingscience.org/e63, along with bonus audio and other materials we discussed during the episode.
Watkins: Parsing Science also tweets news about the latest developments in science every day, including many brought to our attention by listeners like you. Follow us @parsingscience and the next time you spot a science story that fascinates you, let us know, and we might just feature the researchers in a future episode.
Preview of next episode
Leigh: Next time, in episode 64 of Parsing Science, we’ll be joined by Mateus Renno Santos from the Department of Criminology at the University of South Florida. He’ll discuss his research into how an aging population may be the driving force behind the reduction in homicide that the countries in North America, Europe, Asia, and Oceania have enjoyed over the past three decades.
Mateus Renno Santos: Changing demographics, they take decades to unfold. So, if you just have a five-year window of data, or if you just have ten years of data, you might not be able to see enough variance. You may not be able to see enough variation in the age composition. And if you cannot see variation, you cannot see the covariance: the variation in the homicide trend. So, you may not be able to see a relationship that actually exists because you’re looking too closely.
Leigh: We hope that you’ll join us again.