The Trump Minimum

Now that Trump has essentially won the GOP nomination, those of us who live in the U.S. have to confront a deeply troubling reality: At some point in the near future, tens of millions of our fellow Americans will have voted in favor of Donald Trump becoming President. This seems destined to go down in history as a fascinating combination of shame and idiocy.1 But just how bad will it be? How many of us are going to officially register our opinion that Donald Trump is not only acceptable but also preferable to another human being as a candidate for the world’s most powerful job?

In this post I attempt to find the Trump Minimum, the absolute best case scenario for this nation as it attempts to live with itself from 2017 on.2 The question: What is the lowest possible number of people who will wind up voting for Trump?

I’m ignoring two possibilities here. First, that Trump won’t run—either he’ll suddenly realize that he might actually have to do the job, and panic and quit; or he’ll run out of money because he has been a secret poor person this whole time; or Paul Ryan will decide that he’d like “history” (one year from now and after) to remember him kindly and will orchestrate a convention coup. In that case Trump will only have his primary votes; these promise to be pretty substantial, already topping 10 million as of this Washington Post article from April. I’m not sure primary votes matter in quite the same way, though. The stakes are just lower in those contests; some people might be voting tactically without hoping for the candidate to win, and others might just be gambling.

The second possibility is that Trump wins. Recent polls do show the field narrowing. Personally, I don’t think this is very likely.3 But if he does win, we’re going to have way more important things to worry about. (Also, I suspect that if he does win we might be more angry at the people who failed to vote against him—who knew he was bad and let it happen anyway. But that’s just a guess.)

My Method

This is all going to be pretty straightforward. I take a margin of victory for the Democrat—I’m just going to say Clinton from here on out, since she’s more likely to win the nomination—along with the third party vote and use that to figure out what percentage of the popular vote Trump receives.4 I’m going to ignore the electoral college as well as demographic breakdowns; both are important and interesting, and the latter affects the overall popular vote, but I really just want big-picture numbers here. I’ll leave the story behind the numbers to the reader’s imagination.

I also estimate overall turnout, which tells me how many people are voting in the first place. I found it pretty tough to find a good source for how many eligible voters there are in the USA, so I used this reliable-seeming Wikipedia page on Voter Turnout in U.S. Presidential Elections, which shows this trend:


That’s not the world’s worst trend line, so I just used it to predict that there will be 240 million eligible voters this year. That feels like kind of a strange method, and definitely not like very good social science, but there were 235 million eligible voters last time, so it seems like as good a guess as any.


We’ll start with a pretty plausible, and therefore slightly depressing, scenario. The Real Clear Politics average of polls has the race essentially tied as of this writing, so we’ll use a previous close election as our Sad Baseline: In 1960 JFK defeated Nixon by just 0.17% in the popular vote. That’s perfect for a sad winning margin.

Since 1932, which is when my data starts for this, turnout has ranged from 49% (in 1996) to 62.8% (in 1960—so not only was the result a dead heat, but tons of people weighed in on it). In the past few elections turnout has hovered in the low to mid 50’s. Since this is the Sad Baseline, we’ll assume decent turnout of 55% (thus boosting the raw numbers of Trump voters). And finally, we’ll assume that third party performance is just under the historical median—call it 2%. All together, this gives us:

Candidate Percentage Voters
Clinton 49.09% 64,792,000
Trump 48.92% 64,568,000

That’s over 64 million votes for Donald Trump. Thanks to historic population growth, only Barack Obama has ever received more votes in a real Presidential election (though he did do it twice).
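For anyone who wants to check the arithmetic, here is a minimal Python sketch of the method as described above. It is an illustration, not the spreadsheet behind the table: the function name `two_party_split` and its parameter names are made up for this example, and the 240 million eligible voters is the estimate from earlier in the post.

```python
def two_party_split(eligible, turnout, margin, third_party):
    """Return ((clinton_share, clinton_votes), (trump_share, trump_votes)).

    turnout, margin and third_party are fractions of 1 (0.55 means 55%).
    margin is Clinton's lead over Trump as a share of all votes cast.
    """
    voters = eligible * turnout            # people who actually vote
    two_party = 1.0 - third_party          # share left for the two major candidates
    clinton = (two_party + margin) / 2     # half of that share, plus half the margin
    trump = (two_party - margin) / 2       # half of that share, minus half the margin
    return (clinton, clinton * voters), (trump, trump * voters)


# Sad Baseline: 240M eligible, 55% turnout, 0.17-point margin, 2% third party
(c_share, c_votes), (t_share, t_votes) = two_party_split(240e6, 0.55, 0.0017, 0.02)
print(f"Clinton {c_share:.4f} {c_votes:,.0f}")
print(f"Trump   {t_share:.4f} {t_votes:,.0f}")
```

Rounded to the nearest thousand, the two vote counts match the table.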

Those assumptions were about as pessimistic as possible—good for figuring out the top of the range, but otherwise counter to the spirit of the Trump Minimum. So let’s consider a Plausible Good Baseline. In 2008 Obama beat McCain by 7.27%. That might be about as strong a margin as you can hope for in an era this partisan, so let’s steal that. Let’s also gamble that the historically bad favorability ratings for both candidates will depress turnout to a tie with its worst level in the last 80 years—49%. Those adjustments to the previous scenario lead you to just over 53 million Trump voters.

We’ve still got one powerful lever, though: Third Parties. These have had a pretty substantial spoiler effect in U.S. History; it was only 24 years ago that Ross Perot pulled 19% of the vote. So far there’s no indication that anyone will come close to matching that, but there’s some evidence that Libertarian candidate Gary Johnson could reach as much as 10% of the vote. If we bump our third party cut up to that, and keep everything else the same, we arrive at a Plausible Gary Scenario wherein Trump receives just 49 million votes.

All of this is well and good, but what if we ratchet the numbers up to a historic disgrace? Is it really so implausible that Trump will do or say something that loses him the votes of 9 out of 10 women? Or that he will quit the race and return to it the next day, on multiple occasions? Or insist that Donald Sterling be his Vice President? In hypothetical times like these, we really ought to turn to the biggest margin of victory in modern U.S. Presidential election history: Warren G. Harding’s 26.17% blowout of James Cox.5

Let’s also start thinking outside the box about voter turnout. Sure, 49% is bad, but what about the examples of other nations? Surely some of them care even less than us. I looked into it and, no, they really don’t, at least not in countries that are doing pretty well.6 Among OECD nations, only Japan, Chile, and Switzerland are more apathetic about voting. Only 40% of Switzerland’s voting age population votes, so let’s just take that as a worst-case-scenario number for U.S. turnout, too.

Finally, we’ll ratchet Gary up to 15%. Why not? After all, in this scenario Trump keeps quitting and promoting a man who was banned from the NBA.

With these parameters, we have reached the Trump Minimum, the absolute lowest number of votes that he could reach if we hit historically plausible extremes. The final count: 28 million votes.
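The four scenarios can be lined up side by side with a short Python sketch. The `Scenario` class and its field names are invented for this illustration; the inputs are the turnout, margin, and third-party numbers quoted in the post, along with its estimate of 240 million eligible voters.

```python
from dataclasses import dataclass

ELIGIBLE = 240_000_000  # the post's estimate of eligible voters


@dataclass
class Scenario:
    name: str
    turnout: float      # share of eligible voters who vote
    margin: float       # Clinton's lead, as a share of votes cast
    third_party: float  # combined third-party share of votes cast

    def trump_votes(self, eligible=ELIGIBLE):
        # Trump gets half of the two-party share, minus half the margin.
        share = (1.0 - self.third_party - self.margin) / 2
        return share * eligible * self.turnout


SCENARIOS = [
    Scenario("Sad Baseline", 0.55, 0.0017, 0.02),
    Scenario("Plausible Good Baseline", 0.49, 0.0727, 0.02),
    Scenario("Plausible Gary", 0.49, 0.0727, 0.10),
    Scenario("Trump Minimum", 0.40, 0.2617, 0.15),
]

for s in SCENARIOS:
    print(f"{s.name:<24} {s.trump_votes() / 1e6:5.1f} million")
```

This works out to roughly 64.6, 53.3, 48.6, and 28.2 million Trump votes, in line with the figures quoted in each scenario.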

So there you have it. Donald Trump is virtually guaranteed to get at least 28 million votes in November. A minimum of ~30 million people living in this country will choose to place Donald Trump in charge of maintaining the nuclear arsenal, repairing our broken justice system, and engaging in diplomacy with leaders who often are not old white men.

I’ll admit I wanted that number to be lower. I even played around with one last scheme: The Return of Teddy Scenario. In this one Teddy Roosevelt returns from the dead along with his former opponent Eugene Debs, and together they replicate their record-setting third-party performances in the 1912 election to steal 33.4% of the vote. Gary Johnson still runs (nothing has stopped him so far in real life), though he only adds 10% to the third-party haul, since some of his coalition prefers the Bull Moose charisma. Most voters are turned off by the prospect of voting for a reanimated dead person, so turnout dips to 30%. And as a hyper-masculine old-money white man with century-old values, Teddy pulls disproportionately from the Republican vote; Debs does pull some Bernie supporters out of Clinton’s coalition, but her margin still improves to an even 30%. In the Return of Teddy Scenario, which is not likely, Trump receives just 13.3% of the vote, losing to both Clinton and a long dead park enthusiast. But he still pulls 11 million votes.

There’s just no way around it: The man is going to get a lot of votes. Realistically closer to 60 million, but at the very least about 30 million. We’re all going to know someone who voted for him; we’re all going to go into future elections with hard evidence that people who like him are out there voting again. But there’s still the possibility that, like Barry Goldwater or Walter Mondale, he’ll lose badly enough that we can think of ourselves as the country that overwhelmingly rejected him. We just have to hope he’s less like Richard Nixon and more like James Cox—his electoral performance remembered mainly for how thoroughly he was defeated.

1. I used to say this about re-electing George W. Bush, too, so it’s possible that I’m either overreacting or underestimating what’s next. Also, I was originally calling it one of our most shameful and idiotic moments, but then I thought about things that happened before 1950.

2. To be fair, we often don’t seem to have a problem hanging around with evidence of our worst moral failures. Strom Thurmond, who ran for President in 1948 on a pro-Jim Crow platform, only left office in 2003—and he wasn’t even voted out; he just died.

3. Anything can happen, and major party nominees always have a decent enough shot at the White House that it makes sense to worry about the dangerous ones. But speaking purely subjectively, I just don’t see how he overcomes demographic reality. Women and people of color hate him even more than they hated Romney and McCain, who both lost. The theory behind his winning rests on activating enough “missing white voters” (read: racists) to overcome this, but I don’t see how enough of them are A) still alive and B) eager to vote now, despite having sat out two elections featuring a black candidate.

4. I don’t allocate the third party vote to one party or another, but it shouldn’t really matter for these purposes. If you thought a Libertarian, for instance, would take many more voters from Trump than from Clinton, you could just change the Clinton margin of victory; but I’m just making up that margin in the first place, and why complicate a made up number? We don’t care why people are or aren’t voting for Trump—we just care how many of them there are.

5. I don’t know why the 1920 election in particular. It’s interesting that it’s the first one in which women could vote, and the second biggest margin was in the very next election. Maybe only one party appealed to women voters? Three other things I want to put in this footnote: 1. I’m saying “modern” elections because my data only goes back to the 1820’s. Before that Monroe and Washington essentially ran unopposed, which screws up projects like this one. 2. Kind of funny that the VP on Cox’s ticket was FDR—guess he made up for it later. 3. I thought of Donald Sterling as a joke and was immediately convinced it was plausible. What better choice for Donald than, essentially, himself: a racist real estate billionaire named Donald.

6. I submit this as a replacement phrase for “developed countries”.


The President Was Here

This post uses Most Distinctive Words to analyze what we talk about when we talk about Presidents.*


I begin with the Wikipedia pages for each U.S. President. I downloaded these in January and then got distracted with work, so they’re a few months out of date, but still relatively fresh compared to most of the texts I work on. I wasn’t too strict about what I took; basically I started at the top of the article and stopped when I felt the article was over. Just having this much gives you access to an underrated form of quantitative textual analysis: checking how long things are. Here are the word counts for each President’s article:

President Word Count
LBJ 18485
JFK 17098
Ike 16458
FDR 16334
Lincoln 15765
Reagan 15374
Wilson 15234
Harding 15220
Grant 15107
Teddy 14868
Nixon 14366
W 14200
Washington 13809
Andrew Johnson 13674
McKinley 12988
Ford 12764
Jackson 12007
Carter 11958
Tyler 11944
Truman 11905
Jefferson 11643
Garfield 11555
Pierce 11537
Clinton 11497
Obama 11437
Hoover 11420
Madison 11008
Adams 10836
George H.W. Bush 10832
Cleveland 10060
Taft 9512
Coolidge 9239
Arthur 9162
JQA 8917
Hayes 8906
Ben Harrison 8423
Buchanan 7035
Van Buren 6966
Monroe 6801
WHH 6714
Taylor 6194
Polk 6096
Fillmore 4774

To me this variation appears to have barely any rhyme or reason. LBJ is a solid contender for the top spot; his Presidency is very tough to rank, because it includes both an incredible domestic agenda (Civil Rights Act, Medicare) and arguably the worst foreign policy agenda (Vietnam). But if you take the “absolute value” of everything he did, there’s no denying he’s one of the most consequential Presidents. Fillmore is also a decent contender for last place, with about a fourth of LBJ’s word count; I think he’s probably high in the running for “most forgotten President”.** But in between, things quickly get strange. Eisenhower ahead of 4-termer FDR? John Tyler ahead of Thomas Jefferson? Harding ahead of Teddy Roosevelt? Monroe near the bottom?

The big lesson here is that these pages are pretty weird artifacts. Their authors will have stylistic tics (maybe Tyler got a verbose guy, and Monroe got an Imagiste), and editorial decisions might displace whole sections into other articles. For example, in Jefferson’s article, the Louisiana Purchase gets about 250 words, but there’s also a standalone article about the Louisiana Purchase that’s about 5,000 words long—i.e., more worthy of discussion than the entire administration and life of Millard Fillmore, according to random Wikipedia editors.

Most Distinctive Words

Still, even with these idiosyncrasies, we ought to be able to extract something interesting from the language of these articles. For instance, which Presidents’ write-ups have the most to do with slavery, or war? What are the most remarked-upon aspects of, say, Teddy’s life, or the founding fathers, or the Gilded Age? What words, if any, set apart the discourse surrounding an icon like Lincoln from that around a tremendous moral failure like Andrew Jackson?

To explore these questions I turned to Most Distinctive Words (MDWs). This is basically a measure of the words that appear more frequently in a given text than we would expect, based on their frequency in some comparison corpus. In my case, that means checking which words appear disproportionately often in one guy’s article, compared to what we’d see if the words were distributed evenly across all articles.*** So, for instance, we might expect to see “atomic” appear distinctively often for Truman, since he dropped more atom bombs than anyone else—and, in fact, “atomic” is a distinctive word for him (though “bombing” gets you Reagan and LBJ as well).
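To make the idea concrete, here is a minimal Python sketch of a one-sided Fisher’s exact test for a single word. It is not the R code used for this post. The function name, its arguments, and the toy counts are all hypothetical, and testing only the “more often than expected” direction is an assumption based on the description in the footnote.

```python
from math import comb


def fisher_greater(word_in_article, article_tokens, word_in_corpus, corpus_tokens):
    """One-sided Fisher's exact p-value for a word being over-represented.

    Treats the article as article_tokens draws (without replacement) from a
    corpus of corpus_tokens tokens, word_in_corpus of which are the word.
    Returns P(X >= word_in_article) under that hypergeometric model.
    The corpus totals here include the article itself.
    """
    a, n, K, N = word_in_article, article_tokens, word_in_corpus, corpus_tokens
    total_ways = comb(N, n)
    # Sum the probability of every outcome at least as extreme as the observed one.
    tail = sum(comb(K, k) * comb(N - K, n - k) for k in range(a, min(K, n) + 1))
    return tail / total_ways


# Toy numbers, invented for illustration: a word appears 8 times in a
# 1,000-token article and 30 times in a 40,000-token corpus. Spread evenly,
# we would expect fewer than one occurrence in the article.
p = fisher_greater(8, 1_000, 30, 40_000)
print(f"p = {p:.3g}")
```

A small p-value means the word shows up in the article far more often than an even spread would predict, which is what earns it a place on the MDW list.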

A few notes about the MDWs you’ll see in the rest of this post: To make life easier, I converted everything to lowercase (that way “train” and “Train” aren’t different words, just because one appears at the beginning of a sentence). I also removed stop words (things like “the” and “of”, which are so frequent that they can skew things, and also are often boring), numbers, and symbols. Finally, I took out the ordinarily used names of Presidents (so, “andrew”, “jackson”, and “jacksons”, the latter to catch possessives), because otherwise they dominate the data, since they are naturally very distinctive of their articles.
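A rough Python sketch of that cleanup, for illustration only. The stop-word list is a tiny stand-in rather than the list actually used, the function names are made up, and dropping apostrophes before splitting is my guess at how “Jackson’s” ends up as “jacksons”.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "in", "to"}   # tiny illustrative list
NAME_WORDS = {"andrew", "jackson", "jacksons"}       # the President's own names


def tokenize(text):
    """Lowercase, drop apostrophes, and keep only runs of the letters a-z."""
    text = text.lower().replace("’", "").replace("'", "")
    # Numbers and symbols never match, so they fall away here.
    return re.findall(r"[a-z]+", text)


def count_words(text, stop_words=STOP_WORDS, name_words=NAME_WORDS):
    """Count the tokens that survive stop-word and name removal."""
    return Counter(
        token for token in tokenize(text)
        if token not in stop_words and token not in name_words
    )


counts = count_words("The Bank and Jackson’s veto of the Bank in 1832!")
print(counts)
```

One caveat with this sketch: the `[a-z]+` pattern also discards accented letters, so a real pipeline would need something more careful for names like Czolgosz’s contemporaries with diacritics.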

The System Works

When you check the MDWs for a particular guy, you usually find a pretty nice encapsulation of his Presidency’s Greatest Hits. Here are the top few for Lincoln:†

Lincoln MDWs

You start with his two signature issues, pick up his home states, roll through his political acts and opponents, and even capture his assassin and, three cells later, one after the other, the reason he was killed. Another good example is Andrew Jackson:

Andrew Jackson MDWs

You’ve got his famous battle (“orleans”), his refusal to understand finance (“banks”), and his penchant for genocide—rendered all the more striking when you realize that “creek” refers to the Creek tribe (now called Muscogee), who lost a brutal war against Jackson and years later were also victims of the Indian Removal Act.

Since the MDWs work pretty often, it’s pretty striking when they depart from expectations. For some guys, this means a focus on the pre-Presidency—Madison’s top word is “constitution”, Reagan’s are littered with California and Hollywood terms, and Eisenhower’s focus on war terminology for eight straight words until they arrive at “interstate”, before jumping back to “ii”. Ulysses S. Grant is similar—unsurprising, since his own memoir barely mentions that he was President.

In another case that surprised me a little, the focus is on the post-Presidency:

William Howard Taft MDWs

Taft was the only President who ever went on to become a Supreme Court justice. That’s distinguishing in either sense of the word, and a nice legacy for a guy who is probably best known to the public for being too fat to get out of a bathtub. (The article I have says that the evidence for this actually happening is unclear, but gives two sources for the distressingly ambiguous sentence “However, he once did overflow a bathtub.” I’m surprised and a little disappointed to say this whole sequence has been removed from the current version of the article.)

Another guy who surprised me was JFK. The word “assassination” is just 12th on his list; but on reflection, this may have something to do with the 8,000 word separate article on it, not to be confused with the 19,000 word “John F. Kennedy assassination conspiracy theories” article, which is longer than any Presidential article.††

Rules of Distinction

One feature of MDWs is that they privilege proper nouns. This makes sense when you consider just how specific (i.e., distinct) proper nouns are: all sorts of kids have dogs, but only Oblio has Arrow. This means there are a few things that define you if you get a Wikipedia page:

  • Your home. A President’s home state usually appears in his top few MDWs. If a guy has two home states, they both appear: Lincoln gets Illinois and Kentucky, Obama gets Illinois and Hawaii (and, even higher, Chicago). This isn’t a universal rule (JFK doesn’t have “massachusetts”), but it’s quite common.
  • Your wife. George has Martha, John has Abigail, Abe has Mary, Rutherford has Lucy, Herbert has Lou, Dwight has Mamie, Dick has Pat, Ron has Nancy, Bill has Hillary. You’re known by the person you love. But, there’s also:
  • Your enemy. The first word for Washington is “british”; “confederate” makes the top five for Lincoln and Grant; Polk has his “mexico” and Truman his “korea”. Booth, Guiteau, Czolgosz, and Oswald make their expected lists. LBJ has not just “vietnam” but “goldwater”. And look back at the Jackson list above: creek, indian, indians, calhoun, bank, banks, seminole, tribes. That’s eight enemies in just 16 words (and another, “orleans”, is the site of a battle). For everyone, but especially for bloodthirsty maniacs, distinction is conferred by who and what we choose to fight.

Eras, In So Many Words

Another cool option with these MDWs is approaching from the other direction. Once we have them, we can pick a word and see who it encompasses. For instance, take the word “gold”. This turns out to be an MDW for Grant, Hayes, Garfield, Cleveland, Harrison, and McKinley—in other words, every President but one (Arthur) from 1868-1901. This is probably a function of the currency debates that dominated that era (the last three guys also have “silver” as an MDW), but it’s also a nice, very literal way to capture the Gilded Age.
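Flipping the lists around like this is just an inverted index. Here is a small Python sketch; the dictionary holds only the “gold” and “silver” memberships mentioned above (not the full MDW lists), and the function name is made up for the example.

```python
from collections import defaultdict

# Excerpt for illustration: only the "gold"/"silver" memberships named in the post.
mdws_by_president = {
    "Grant": {"gold"},
    "Hayes": {"gold"},
    "Garfield": {"gold"},
    "Cleveland": {"gold", "silver"},
    "Harrison": {"gold", "silver"},
    "McKinley": {"gold", "silver"},
}


def presidents_by_word(mdws):
    """Invert {president: words} into {word: [presidents]}, keeping input order."""
    index = defaultdict(list)
    for president, words in mdws.items():
        for word in sorted(words):
            index[word].append(president)
    return dict(index)


index = presidents_by_word(mdws_by_president)
print(index["gold"])
print(index["silver"])
```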

Or take another definitive American word: “slave”. That word and “slaves” appear as MDWs for Washington, Jefferson, Madison, Monroe, John Quincy Adams, and Jackson—six of the first seven Presidents, and all of the ones who owned slaves themselves. (JQA, like his father, didn’t own any slaves, and the two words appear in his article in the context of his fierce opposition to slavery; for the rest of them, the words are there mainly because they owned slaves.) After this crew, those two words largely disappear, with the exceptions of Fillmore (he had “moderate anti-slavery views”, according to the article) and Lincoln (for obvious reasons).

But the issue does not disappear. The words “slavery” or “antislavery” appear as MDWs for JQA, Jackson, Van Buren, Polk, Taylor, Fillmore, Pierce, and Buchanan, before coming to a close with Lincoln. That’s everyone between the Founding Fathers and the close of the Civil War with the exceptions of William Henry Harrison (who served one month) and John Tyler (who was in office, but didn’t exactly serve at all). Many of these Presidents were slave-owners themselves, but we see a shift away from personal ownership as the focus (with a few overlap cases), and toward the rise of a political cause—from slaves to slavery. It’s a striking lexical marker of the transition from one paradigm to another, maybe somehow indicating the point at which Wikipedia writers and readers feel that Presidents were “of their time” instead of responsible for it.

A Final Mystery

I want to end with something I noticed but can’t quite explain. The word “president” actually appears as an MDW in several cases. Here they are:

word frequency p value President
president 101 0.000131294 Tyler
president 102 0.001869553 Andrew Johnson
president 74 0.002524355 Taft
president 105 0.006078532 W
president 80 0.008887996 George HW Bush
president 52 0.00954079 WH Harrison
president 96 0.016850757 Nixon
president 86 0.018566542 Ford
president 98 0.038807297 Reagan

In some of these cases, it seems like the word might have to do with unique relationships to the office. Harrison died immediately, Tyler took over even though no one wanted him (he was known as “His Accidency”), while succession laws were still untested, and Johnson abused the office to veto Congress until they impeached him (note: if you include “presidential” in these results, you add Clinton to the mix, suggesting impeachment may play a role). Still, even if this is right, it only explains a few articles. I have no idea what any of this has to do with Taft.

And then there’s this: Every Republican President since 1968 has the word “president” as an MDW. What’s more, in this era it’s only Republicans: Carter, Clinton, and Obama are all missing from that list. Why is this happening? Is it some sort of conservative preference for hierarchy/authority? A right-wing love of the institution? The tendency of these Presidents to wield presidential authority in problematic ways (Watergate, the pardon of the guy who did Watergate, Iran-Contra, the Decider and his father)? Just a random tic from a prolific Wikipedia editor? (Even then, it might be interesting that the editor of these articles has that tic.)

I looked at the word’s usage in the articles in hope of clarity, but the answer wasn’t immediately obvious. I did notice that, in the George W. Bush article, for instance, there was a tendency to call him “President Bush” in photo captions (which are included in the articles I analyzed)—but this doesn’t explain why other articles don’t follow the same practice. This all put me in mind of a bumper sticker I used to see in Texas, that looked roughly like this:


I never knew how to interpret it. What’s the point of stating that the current President is the President? I am being completely honest when I say that I don’t know if this is supposed to be combative, reassuring, snarky, patriotic, a sign of the tribe, or something else I haven’t even thought of. So it’s interesting to see a sort of version of it replicated in these MDWs—105 uses of the word President††† in an article that tells you, right at the top, that it’s about a President. It’s an interesting form of distinction for the modern Republican President—the simple confirmation that they held the job.



*It was very tempting to use this as the title of the post, but I think you just can’t do that anymore. If you Google “what we talk about when we talk about” -love (the last part is so that you don’t get any actual references to Raymond Carver’s short story), you get 211,000 results. Based on those results, here are a few of the things about which we talk about what we talk about when we talk about them:

  • Apple and Compelled Speech
  • Gun Violence
  • “The Uyghurs” (quotation marks in original)
  • Indicators
  • Clone Club
  • Causality
  • GIFs
  • God
  • Minimalism

** I doubt he wins though; his name is too weird. My guess is Ben Harrison.

***Specifically, I used word frequencies from all articles to set expected values, and word frequencies in given articles to set observed values. I then used a Fisher’s exact test to determine which words were significantly more present than expected. I did not look for words that were missing (e.g., if a President’s article says “war” much less than ordinary). My thanks to Mark Algee-Hewitt for helping me write the R code used in this project, and for explaining MDWs to me in the first place.

† In all cases, the words are ordered by p-value, where lower is taken to mean “more distinctive”. Here and below, I’m pasting in partial lists for space purposes.

†† This makes it longer than Macbeth, as well as 7 other Shakespeare plays. See also the 2,800 word “Assassination of John F. Kennedy in Popular Culture” article.

††† W’s article has 105 occurrences of the word “president”, more than three times as many as George Washington, who not only has a roughly equal-length article, but practically invented the office.

If I’m Right You Can Respond in Two Years

Here are two questions I recently realized I couldn’t answer:

  1. What counts as a successful article in my field (English)?
  2. How long does it take before people start citing a published article?

For the first question I’m really thinking about the number of citations an article has. There are other ways to measure success, but this is a big one—especially if, like me, you’d like to get hired somewhere someday—and I suspect that a lot of the other ones wind up correlating with this one anyway. But how many citations does a successful article have? 5? 50? 500? This varies widely by discipline, and I had no idea what the right answer was for English / literary criticism.

The second question is related, but mostly born out of morbid fascination with the glacial pace of knowledge sharing in my field. Obviously we talk to each other like normal people, so ideas get spread around through informal means as quickly as they do in any other walk of life, but our peer-reviewed publication process is notoriously slow. Unless you’re already a well-known scholar, the best timeline you can really hope for when you set out to publish an article is about a year from submission to print, and that’s if you write really fast and get accepted on your first try—it’s not unheard of for an article to exist for years before it finally shows up in a journal.

Given that pace, I wondered how long it takes for an article to start being cited by other scholars. If it takes a year to get published, does that mean it takes another year to get cited? Is the print version of the discipline effectively operating at a two-year lag relative to people’s ideas?

To test both questions, I did some quick and dirty data analysis. This is by no means conclusive of anything; but I think it tells us more than we (or at least I) knew before.


I took article titles from PMLA, arguably the flagship journal in the field, and definitely one of the most important journals, even if you prefer to put something else in the top slot. I used editions running from 2010 to the present, because I wanted to see what happens to an article early in its existence; also collecting the data was kind of time-consuming, so I wanted to keep it limited. I also only looked at what I took to be the main articles, so no notes from the editors, nothing organized under subsections like “Theories and Methodologies”, “Our Changing Discipline”, “Criticism in Translation”, “Little-Known Documents”, etc. Things under “Cluster on” whatever, or “Special Topics” I did use. Basically, if it looked like it was in the middle of the edition, I took it. It’s possible this skews the results somehow, but at the end of the day I just wanted a bunch of articles from this decade in a prominent journal, and I definitely got that—specifically, 152 of them. Still, it’s worth saying that this is not a comprehensive look at PMLA.*


I then used Google Scholar to figure out how many citations each article has so far. I’ve never verified the accuracy of Google’s numbers, but spot-checks have usually panned out, and I expect that they’re within acceptable range of the truth over this many articles. It’s possible that there are little errors here and there, as I logged the numbers by hand while listening to music, and was briefly kicked off Google Scholar because they suspected I was a robot.** But I think they’re accurate, and haven’t noticed any disparities so far.


It turns out that the answer to Question 2 has quite a substantial impact on the answer to Question 1, so let’s start by looking at the relationship between citations and the passage of time.

Figure 1


Here we’ve got circles representing articles at various citation levels; the size tells you how many articles there are at that level. So, for example, that big circle at 0 in 2015 is big because there are 24 articles published that year that have never been cited anywhere. Meanwhile one article from 2013 has been cited 39 times, the most of anything in my corpus. (The article is Valerie Traub’s “The New Unhistoricism in Queer Studies”.)

As you can see, there’s a strong correlation between year of publication and number of citations. If you just correlate Years with Total Citations, you get a coefficient of -.98 (the trend line above tells the same story). Here’s that data in a table:

Table 1

Year Total Citations Total Articles Citations per Article
2010 210 27 7.78
2011 162 24 6.75
2012 109 28 3.89
2013 70 18 3.89
2014 30 22 1.36
2015 5 28 0.18
2016 0 5 0.00
Total 586 152 3.86
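The correlation between Years and Total Citations can be recomputed straight from Table 1. This Python sketch uses a hand-rolled Pearson coefficient so it has no dependencies; the function name is just for illustration, and the second figure drops the partial 2016 row.

```python
from math import sqrt

# (year, total citations) rows from Table 1
ROWS = [(2010, 210), (2011, 162), (2012, 109), (2013, 70),
        (2014, 30), (2015, 5), (2016, 0)]


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    n = len(pairs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_all = pearson(ROWS)            # every year in the table
r_no_2016 = pearson(ROWS[:-1])   # leaving out the single 2016 issue
print(round(r_all, 2), round(r_no_2016, 2))  # -0.98 -0.99
```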


So far we’ve only had one issue in 2016, but leaving it out actually raises the correlation coefficient between Years and Citations to -.99. This story does get a little more complicated if you break things out by issues of the journal, rather than lumping things together by year:

Figure 2


The basic trend holds, though the correlation weakens to -.83. This suggests that citations are less sensitive to time at the level of three months or particular issues, which sounds intuitively right to me; but none of these correlations are based on very long time periods, and the first few are based on very small data sets, so I wouldn’t read too much into them aside from the headline finding.


The Citation Time Lag (CTL) is quite powerful, and appears to exert a strong pressure against any citations within the first year of publication. The average number of citations for an article published in 2011 is higher than the total number of citations for all 28 articles published in 2015. Two years out, the situation is much less bleak: Of the 22 articles published in 2014, 18 have at least one citation. This might be a little hint in favor of the timeline I mentioned at the beginning of the post; that is, if it takes about a year to publish things, then we’re too early for 2015 articles to have put up much of a showing, and just right for 2014 articles to break out.

There are two questions about the CTL that this data does not satisfactorily answer. First, why the brief plateau in citations-per-article (C/A) in 2012-2013? The technical answer is that Traub’s 2013 article is such an outlier that it skews the whole year up; in a world of 1-5 citations, having 39 is huge. If you artificially lower the number to 25 (equivalent to the second-most successful article in the corpus), 2013 has 3.11 C/A, more in line with the rest of the trend. But to me this really reveals just how limited this corpus is; if one article can have such a strong effect, I’d really like to see more articles in the data to even things out. That’s a good reason to expand this research in the future.
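The outlier adjustment is simple enough to check by hand, but here it is as a short Python sketch. The variable names are made up; the numbers are the 2013 row of Table 1 and the two citation counts mentioned above.

```python
def citations_per_article(total_citations, n_articles):
    """Average citations per article for one year of the journal."""
    return total_citations / n_articles


total_2013, articles_2013 = 70, 18   # 2013 row of Table 1
traub, capped = 39, 25               # actual count vs. the artificial cap

original = citations_per_article(total_2013, articles_2013)
adjusted = citations_per_article(total_2013 - traub + capped, articles_2013)
print(round(original, 2), round(adjusted, 2))  # 3.89 3.11
```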

Second, when does the CTL abate? Obviously the citations per article aren’t likely to increase at this rate forever. A random issue from 50 years ago may well contain no articles that are still cited. The superstar effect would be strong in older issues, too—the one article in a year so prominent that it has stood the test of time would skew things for its issue compared to the others. Of course I can’t answer this question based on this data; that’s another good reason to dig deeper.

Still, we have enough here to offer a provisional answer to Question 1. Fortunately for us humanists, the answer is, It depends. If your article is one year old and has no citations, you’re not a failure; you’re everyone. Articles that are two years old top out at 2-3 citations. After that, the sky’s the limit; Traub’s article is just three years old (though of course, the average across articles continues to rise with time). For articles published in the last five years, 40 citations is about as successful as you can be. Two other articles break 20 citations; the top 5% have at least 15 citations; the top 10% have at least 11.
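Statements like “the top 5% have at least N citations” come from ranking the articles and reading off the count at the cutoff. Here is a rough sketch, assuming a plain list of per-article counts; the function name and the twenty sample counts are hypothetical and do not reproduce my corpus.

```python
def top_share_threshold(counts, percent):
    """Lowest citation count found among the top `percent` percent of articles."""
    ranked = sorted(counts, reverse=True)
    # Integer arithmetic avoids floating-point surprises; always keep at least one article.
    k = max(1, len(ranked) * percent // 100)
    return ranked[k - 1]

# Hypothetical counts for twenty articles, NOT the real data.
sample_counts = [30, 18, 12, 9, 7, 6, 5, 4, 3, 3, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0]

print(top_share_threshold(sample_counts, 5))   # count that the top 5% reach or exceed
print(top_share_threshold(sample_counts, 10))  # count that the top 10% reach or exceed
```

Rounding the cutoff down is one choice among several; with a corpus this small, a different rounding rule could move the threshold by an article or two.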

This seems to be the order of magnitude for success within this time frame: an article with ten or more citations. The community of scholars in my field appears, at least in print, to evolve very slowly and to form relatively few connections. It’s a bittersweet pill for young scholars; on one hand, your ideas won’t be in this particular kind of circulation anytime soon. On the other hand, if you’re worried about lack of interest in something you’ve published, well, just check back in a few years—the peak of popularity is just a few like minds away.



* I did look at “Theories and Methodologies” (TM) articles for 2013. They averaged slightly fewer citations than articles I categorized as “main” (i.e., not in a subsection), although the main articles average was bolstered by Valerie Traub’s article’s 39 citations; aside from that the citation numbers were similar. Based on this limited sample, TM articles appear to be shorter and to cite fewer things themselves (i.e., their own bibliographies are shorter), but they also might be written by more prominent scholars. At least, I felt that I recognized a higher percentage of them off the bat. In theory this could give them a leg up as far as generating citations more quickly; that could be interesting to test further.

** I was not. Sources used during the collection process include Harvey, P.J., Let England Shake; and Simpson, Sturgill, A Sailor’s Guide to Earth.

The Edible Ox

By way of introducing this blog, I thought I’d just explain the name. It comes from the hilarious and bizarre satirical novel The Good Soldier Švejk, written by the Czech author Jaroslav Hašek just after World War I. Hašek lived a very fast and chaotic life consisting largely of anarchism, alcoholism, vagrancy, literature, and the kind of lunatic life-consuming humor that makes you wonder exactly how in-on-the-joke the guy living it actually was.

At one point, because of his love for one Jarmila Mayerová, whose parents were respectable enough to be basically horrified by his interest in her, Hašek cleaned up his act a little, cranking out 64 stories in one year and securing a job at a journal called The Animal World. I have no idea what this journal ordinarily did (lists of animals?), but, in the words of Cecil Parrott, who edited the volume I have, Hašek “was soon dismissed for writing articles about non-existent animals which he had invented” (ix). The unforgivable sin at the animal magazine is inventing the animals. Pretty soon Hašek was back to vagrancy and other adventures, like selling dogs, faking his suicide, and founding a political party called “The Party of Moderate and Peaceful Progress Within the Limits of the Law,” which actually railed against the monarchy and prevailing political system. (As Parrott explains, “Of course it was only another hoax, designed partly to satisfy Hašek’s innate thirst for exhibitionism and partly to bolster the finances of the pub where the election meetings were held” (x).)

The obvious question here is: What were those animals? I can’t find any information about Hašek’s inventions in the real magazine, but fortunately a character named Marek in The Good Soldier Švejk has an experience suspiciously similar to Hašek’s. These are the animals he invents:

  • The Sulphur-Bellied Whale, “the size of a cod” and “equipped with a bladder full of formic acid” which he can shoot at fish
  • The Artful Prosperian, “a mammal of the kangaroo family”
  • The Edible Ox, “the ancient prototype of the cow”
  • The Sepia Infusorian, “which I characterized as a sort of sewer rat”
  • The Faraway Bat, a “bat from Iceland”
  • The Irritable Bazouky Stag-Puss, a “domestic cat from the peak of Mount Kilimanjaro”
  • Engineer Khun’s Flea, found in amber and blind “because it lived on an underground prehistoric mole, which was also blind” (all from page 325)

Of these, I thought the ones that sounded most like a blog title were the Artful Prosperian, the Edible Ox, and the Faraway Bat. The first is probably the most apt, but it sounded too stuffy to me. I was worried people would assume it was a reference to an 18th-century satirical newspaper full of inscrutable jokes about Whigs (obviously a reference to a Czech satirical novel is completely different). The Faraway Bat is my favorite joke in the list, but it reminds me too much of Batman. But the Edible Ox has it all.

In practice most of the posts on this blog will probably be about literature, politics, basketball, etc., rather than nonexistent animals. But I’m hoping I can retain the Spirit of Marek:

‘I can say that I did my best and kept to my action programme for running the magazine as far as lay within my own powers. But I soon discovered that my articles went beyond my capabilities.
‘Wishing to offer the public something completely new I invented animals.’