The MVP of Meeting MVPs

My last post, on the employment connections between Presidents, put me in mind of some network diagrams I once put together on the other major topic in American history: the NBA. Specifically, I was interested in MVPs who have played together on the same team in the same season. I didn’t care what stage of the career either guy was in; as long as both of them were ever on a team together and won the MVP sometime—even far in the past or future from the season they shared—they were still connected. Here’s the result:

NBA_MVP.png

The colors of the edges are based on the shared team (I approximated team colors, which I guess I’ll list in the footnote to this sentence), and their weight (line thickness) is based on how many years the two connected guys played together.1  The nodes are sized based on betweenness centrality. As you can see, by this one metric no one in NBA history has ever been as important as Bob McAdoo. A little more on him in a second.
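For anyone who wants to try this without Gephi, here is a minimal sketch of the two steps described above: weighting each edge by shared seasons, then scoring nodes by betweenness centrality. The roster records, the player labels, and the function names are all hypothetical, and the centrality routine is a plain, unweighted version of Brandes' algorithm rather than whatever Gephi does internally.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical roster records: (player, team, season). Not real NBA data.
ROSTERS = [
    ("A", "Team1", 1990), ("B", "Team1", 1990),
    ("A", "Team1", 1991), ("B", "Team1", 1991),
    ("B", "Team2", 1995), ("C", "Team2", 1995),
    ("C", "Team3", 1999), ("D", "Team3", 1999),
]

def build_edges(rosters):
    """Map each pair of teammates to the number of seasons they shared."""
    squads = defaultdict(set)
    for player, team, season in rosters:
        squads[(team, season)].add(player)
    weights = defaultdict(int)
    for players in squads.values():
        for pair in combinations(sorted(players), 2):
            weights[pair] += 1
    return dict(weights)

def betweenness(edges):
    """Unnormalized, unweighted betweenness centrality (Brandes' algorithm)."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search, counting shortest paths to every node.
        order = []
        preds = defaultdict(list)
        paths = defaultdict(int)
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}

EDGES = build_edges(ROSTERS)       # edge weight = seasons shared
CENTRALITY = betweenness(EDGES)    # node size = betweenness
```

In this toy chain, the two middle players sit on every path between the ends, so they get all of the centrality; that is the same reason a well-traveled player ends up with the biggest node.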

The headline, I suppose, is that MVPs are fairly highly interconnected, even by this narrow criterion. In the NBA, if you get past first-order connections, you pretty quickly get to a six-degrees-of-Kevin-Bacon situation, as you can see for yourself using this tool from Slate.  For instance, you can get from Kyrie Irving to George Mikan in six teammates—impressive when you consider that Mikan’s career predates the NBA.2  But direct connections like you see here are much tougher; Kyrie will never play with most of the people in the league today, much less retirees. MVPs tend to have long careers—they’re generally highly employable—so that ups their odds of playing together, but they also often stick with a team (the Lakers weren’t about to trade Kobe Bryant), and their contracts are usually expensive, so I was a bit surprised to see how many had played together at least once. All told, of the 31 guys who have been MVP, only 9 never shared a roster with someone.
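The degrees-of-separation idea is just a shortest-path search. Here is a rough sketch using a breadth-first search; the only links in it are the ones from the Kyrie-to-Mikan chain in note 2, and the function name is my own invention, not anything from the Slate tool.

```python
from collections import deque

# Teammate links taken from the chain spelled out in note 2. A real
# graph would hold every pair of players who ever shared a roster.
TEAMMATES = [
    ("Kyrie Irving", "Anthony Parker"),
    ("Anthony Parker", "Rick Mahorn"),
    ("Rick Mahorn", "Wes Unseld"),
    ("Wes Unseld", "Bob Ferry"),
    ("Bob Ferry", "Slater Martin"),
    ("Slater Martin", "George Mikan"),
]

def shortest_chain(links, start, goal):
    """Return the shortest list of players linking start to goal, or None."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    previous = {start: None}
    queue = deque([start])
    while queue:
        player = queue.popleft()
        if player == goal:
            # Rebuild the chain by walking back to the start.
            chain = []
            while player is not None:
                chain.append(player)
                player = previous[player]
            return chain[::-1]
        for mate in graph.get(player, ()):
            if mate not in previous:
                previous[mate] = player
                queue.append(mate)
    return None

CHAIN = shortest_chain(TEAMMATES, "Kyrie Irving", "George Mikan")
DEGREES = len(CHAIN) - 1  # number of teammate links in the chain
```

The direct MVP connections in the diagram are the special case where this chain has exactly one link.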

That said, some of these connections are ridiculous. Here are a few of the more absurd ones, in order of ascending tenuousness:

  • Steve Nash is only connected to Kobe because of the time he and Dwight Howard tried to form a super-team in LA. Nash was already 38 and in his 17th season, and he spent most of the time injured. Dwight left the next year. The team was not super.3 
  • Karl Malone is only connected to anyone because of the time he and Gary Payton tried to form a super-team in LA. He was 40, and in his 19th season, and missed half the games because of injuries. And then the Pistons won the Finals anyway.
  •  Shaquille O’Neal concluded his career—largely spent on a super-team in LA—by roaming from team to team like a gigantic, increasingly ineffective samurai. He joined Steve Nash on the Phoenix Suns, where, according to distraught Wikipedia phrasing, he “all but ended their fast-paced offense which had brought them on the cusp of a Finals appearance”. Then he moved on to LeBron James’s Cavs, and helped them get slightly less far in the playoffs than they had the year before. Finally, he journeyed to the Celtics, where Kevin Garnett’s recently-built super-team was coming off a Finals appearance; they have not been back since.
  • Moses Malone similarly refused to retire. His connection to David Robinson stems from his 21st season, when he was 39. This was 1995, and he was the last active player from the ABA, which folded in 1976. He managed to play just 17 games, averaging about 9 minutes a game. But in his last game he hit an 80-foot buzzer-beating three, so it was probably worth it for everyone involved.
  • Bob Cousy did retire, but then he unretired, which was probably worth it for no one. In 1970 he was coach of the Cincinnati Royals, and decided to play himself to boost ticket sales. This happened even though: A) He had last played a game in 1963, long enough ago that his absence effectively coincides with the entire existence of the Beatles; B) He was 41; C) He was a point guard, and this team featured Oscar Robertson. In the 7 games he played, Cousy amassed 34 minutes and 5 points (not on average, but in total). And yet, this is still a more meaningful connection than:
  • That between Moses and Bob McAdoo on the 1977 Buffalo Braves. They were together for two games, during which Moses played 6 total minutes. His stat line: 0/1/0/0/0 with 1 foul. But there’s one thing you can’t deny: They were technically on the same team in the same season.

That’s the perfect transition back to McAdoo. What’s his deal? If you’re like me you know him mainly as a trivia answer, a guy who led the league in scoring and won an MVP in the mid-70’s for… some team (turns out it was the Braves). He was a great player, but I get the sense that he’s generally considered one of the weaker MVPs. In any case, he definitely moved around a lot after that successful early period in Buffalo. From there he went to New York for a while, then stopped by Boston for 20 games in 1979, just long enough to get a connection with Dave Cowens. After stints in Detroit and New Jersey, he actually stuck around with the Lakers for four years, so his connections to Magic Johnson and Kareem Abdul-Jabbar are pretty substantial. And then he retired like he worked: By first playing 29 games in Philadelphia with Charles Barkley and Dr. J. The network above doesn’t quite show it, but Moses was there too—coming full circle after their 6-minute connection on the Braves. Those two were real journeymen, but the crazy thing with McAdoo is that he was only 34 when he retired; he played just 14 seasons, but still got to 7 different teams.

One of the initial motivations behind this project was to show that the situation we’ve got on the Warriors next year (Kevin Durant and Steph Curry, two very recent MVPs, still in their prime) isn’t that unusual. But in spite of its fairly high connectivity, I think you really see just the opposite here. Only a few of these guys played together at or near their peak: Definitely Kareem/Magic and Shaq/Kobe, probably Cousy/Russell, and it’s reasonably close for Oscar/Kareem, Dr. J/Moses, and Duncan/Robinson. Everyone else is off somehow; in some cases both parties are past their prime. So that line between Durant and Curry should be pretty unique.

Still, I like all these tenuous connections. It’s the reason the two clusters are so connective, the top one connecting the 2000’s from Karl Malone to LeBron, and the bottom one stretching (thanks to Cousy’s marketing tactics and McAdoo’s travels) all the way from the 1950’s to Tim Duncan, hitting every major period in between.

Now that Duncan has retired, though, that cluster may be done. The last hope is probably Kawhi Leonard, who stands a decent chance of winning MVP sometime in the next few years. But then he or someone else will also need to migrate. Derrick Rose could be a key player here; often these guys start moving around when they’re a little worse, either because of age or, as with Rose, injury. Bill Walton is the best analogy there: a dominant force in 1977, he couldn’t stay healthy enough to keep going, missing multiple entire seasons. That’s probably the only reason he wound up on the Celtics, where, unlike the examples above, he was a major contributor to a title team while creating his connection to another MVP in Larry Bird. And finally, these connections often back-form; look at Moses and Robinson, or Cousy and Oscar. There’s still time for LeBron to have a late-career Shaqesque spirit journey.

One last thought: These sorts of networks get more connected very quickly if you add either a few extra nodes (more players = better odds that any one player has a teammate out there somewhere) or another principle of connection. I thought about using family members. Dell Curry, for instance, played with both Karl Malone (Jazz, ’86) and Hakeem Olajuwon (Raptors, ’02), connecting the two big clusters, and he’s the father of Steph Curry, connecting everyone to him and Durant. And the Warriors have also forged a connection to the most important man in the MVP network: In July, they re-signed back-up forward James Michael McAdoo, second cousin to Bob.

 


Notes

1. Celtic green = Celtics; Dark green = Bucks; Yellow = Lakers; Dark blue = 76ers; Red = Rockets; Gray = Spurs; Light blue = Mavericks; Pinkish purple = Suns. A few are easier to describe based on the connection: Cousy<->Robertson = Cincinnati Royals; Shaq<->James = Cavaliers; Moses<->McAdoo = Buffalo Braves. 

2. Kyrie played with Anthony Parker, who played with Rick Mahorn, who played with Wes Unseld, who played with Bob Ferry, who played with Slater Martin, who played with George Mikan. One cool thing about the Slate tool is that it incorporates other sports at the same time; for instance, LeBron James is evidently 4 degrees of separation from Mike Trout (via Damon Jones, Mark Hendrickson (who played in the NBA and MLB), and Scott Kazmir). One caveat, though, is that their data only goes to 2013.

3. This experiment and Shaq’s late-career wandering (see below) are the only reasons Dirk Nowitzki is connected to a larger network, instead of just to Nash. There’s a certain dignity to isolation in this network, I guess.

Nice Work If You Can Get It

One of the oldest battles in American political rhetoric is the one that pits bold outsiders against experienced statesmen. This election has taken that to such a ludicrous extreme that it put me in mind of a project I did back when I was first learning how to build network diagrams.1  The idea was to see Presidential employment relationships: Which Presidents held major jobs under other Presidents? Who employed the most other Presidents? The results tell us a little about the outsider/insider battle at the highest level of insiderness.

When you start to dig into this stuff, a lot of ambiguous situations arise. For instance, Ulysses S. Grant was Commanding General of the United States Army under both Abraham Lincoln and Andrew Johnson, but Johnson, characteristic of his usual interest in skilled governance, national unity, and racial progress, hated him and constantly tried to get him fired.2 Should that count? What about William McKinley, who was a major in the U.S. Army during the Civil War? Technically that means he worked for Lincoln and Grant—should that count? In the end I settled on an imperfect but easy compromise: I took any job that got its own category in the sidebar of the President’s Wikipedia page. Grant’s has his Commanding General post; McKinley’s major post doesn’t make the cut.

Here are the results:

PresidentialEmploymentSimpler

The nodes here are colored by political party and sized by betweenness centrality.3 I’ve arranged everything here to show the major clusters. What immediately stands out is that the early guys are incredibly interconnected. John Quincy Adams worked for four different Presidents (ambassador for Washington, Adams with no Q, and Madison, and Secretary of State for Monroe) and hired another, Harrison, who had also worked for his dad. Recently a lot of people, including Barack Obama, have said that Hillary Clinton is the most qualified person ever to run for President, and while I think the basic gist of this is true (she’s as qualified as anyone in the last hundred years), you just can’t beat those early guys. They just insisted on hiring each other to do everything (and that’s before you factor in things like writing the Constitution).

Beyond that, you see a couple of other groups: The Lincoln Republicans, a group I call the Immigration Era Republicans,4  and then the American Empire guys—the WWI and WWII Presidents, followed by the Republican group that dominated the rest of the 20th century. It’s obvious that some things are off here; W. is clearly in the same political club as Nixon (who contributed a lot of his staffers) and George H.W. Bush (who contributed a lot of his DNA, education, baseball teams, etc.). And there are other, slightly more tenuous connections as well: The Harrisons are related, albeit separated by a generation; JFK’s dad worked for FDR; Taylor prosecuted the Mexican-American War for Polk.

Here’s another issue with this data: You may have noticed that the edges in that network are multicolored. That’s to show the nature of the job held, as detailed in this key:

EdgeKey

Most of these are probably fine (and note that “governor” only refers to appointed governorships, like when McKinley made Taft Governor-General of the Philippines), but ambassadorships are doing a ton of work here.5  Buchanan, for instance, was the ambassador to Russia under Jackson at the early stage of his bafflingly long (considering how it ended) career in national politics, which is the only reason the President in the late 1850’s is connected to George Washington. Arguably these aren’t substantial enough roles to be included in this kind of graph; that’s what happens when you let Wikipedia make the decisions for you. Still, in broad strokes, I think this really shows you something about the internecine operations of power at our highest level, and its capacity to reset every so often.

One last image: Here’s everything laid out chronologically. This time the edges are directed, so you can see, based on the arrows, who hired whom.

PresidentialEmploymentOrdered

Here the unending nature of that first group really becomes clear. If you worked for George Washington, you stood a surprisingly good chance of being in the same org chart as the guy who would one day lose seven states to secession at the start of the Civil War. You also see that the groups overlap chronologically, with Wilson and FDR crossing the 1920’s Republicans, and the Taylor/Fillmore pair interrupting the Founders’ lovefest. You also get the weird anomaly of Hoover hiring a guy who had already been the President; when he needed a Chief Justice of the Supreme Court, who better to choose than the man appointed governor of a territory by the man whose Vice President later appointed that same man governor of a territory? (Taft was also a judge and solicitor general under Ben Harrison—you just couldn’t keep Presidents from hiring him, even decades after he was done being President.)

The possibilities for describing these employer-employee chains are pretty fun. For instance, Wilson’s Assistant Secretary of the Navy’s Vice President’s general’s Vice President’s Vice President’s CIA Director’s running mate was Ronald Reagan (that’s Wilson-FDR-Truman-Eisenhower-Nixon-Ford-Bush-Reagan). Or, much weirder, Polk’s Secretary of State’s former boss’s former boss’s former boss’s Secretary of State’s UK ambassador’s former boss’s Vice President’s appointed governor’s Vice President was John Tyler, aka the guy Polk replaced. (That one goes Polk-Buchanan-Jackson-Monroe-Jefferson-Madison-JQA-Washington-Adams-Harrison-Tyler.)

In the recent past, we’ve had a lot more isolates than before, although, as noted, there’s a strong argument for connecting W. to the other American Empire guys. But if Clinton wins, we’ll have connections to Obama (who hired her as Secretary of State) and arguably Bill Clinton (it’s pretty odd to think of that as an employment relationship, but First Lady makes the Wikipedia sidebar—nothing I can do!). And if you’re willing to go along with all that, the only guy who would be left out of the loops in the past 120 years is Jimmy Carter, a mediocre President but arguably in the top three in terms of being a decent human being. It’s a little sad to think of him out there by himself; I think Clinton should appoint him Ambassador to Cuba for a couple days.6  It’s what the Founders would have wanted.

 

 


Notes

1. I did all of this with Gephi.

2. One strategy was to try and promote William T. Sherman ahead of Grant to dilute his power. For some reason Sherman preferred to side with Grant, which led to the odd situation of Sherman calling in political favors to battle his own promotion on the Senate floor. See Jean Edward Smith’s Grant, 452. 

3. Betweenness centrality basically measures how important a node is for connecting other groups of nodes to each other; so Jackson is big because he connects all those dark blues to the Founders. The parties here include Democrat (dark blue), Republican (red), Democratic-Republican (light blue), Federalist (yellow), Whig (green), and none (white).

4. Two reasons: 1. There’s not another good name for the period from the 1880’s-1930’s; it’s post-Reconstruction, much longer than the Gilded Age or Progressive Era, and doesn’t align well with any wars. But, 2. Tons of people immigrated to the U.S. over this period. The numbers really explode starting in the 1880’s (they double the 1870’s in the source in that link) and stay strong until the mid-1930’s.

5. In the old days they seemed to call ambassadors “ministers” (e.g., Buchanan was United States Minister to Russia). I’m assuming these jobs are close enough to the same thing for my purposes, though I’d be interested to hear if I’m wrong about that. 

6. First Provisional Governor of Cuba for the U.S.: William Howard Taft. Of course. And by the way, to answer the two questions I asked in the first paragraph and then forgot about: 25 Presidents worked for some other President; JQA and Taft each worked for 4 different Presidents, tying for first on that metric. Three Presidents hired other Presidents 4 times: Jackson hired Buchanan, and then Van Buren for three different things. Madison hired JQA and Monroe for two things apiece. And Washington hired Adams, Jefferson, Monroe, and JQA, one time apiece.

The President Was Here

This post uses Most Distinctive Words to analyze what we talk about when we talk about Presidents.*

WikiPresidentia

I begin with the Wikipedia pages for each U.S. President. I downloaded these in January and then got distracted with work, so they’re a few months out of date, but still relatively fresh compared to most of the texts I work on. I wasn’t too strict about what I took; basically I started at the top of the article and stopped when I felt the article was over. Just having this much gives you access to an underrated form of quantitative textual analysis: checking how long things are. Here are the word counts for each President’s article:

President Word Count
LBJ 18485
JFK 17098
Ike 16458
FDR 16334
Lincoln 15765
Reagan 15374
Wilson 15234
Harding 15220
Grant 15107
Teddy 14868
Nixon 14366
W 14200
Washington 13809
Andrew Johnson 13674
McKinley 12988
Ford 12764
Jackson 12007
Carter 11958
Tyler 11944
Truman 11905
Jefferson 11643
Garfield 11555
Pierce 11537
Clinton 11497
Obama 11437
Hoover 11420
Madison 11008
Adams 10836
George H.W. Bush 10832
Cleveland 10060
Taft 9512
Coolidge 9239
Arthur 9162
JQA 8917
Hayes 8906
Ben Harrison 8423
Buchanan 7035
Van Buren 6966
Monroe 6801
WHH 6714
Taylor 6194
Polk 6096
Fillmore 4774

To me this variation appears to have barely any rhyme or reason. LBJ is a solid contender for the top spot; his Presidency is very tough to rank, because it includes both an incredible domestic agenda (Civil Rights Act, Medicare) and arguably the worst foreign policy agenda (Vietnam). But if you take the “absolute value” of everything he did, there’s no denying he’s one of the most consequential Presidents. Fillmore is also a decent contender for last place, with less than a fourth of LBJ’s word count; I think he’s probably high in the running for “most forgotten President”.** But in between, things quickly get strange. Eisenhower ahead of 4-termer FDR? John Tyler ahead of Thomas Jefferson? Harding ahead of Teddy Roosevelt? Monroe near the bottom?

The big lesson here is that these pages are pretty weird artifacts. Their authors will have stylistic tics (maybe Tyler got a verbose guy, and Monroe got an Imagiste), and editorial decisions might displace whole sections into other articles. For example, in Jefferson’s article, the Louisiana Purchase gets about 250 words, but there’s also a standalone article about the Louisiana Purchase that’s about 5,000 words long—i.e., more worthy of discussion than the entire administration and life of Millard Fillmore, according to random Wikipedia editors.

Most Distinctive Words

Still, even with these idiosyncrasies, we ought to be able to extract something interesting from the language of these articles. For instance, which Presidents’ write-ups have the most to do with slavery, or war? What are the most remarked-upon aspects of, say, Teddy’s life, or the founding fathers, or the Gilded Age? What words, if any, set apart the discourse surrounding an icon like Lincoln from that around a tremendous moral failure like Andrew Jackson?

To explore these questions I turned to Most Distinctive Words (MDWs). This is basically a measure of the words that appear more frequently in a given text than we would expect, based on their frequency in some comparison corpus. In my case, that means checking which words appear disproportionately often in one guy’s article, compared to what we’d see if the words were distributed evenly across all articles.*** So, for instance, we might expect to see “atomic” appear distinctively often for Truman, since he dropped more atom bombs than anyone else—and, in fact, “atomic” is a distinctive word for him (though “bombing” gets you Reagan and LBJ as well).
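The notes at the end mention that the actual test was a Fisher's exact test, run in R. Here is a rough Python sketch of the one-sided version of that test. The way I set up the counts (a word's occurrences in one article versus its occurrences in the whole corpus) is my assumption about how the comparison is framed, and the function name is hypothetical.

```python
from math import comb

def fisher_overuse_p(in_article, article_size, in_corpus, corpus_size):
    """One-sided Fisher's exact p-value that a word is over-represented.

    in_article   -- times the word appears in this article
    article_size -- total tokens in this article
    in_corpus    -- times the word appears across all articles
    corpus_size  -- total tokens across all articles
    """
    # Probability of seeing at least `in_article` occurrences if the
    # word's corpus-wide occurrences were spread evenly over all tokens
    # (the upper tail of a hypergeometric distribution).
    most_possible = min(in_corpus, article_size)
    tail = sum(
        comb(in_corpus, k) * comb(corpus_size - in_corpus, article_size - k)
        for k in range(in_article, most_possible + 1)
    )
    return tail / comb(corpus_size, article_size)

# Toy numbers: a word appears 5 times in a 20-token corpus, and all 5
# of those fall inside one 10-token article.
P_VALUE = fisher_overuse_p(5, 10, 5, 20)
```

A small p-value means the word shows up in the article more often than an even spread would predict, which is what makes it "distinctive". The exact arithmetic here gets slow on full-length articles, so a real run would want a statistics library.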

A few notes about the MDWs you’ll see in the rest of this post: To make life easier, I converted everything to lowercase (that way “train” and “Train” aren’t different words, just because one appears at the beginning of a sentence). I also removed stop words (things like “the” and “of”, which are so frequent that they can skew things, and also are often boring), numbers, and symbols. Finally, I took out the ordinarily used names of Presidents (so, “andrew”, “jackson”, and “jacksons”, the latter to catch possessives), because otherwise they dominate the data, since they are naturally very distinctive of their articles.
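Those cleanup steps can be sketched in a few lines. The stop-word list below is a tiny hypothetical stand-in (I am not reproducing the real list here), and the name list covers only the Jackson example from above.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lists for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "was"}
PRESIDENT_NAMES = {"andrew", "jackson", "jacksons"}

def clean_tokens(text):
    """Lowercase the text, then drop symbols, numbers, stop words, and names."""
    # Removing apostrophes first turns "Jackson's" into "jacksons",
    # which is why the possessive form is on the name list.
    text = text.lower().replace("’", "").replace("'", "")
    # Keep runs of letters only; digits and symbols act as separators.
    words = re.findall(r"[^\W\d_]+", text)
    return [w for w in words if w not in STOP_WORDS and w not in PRESIDENT_NAMES]

SAMPLE = "The Train of Andrew Jackson’s army, 1815!"
TOKENS = clean_tokens(SAMPLE)
COUNTS = Counter(TOKENS)  # word frequencies for the cleaned text
```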

The System Works

When you check the MDWs for a particular guy, you usually find a pretty nice encapsulation of his Presidency’s Greatest Hits. Here are the top few for Lincoln:†

Lincoln MDWs
slavery
union
illinois
emancipation
confederate
kentucky
proclamation
douglas
war
mcclellan
land
booth
salem
springfield
free
slave
gettysburg
republicanism

You start with his two signature issues, pick up his home states, roll through his political acts and opponents, and even capture his assassin and, three cells later, one after the other, the reason he was killed. Another good example is Andrew Jackson:

Andrew Jackson MDWs
carolina
creek
rachel
tennessee
hermitage
indian
indians
orleans
south
calhoun
lands
removal
bank
banks
seminole
tribes

You’ve got his famous battle (“orleans”), his refusal to understand finance (“banks”), and his penchant for genocide—rendered all the more striking when you realize that “creek” refers to the Creek tribe (now called Muscogee), who lost a brutal war against Jackson and years later were also victims of the Indian Removal Act.

Since the MDWs work pretty often, it’s pretty striking when they depart from expectations. For some guys, this means a focus on the pre-Presidency—Madison’s top word is “constitution”, Reagan’s are littered with California and Hollywood terms, and Eisenhower’s focus on war terminology for eight straight words until they arrive at “interstate”, before jumping back to “ii”. Ulysses S. Grant is similar—unsurprising, since his own memoir barely mentions that he was President.

In another case that surprised me a little, the focus is on the post-Presidency:

William Howard Taft MDWs
court
justice
chief
v
supreme
opinion

Taft was the only President who ever went on to become a Supreme Court justice. That’s distinguishing in either sense of the word, and a nice legacy for a guy who is probably best known to the public for being too fat to get out of a bathtub. (The article I have says that the evidence for this actually happening is unclear, but gives two sources for the distressingly ambiguous sentence “However, he once did overflow a bathtub.” I’m surprised and a little disappointed to say this whole sequence has been removed from the current version of the article.)

Another guy who surprised me was JFK. The word “assassination” is just 12th on his list; but on reflection, this may have something to do with the 8,000 word separate article on it, not to be confused with the 19,000 word “John F. Kennedy assassination conspiracy theories” article, which is longer than any Presidential article.††

Rules of Distinction

One feature of MDWs is that they privilege proper nouns. This makes sense when you consider just how specific (i.e., distinct) proper nouns are: all sorts of kids have dogs, but only Oblio has Arrow. This means there are a few things that define you if you get a Wikipedia page:

  • Your home. A President’s home state usually appears in his top few MDWs. If a guy has two home states, they both appear: Lincoln gets Illinois and Kentucky, Obama gets Illinois and Hawaii (and, even higher, Chicago). This isn’t a universal rule (JFK doesn’t have “massachusetts”), but it’s quite common.
  • Your wife. George has Martha, John has Abigail, Abe has Mary, Rutherford has Lucy, Herbert has Lou, Dwight has Mamie, Dick has Pat, Ron has Nancy, Bill has Hillary. You’re known by the person you love. But, there’s also:
  • Your enemy. The first word for Washington is “british”; “confederate” makes the top five for Lincoln and Grant; Polk has his “mexico” and Truman his “korea”. Booth, Guiteau, Czolgosz, and Oswald make their expected lists. LBJ has not just “vietnam” but “goldwater”. And look back at the Jackson list above: creek, indian, indians, calhoun, bank, banks, seminole, tribes: that’s eight enemies in just 16 words (and another, “orleans”, is the site of a battle). For everyone, but especially for bloodthirsty maniacs, distinction is conferred by who and what we choose to fight.

Eras, In So Many Words

Another cool option with these MDWs is approaching from the other direction. Once we have them, we can pick a word and see who it encompasses. For instance, take the word “gold”. This turns out to be an MDW for Grant, Hayes, Garfield, Cleveland, Harrison, and McKinley—in other words, every President but one (Arthur) from 1868-1901. This is probably a function of the currency debates that dominated that era (the last three guys also have “silver” as an MDW), but it’s also a nice, very literal way to capture the Gilded Age.
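Flipping the lists around like this amounts to building an inverted index. Here is a small sketch using only the “gold” and “silver” results mentioned above; the per-President lists are partial, and the function name is hypothetical.

```python
from collections import defaultdict

# Partial MDW lists, limited to the two words discussed above.
# "Harrison" here is Benjamin Harrison.
MDWS = {
    "Grant": ["gold"],
    "Hayes": ["gold"],
    "Garfield": ["gold"],
    "Cleveland": ["gold", "silver"],
    "Harrison": ["gold", "silver"],
    "McKinley": ["gold", "silver"],
}

def presidents_by_word(mdws):
    """Invert {president: [words]} into {word: [presidents]}."""
    index = defaultdict(list)
    for president, words in mdws.items():
        for word in words:
            index[word].append(president)
    return dict(index)

INDEX = presidents_by_word(MDWS)
```

Looking up `INDEX["gold"]` then gives the run of Presidents listed above, in the order they were entered.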

Or take another definitive American word: “slave”. That word and “slaves” appear as MDWs for Washington, Jefferson, Madison, Monroe, John Quincy Adams, and Jackson—six of the first seven Presidents, and all of the ones who owned slaves themselves. (JQA, like his father, didn’t own any slaves, and the two words appear in his article in the context of his fierce opposition to slavery; for the rest of them, the words are there mainly because they owned slaves.) After this crew, those two words largely disappear, with the exceptions of Fillmore (he had “moderate anti-slavery views”, according to the article) and Lincoln (for obvious reasons).

But the issue does not disappear. The words “slavery” or “antislavery” appear as MDWs for JQA, Jackson, Van Buren, Polk, Taylor, Fillmore, Pierce, and Buchanan, before coming to a close with Lincoln. That’s everyone between the Founding Fathers and the close of the Civil War with the exceptions of William Henry Harrison (who served one month) and John Tyler (who was in office, but didn’t exactly serve at all). Many of these Presidents were slave-owners themselves, but we see a shift away from personal ownership as the focus (with a few overlap cases), and toward the rise of a political cause—from slaves to slavery. It’s a striking lexical marker of the transition from one paradigm to another, maybe somehow indicating the point at which Wikipedia writers and readers feel that Presidents were “of their time” instead of responsible for it.

A Final Mystery

I want to end with something I noticed but can’t quite explain. The word “president” actually appears as an MDW in several cases. Here they are:

word frequency p value President
president 101 0.000131294 Tyler
president 102 0.001869553 Andrew Johnson
president 74 0.002524355 Taft
president 105 0.006078532 W
president 80 0.008887996 George HW Bush
president 52 0.00954079 WH Harrison
president 96 0.016850757 Nixon
president 86 0.018566542 Ford
president 98 0.038807297 Reagan

In some of these cases, it seems like the word might have to do with unique relationships to the office. Harrison died immediately, Tyler took over even though no one wanted him (he was known as “His Accidency”), while succession laws were still untested, and Johnson abused the office to veto Congress until they impeached him (note: if you include “presidential” in these results, you add Clinton to the mix, suggesting impeachment may play a role). Still, even if this is right, it only explains a few articles. I have no idea what any of this has to do with Taft.

And then there’s this: Every Republican President since 1968 has the word “president” as an MDW. What’s more, in this era it’s only Republicans: Carter, Clinton, and Obama are all missing from that list. Why is this happening? Is it some sort of conservative preference for hierarchy/authority? A right-wing love of the institution? The tendency of these Presidents to wield presidential authority in problematic ways (Watergate, the pardon of the guy who did Watergate, Iran-Contra, the Decider and his father)? Just a random tic from a prolific Wikipedia editor? (Even then, it might be interesting that the editor of these articles has that tic.)

I looked at the word’s usage in the articles in hope of clarity, but the answer wasn’t immediately obvious. I did notice that, in the George W. Bush article, for instance, there was a tendency to call him “President Bush” in photo captions (which are included in the articles I analyzed)—but this doesn’t explain why other articles don’t follow the same practice. This all put me in mind of a bumper sticker I used to see in Texas, that looked roughly like this:

WthePresident

I never knew how to interpret it. What’s the point of stating that the current President is the President? I am being completely honest when I say that I don’t know if this is supposed to be combative, reassuring, snarky, patriotic, a sign of the tribe, or something else I haven’t even thought of. So it’s interesting to see a sort of version of it replicated in these MDWs—105 uses of the word President††† in an article that tells you, right at the top, that it’s about a President. It’s an interesting form of distinction for the modern Republican President—the simple confirmation that they held the job.

 


Notes

*It was very tempting to use this as the title of the post, but I think you just can’t do that anymore. If you Google “what we talk about when we talk about” -love (the last part is so that you don’t get any actual references to Raymond Carver’s short story), you get 211,000 results. Based on those results, here are a few of the things about which we talk about what we talk about when we talk about them:

  • Apple and Compelled Speech
  • Gun Violence
  • “The Uyghurs” (quotation marks in original)
  • Indicators
  • Clone Club
  • Causality
  • GIFs
  • God
  • Minimalism

** I doubt he wins though; his name is too weird. My guess is Ben Harrison.

***Specifically, I used word frequencies from all articles to set expected values, and word frequencies in given articles to set observed values. I then used a Fisher’s exact test to determine which words were significantly more present than expected. I did not look for words that were missing (e.g., if a President’s article says “war” much less than ordinary). My thanks to Mark Algee-Hewitt for helping me write the R code used in this project, and for explaining MDWs to me in the first place.

† In all cases, the words are ordered by p-value, where lower is taken to mean “more distinctive”. Here and below, I’m pasting in partial lists for space purposes.

†† This makes it longer than Macbeth, as well as 7 other Shakespeare plays. See also the 2,800-word “Assassination of John F. Kennedy in Popular Culture” article.

††† W’s article has 105 occurrences of the word “president”, more than three times as many as George Washington, who not only has a roughly equal-length article, but practically invented the office.

If I’m Right You Can Respond in Two Years

Here are two questions I recently realized I couldn’t answer:

  1. What counts as a successful article in my field (English)?
  2. How long does it take before people start citing a published article?

For the first question I’m really thinking about the number of citations an article has. There are other ways to measure success, but this is a big one—especially if, like me, you’d like to get hired somewhere someday—and I suspect that a lot of the other ones wind up correlating with this one anyway. But how many citations does a successful article have? 5? 50? 500? This varies widely by discipline, and I had no idea what the right answer was for English / literary criticism.

The second question is related, but mostly born out of morbid fascination with the glacial pace of knowledge sharing in my field. Obviously we talk to each other like normal people, so ideas get spread around through informal means as quickly as they do in any other walk of life, but our peer-reviewed publication process is notoriously slow. Unless you’re already a well-known scholar, the best timeline you can really hope for when you set out to publish an article is about a year from submission to print, and that’s if you write really fast and get accepted on your first try—it’s not unheard of for an article to exist for years before it finally shows up in a journal.

Given that pace, I wondered how long it takes for an article to start being cited by other scholars. If it takes a year to get published, does that mean it takes another year to get cited? Is the print version of the discipline effectively operating at a two-year lag relative to people’s ideas?

To test both questions, I did some quick and dirty data analysis. This is by no means conclusive, but I think it tells us more than we (or at least I) knew before.

Corpus:

I took article titles from PMLA, arguably the flagship journal in the field, and definitely one of the most important journals, even if you prefer to put something else in the top slot. I used editions running from 2010 to the present, because I wanted to see what happens to an article early in its existence; also collecting the data was kind of time-consuming, so I wanted to keep it limited. I also only looked at what I took to be the main articles, so no notes from the editors, nothing organized under subsections like “Theories and Methodologies”, “Our Changing Discipline”, “Criticism in Translation”, “Little-Known Documents”, etc. Things under “Cluster on” whatever, or “Special Topics” I did use. Basically, if it looked like it was in the middle of the edition, I took it. It’s possible this skews the results somehow, but at the end of the day I just wanted a bunch of articles from this decade in a prominent journal, and I definitely got that—specifically, 152 of them. Still, it’s worth saying that this is not a comprehensive look at PMLA.*

Method:

I then used Google Scholar to figure out how many citations each article has so far. I’ve never verified the accuracy of Google’s numbers, but spot-checks have usually panned out, and I expect that they’re within an acceptable range of the truth over this many articles. It’s possible that there are little errors here and there, as I logged the numbers by hand while listening to music, and was briefly kicked off Google Scholar because they suspected I was a robot.** But I think the numbers are accurate, and I haven’t noticed any discrepancies so far.

Results:

It turns out that the answer to Question 2 has quite a substantial impact on the answer to Question 1, so let’s start by looking at the relationship between citations and the passage of time.

Figure 1

CitationsPerYearLK

Here we’ve got circles representing articles at various citation levels; the size tells you how many articles there are at that level. So, for example, that big circle at 0 in 2015 is big because there are 24 articles published that year that have never been cited anywhere. Meanwhile one article from 2013 has been cited 39 times, the most of anything in my corpus. (The article is Valerie Traub’s “The New Unhistoricism in Queer Studies”.)

As you can see, there’s a strong correlation between year of publication and number of citations. If you just correlate Years with Total Citations, you get a coefficient of -.98 (the trend line above tells the same story). Here’s that data in a table:

Table 1

Year     Total Citations    Total Articles    Citations per Article
2010     210                27                7.78
2011     162                24                6.75
2012     109                28                3.89
2013     70                 18                3.89
2014     30                 22                1.36
2015     5                  28                0.18
2016     0                  5                 0.00
Total    586                152               3.86

 

So far we’ve only had one issue in 2016, but leaving it out actually raises the correlation coefficient between Years and Citations to -.99. This story does get a little more complicated if you break things out by issues of the journal, rather than lumping things together by year:

Figure 2

CitationsPerIssueLK

The basic trend holds, though the correlation coefficient decreases to -.83. This suggests that citations are not sensitive to time at the level of three months or particular issues, which sounds intuitively right to me; but none of these correlations are based on very long time periods, and the first few are based on very small data sets, so I wouldn’t read too much into them aside from the headline finding.
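The two year-level coefficients can be re-derived from the totals in Table 1. The sketch below is a plain Pearson correlation written out by hand; the variable and function names are mine, and the per-issue numbers behind Figure 2 aren’t listed in this post, so only the year-level figures are covered.

```python
from math import sqrt

# Year -> total citations, copied from Table 1.
total_citations = {
    2010: 210,
    2011: 162,
    2012: 109,
    2013: 70,
    2014: 30,
    2015: 5,
    2016: 0,
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


years = list(total_citations)
r_all_years = pearson_r(years, [total_citations[y] for y in years])

# Same calculation with the partial 2016 row left out.
years_to_2015 = [y for y in years if y != 2016]
r_without_2016 = pearson_r(years_to_2015, [total_citations[y] for y in years_to_2015])

print(f"all years:    {r_all_years:.2f}")
print(f"without 2016: {r_without_2016:.2f}")
```

With only six or seven data points, a coefficient this close to -1 mostly says that citations fall steadily as the publication year gets more recent, which is the headline finding.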

Analysis:

The Citation Time Lag (CTL) is quite powerful, and appears to exert a strong pressure against any citations within the first year of publication. The average number of citations for an article published in 2011 is higher than the total number of citations for all 28 articles published in 2015. Two years out, the situation is much less bleak: Of the 22 articles published in 2014, 18 have at least one citation. This might be a little hint in favor of the timeline I mentioned at the beginning of the post; that is, if it takes about a year to publish things, then we’re too early for 2015 articles to have put up much of a showing, and just right for 2014 articles to break out.

There are two questions about the CTL that this data does not satisfactorily answer. First, why the brief plateau in citations-per-article (C/A) in 2012-2013? The technical answer is that Traub’s 2013 article is such an outlier that it skews the whole year up; in a world of 1-5 citations, having 39 is huge. If you artificially lower the number to 25 (equivalent to the second-most successful article in the corpus), 2013 has 3.11 C/A, more in line with the rest of the trend. But to me this really reveals just how limited this corpus is; if one article can have such a strong effect, I’d really like to see more articles in the data to even things out. That’s a good reason to expand this research in the future.
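The outlier adjustment is simple arithmetic on the 2013 row of Table 1, sketched here in Python with variable names of my own choosing.

```python
# 2013 row of Table 1.
citations_2013 = 70
articles_2013 = 18

# Traub's article, and the count it is capped at in the adjustment
# (the second-most-cited article in the corpus has 25 citations).
traub_citations = 39
capped_citations = 25

reported_average = citations_2013 / articles_2013
adjusted_average = (citations_2013 - traub_citations + capped_citations) / articles_2013

print(f"reported: {reported_average:.2f} citations per article")
print(f"adjusted: {adjusted_average:.2f} citations per article")
```

A single article moving the yearly average by more than three quarters of a citation is the small-sample problem in miniature.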

Second, when does the CTL abate? Obviously the citations per article aren’t likely to increase at this rate forever. A random issue from 50 years ago may well contain no articles that are still cited. The superstar effect would be strong in older issues, too—the one article in a year so prominent that it has stood the test of time would skew things for its issue compared to the others. Of course I can’t answer this question based on this data; that’s another good reason to dig deeper.

Still, we have enough here to offer a provisional answer to Question 1. Fortunately for us humanists, the answer is, It depends. If your article is one year old and has no citations, you’re not a failure; you’re everyone. Articles that are two years old top out at 2-3 citations. After that, the sky’s the limit; Traub’s article is just three years old (though of course, the average across articles continues to rise with time). For articles published in the last five years, 40 citations is about as successful as you can be. Two other articles break 20 citations; the top 5% have at least 15 citations; the top 10% have at least 11.

This seems to be the order of magnitude for success within this time frame: an article with ten or more citations. The community of scholars in my field appears, at least in print, to evolve very slowly and to form relatively few connections. It’s a bittersweet pill for young scholars; on one hand, your ideas won’t be in this particular kind of circulation anytime soon. On the other hand, if you’re worried about lack of interest in something you’ve published, well, just check back in a few years—the peak of popularity is just a few like minds away.

 


Notes:

* I did look at “Theories and Methodologies” (TM) articles for 2013. They averaged slightly fewer citations than articles I categorized as “main” (i.e., not in a subsection), although the main-article average was bolstered by the 39 citations for Valerie Traub’s article; aside from that, the citation numbers were similar. Based on this limited sample, TM articles appear to be shorter and to cite fewer things themselves (i.e., their own bibliographies are shorter), but they also might be written by more prominent scholars. At least, I felt that I recognized a higher percentage of them off the bat. In theory this could give them a leg up in generating citations more quickly; that could be interesting to test further.

** I was not. Sources used during the collection process include Harvey, P.J., Let England Shake; and Simpson, Sturgill, A Sailor’s Guide to Earth.