Todd W. Schneider

Real-time NBA Championship Odds Before and After LeBron James’s Announcement

LeBron James announced that he’s going back to Cleveland, and immediately the Cavaliers’ chances of winning the 2014/15 NBA championship increased from 10% to around 18%.

Of course, that 10% already priced in some likelihood that James would choose to play for the Cavaliers next season. Before Cleveland was considered a threat to land LeBron, its championship odds were around 2%, so the 10% odds immediately before LeBron’s decision perhaps reflected a market expectation that LeBron had a 50% chance of choosing Cleveland: 0.5 * 0.18 + 0.5 * 0.02 = 0.10.
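The arithmetic above can also be run in reverse: given the blended odds and the odds under each scenario, solve for the probability the market assigned to LeBron choosing Cleveland. The sketch below assumes the simple two-outcome model from the paragraph above; the function and parameter names are made up for illustration.

```python
def implied_probability(blended, if_yes, if_no):
    """Solve blended = p * if_yes + (1 - p) * if_no for p."""
    if if_yes == if_no:
        raise ValueError("the two scenarios must have different odds")
    return (blended - if_no) / (if_yes - if_no)


# Figures from the post: 10% before the announcement, 18% if LeBron
# picks Cleveland, 2% if he does not.
p = implied_probability(blended=0.10, if_yes=0.18, if_no=0.02)
print(round(p, 2))  # 0.5
```

This treats 2% as the Cavaliers’ odds in every scenario where LeBron goes elsewhere, which is a simplification.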

The Houston Rockets were initially the other big winners of The Decision Part II. Chris Bosh had been expected to join the Rockets if LeBron left Miami, and so the Rockets’ championship odds increased from 5% to around 15% immediately after LeBron’s announcement. Unfortunately for Houston, though, it later came out that Bosh was returning to the Miami Heat, and Houston’s championship odds subsequently declined back to around 6%.

UPDATED: some folks have asked how the Indiana Pacers’ championship odds changed in the wake of Paul George’s serious leg injury. The Pacers had been around 4.4% to win the championship before George got hurt, but their odds have since declined to 2.5%.

What is the Longest Disambiguation Page on Wikipedia?

Have you ever found yourself looking up John Smith on Wikipedia, only to discover that there are 205 different John Smiths with Wikipedia pages? It’s a testament to the breadth of knowledge on Wikipedia, but it can also be kind of annoying: what if you just want to know the real deal about the English explorer John Smith’s encounter with Pocahontas?

I found myself in this situation recently, and decided it’d be interesting to find out which disambiguation page is the longest on all of Wikipedia. John Smith has 205 entries, which seems like a lot, but maybe there are other generic terms that have even more Wikipedia entries?

Lots of John Smiths!

Luckily, Wikipedia provides an alphabetical list of all ~250,000 disambiguation pages. I modified the Rap Genius Trackback Scraper to iterate through every disambiguation page, count the list items in each page’s “may refer to” section, and store the results in a database.
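The counting step can be sketched roughly as follows. This is not the modified Rap Genius scraper, whose code is not shown here; it is a minimal Python illustration using only the standard library. The ListItemCounter class, the count_list_items function, and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class ListItemCounter(HTMLParser):
    """Counts every <li> start tag seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.count += 1


def count_list_items(html):
    parser = ListItemCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up stand-in for the body of a disambiguation page.
sample = """
<p><b>Example Name</b> may refer to:</p>
<ul>
  <li>Example Name (first entry)</li>
  <li>Example Name (second entry)</li>
  <li>Example Name (third entry)</li>
</ul>
"""
print(count_list_items(sample))  # 3
```

A real page also has list items in its navigation and footer, so an actual scraper would presumably need to restrict the count to the article’s content area before trusting the number.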

Without further ado, the top 10 longest Wikipedia disambiguation pages:

  1. St. Mary’s Church - 584 “may refer to” links
  2. Communist Party - 569
  3. Aliabad - 520
  4. Hoseynabad - 501
  5. List of greatest hits albums - 415
  6. Hasanabad - 308
  7. Mohammadabad, Iran - 299
  8. First Lutheran Church - 279
  9. Socialist Party - 260
  10. Dehnow - 241

St. Mary’s Church is the most ambiguous term on Wikipedia, followed by Communist Party and Aliabad, which is apparently a common Persian town name. Now if only we could get one of the many Communist Parties to hold a group meeting at a St. Mary’s Church in an Aliabad…

Other tidbits:

  • It’s a bit surprising to see so many Persian town names at the top of the list. Closer investigation reveals that a single Wikipedia user, Carlossuarez46, seems to have contributed most of the edits to those pages.
  • William Smith just beats out John Smith as the most ambiguous person, by a score of 211 to 205
  • The top scientific term is the species abbreviation C. elegans, with 223 “may refer to” links
  • Church names are heavily represented. The longest St. [name] Church formulations are:
    1. Mary - 584
    2. John - 211
    3. Peter - 197
    4. George - 164
    5. Michael - 159
  • And the longest First [branch] Church formulations:
    1. Lutheran - 279
    2. Presbyterian - 230
    3. Baptist - 218
    4. Congregational - 94
    5. Church of Christ, Scientist - 70
  • The distribution of disambiguation page lengths shows a heavy right skew:
    • Median length is 4 “may refer to” links
    • Mean length is 7.1
    • Most common length is 2
    • 25% of all disambiguation pages have length 2
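For anyone unfamiliar with why a mean above the median suggests right skew, here is a small sketch using Python’s statistics module. The lengths list is made-up toy data chosen to mimic the shape described above, not the real dataset.

```python
from statistics import mean, median, mode

# Toy page lengths (hypothetical): mostly short pages plus one very long one.
lengths = [2, 2, 2, 3, 4, 4, 5, 6, 9, 34]

print(median(lengths))  # 4.0
print(mean(lengths))    # 7.1
print(mode(lengths))    # 2

# A handful of very long pages pulls the mean well above the median,
# which is the signature of a right-skewed distribution.
right_skewed = mean(lengths) > median(lengths) > mode(lengths)
print(right_skewed)     # True
```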

[Chart: distribution of disambiguation page lengths]

Here’s a Google Spreadsheet with the top 1,000 longest pages, and you can download the full dataset as a .csv from GitHub.

RailsConf: In Rodd We Trust

I gave a “Live Coding” talk at RailsConf; the video is now available in stores and on-demand. The talk was largely motivated by the work I did on Wedding Crunchers. I had to pull a few strings, but they finally spelled my name correctly.


Most of the talk focused on writing some code to do n-gram analysis (here’s the GitHub repo), but there were also a few fun new graphs that show the rise of tech and programming in New York Times wedding announcements.
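For a rough idea of what n-gram analysis involves, here is a minimal sketch. It is written in Python rather than taken from the talk’s repo, and the ngrams and ngram_counts functions, the tokenizing regex, and the two sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_counts(documents, n):
    """Count n-gram occurrences across a collection of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(doc, n))
    return counts


# Made-up sentences standing in for wedding announcements.
docs = [
    "The bride is a programmer at a startup.",
    "The groom is a banker; the bride is a programmer.",
]

unigrams = ngram_counts(docs, 1)
bigrams = ngram_counts(docs, 2)
print(unigrams["programmer"], unigrams["banker"])  # 2 1
print(bigrams["a programmer"])                     # 2
```

Comparing terms over time would additionally need each count divided by the number of announcements or words in that year, which this sketch leaves out.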

The word programmer now appears more frequently than the word banker in New York Times wedding announcements, though to be fair that’s more a function of banker being on the decline than of programmer being on the rise.

And Google has overtaken Goldman Sachs as the more commonly mentioned employer.

Remember you can do your own searches at WeddingCrunchers.com!