Have you ever found yourself looking up John Smith on Wikipedia, only to discover that there are 205 different John Smiths with Wikipedia pages? It’s a testament to the breadth of knowledge on Wikipedia, but it can also be kind of annoying: what if you just want to know the real deal about the English explorer John Smith’s encounter with Pocahontas?
I found myself in exactly this situation recently, and it made me curious: what is the longest disambiguation page on all of Wikipedia? John Smith has 205 entries, which seems like a lot, but maybe other generic terms have even more.
Lots of John Smiths!
Luckily, Wikipedia provides an alphabetical list of all ~250,000 disambiguation pages. I modified the Rap Genius Trackback Scraper to iterate through every disambiguation page, count the list items in each page’s “may refer to” section, and store the results in a database.
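For a rough idea of what the counting step looks like, here is a minimal Python sketch. It is not the scraper I actually used: the `ListItemCounter` class, the `count_list_items` function, and the sample HTML are all made up for illustration, and it simply counts every `<li>` tag in a page using the standard library. Real disambiguation pages also contain list items outside the “may refer to” section (navigation, “see also”), so a real run would need to filter those out.

```python
from html.parser import HTMLParser


class ListItemCounter(HTMLParser):
    """Counts every <li> start tag seen in an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "li":
            self.count += 1


def count_list_items(html):
    """Return the number of <li> elements in the given HTML string."""
    counter = ListItemCounter()
    counter.feed(html)
    counter.close()
    return counter.count


# Made-up sample standing in for a downloaded disambiguation page.
sample_page = """
<p><b>John Smith</b> may refer to:</p>
<ul>
  <li>John Smith (explorer)</li>
  <li>John Smith (actor)</li>
  <li>John Smith (footballer)</li>
</ul>
"""

print(count_list_items(sample_page))  # prints 3
```

Running something like this over every page in the alphabetical list, and saving the page title alongside its count, gives the kind of table the rankings below are built from.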
Without further ado, the top 10 longest Wikipedia disambiguation pages:
- St. Mary’s Church - 584 “may refer to” links
- Communist Party - 569
- Aliabad - 520
- Hoseynabad - 501
- List of greatest hits albums - 415
- Hasanabad - 308
- Mohammadabad, Iran - 299
- First Lutheran Church - 279
- Socialist Party - 260
- Dehnow - 241
St. Mary’s Church is the most ambiguous term on Wikipedia, followed by Communist Party and Aliabad, which is apparently a common Persian town name. Now if only we could get one of the many Communist Parties to hold a group meeting at a St. Mary’s Church in an Aliabad…
Other tidbits:
- It’s a bit surprising to see so many Persian town names at the top of the list. Closer investigation reveals that a single Wikipedia user, Carlossuarez46, seems to have contributed most of the edits to those pages
- William Smith just beats out John Smith as the most ambiguous person, by a score of 211 to 205
- The top scientific term is the species abbreviation C. elegans, with 223 “may refer to” links
- Church names are heavily represented. The longest St. [name] Church formulations are:
  - Mary - 584
  - John - 211
  - Peter - 197
  - George - 164
  - Michael - 159
- And the longest First [branch] Church formulations:
  - Lutheran - 279
  - Presbyterian - 230
  - Baptist - 218
  - Congregational - 94
  - Church of Christ, Scientist - 70
- The distribution of disambiguation pages shows a heavy right skew
  - Median length of 4 “may refer to” links
  - Mean length is 7.1
  - Most common length is 2
  - 25% of all disambiguation pages have length 2
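If you want to reproduce summary numbers like these from the dataset, something along these lines should work. This is only a sketch: the `summarize` function is a hypothetical helper, and `sample_lengths` is invented data, not the real page counts. A mean that sits well above the median is the signature of the right skew mentioned above.

```python
from collections import Counter
from statistics import mean, median


def summarize(lengths):
    """Return median, mean, most common value, and that value's share of all pages."""
    counts = Counter(lengths)
    mode_value, mode_count = counts.most_common(1)[0]
    return {
        "median": median(lengths),
        "mean": mean(lengths),
        "mode": mode_value,
        "mode_share": mode_count / len(lengths),
    }


# Invented example: mostly short pages plus one very long one.
sample_lengths = [2, 2, 2, 3, 4, 5, 6, 40]

summary = summarize(sample_lengths)
print(summary)
```

In this toy sample the single 40-link page pulls the mean up to 8 while the median stays at 3.5, the same pattern as a mean of 7.1 against a median of 4 in the real data.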
Here’s a Google Spreadsheet with the top 1,000 longest pages, and you can download the full dataset as a .csv from GitHub.