Search what’s been popular on the HN front page since 2007. Supports words, phrases, domains, and usernames. See below for more info.
Searches are performed against a database of Hacker News items dating back to October 2006. The dataset updates nightly with the latest front page items.
Code is available here on GitHub.
The intent is to search only items that appeared on the front page of HN, with the important caveat that HN only provides the exact list of front page items for dates since November 11, 2014, so anything before then is an estimate. For earlier dates, I used a heuristic of sorting by score and taking the top 115 items on weekdays, 80 on weekends, subject to a minimum of 3 points. This definitely isn’t perfect, for example:
But it should be a decent approximation, and the code could be modified to use other heuristics. It would probably be an improvement to fetch and include all job posts from before November 11, 2014 via the HN API.
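As a rough illustration of the pre-November 2014 heuristic described above, here is a minimal Python sketch. It is not the app's actual code: the function name, the dictionary shape of an item, and the reading of "minimum of 3 points" as score >= 3 are all assumptions made for the example.

```python
from datetime import date

# Limits taken from the description above; the names are hypothetical.
WEEKDAY_LIMIT = 115
WEEKEND_LIMIT = 80
MIN_SCORE = 3  # assumption: "minimum of 3 points" means score >= 3


def estimate_front_page(items, day):
    """Estimate which of one day's items reached the front page.

    `items` is a list of dicts with a "score" key, all submitted on `day`.
    """
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    limit = WEEKEND_LIMIT if day.weekday() >= 5 else WEEKDAY_LIMIT
    eligible = [item for item in items if item["score"] >= MIN_SCORE]
    eligible.sort(key=lambda item: item["score"], reverse=True)
    return eligible[:limit]
```

For example, `estimate_front_page(items, date(2013, 6, 1))` would keep at most 80 items, because June 1, 2013 was a Saturday.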
The app allows searching by title, domain (with or without subdomain), and username. For a given search, the y-axis can display the percentage or number of all front page items that match the search term, the cumulative score of matching front page items, or the percentage of total front page score that the matching items represent.
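The y-axis options can be sketched for a single time bucket as follows. This is only an illustration under assumptions: the function name, the `is_match` callback, and the dictionary keys are hypothetical, and how the app actually groups items into time periods is not shown here.

```python
def summarize_bucket(front_page_items, is_match):
    """Compute the y-axis metrics for one bucket of front page items.

    `front_page_items` is a list of dicts with a "score" key;
    `is_match` is any predicate that decides whether an item matches.
    """
    matching = [item for item in front_page_items if is_match(item)]
    total_items = len(front_page_items)
    total_score = sum(item["score"] for item in front_page_items)
    matching_score = sum(item["score"] for item in matching)
    return {
        # number of front page items that match the search
        "match_count": len(matching),
        # percentage of all front page items that match
        "match_pct": 100 * len(matching) / total_items if total_items else 0.0,
        # cumulative score of matching items
        "match_score": matching_score,
        # share of total front page score held by matching items
        "score_pct": 100 * matching_score / total_score if total_score else 0.0,
    }
```

The guards against empty buckets are a design choice for the sketch, so that a period with no front page items reports zero rather than raising a division error.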
When searching by title, there are 3 search styles available:
Web search uses PostgreSQL full text search, specifically the websearch_to_tsquery() function. It supports a few operators: "quoted phrases", OR, and -. The easiest way to explain them is with examples:
machine learning: titles that include the word “machine” and the word “learning”
"machine learning": titles that include the phrase “machine learning”
machine -learning: titles that include the word “machine” but not the word “learning”
"machine learning" or ML: titles that include the phrase “machine learning” or the word “ML”
Titles are converted to tsvector using PostgreSQL’s built-in simple text search configuration. I experimented with the english text search configuration, but found that the stemming and stopwords sometimes interfered with proper nouns that appear in HN titles. The simple configuration does something closer to an exact text match, so remember to use the OR operator to search the singular and plural of a word, e.g. neural network or neural networks.
Web search is always case insensitive.
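To make the web search semantics concrete, here is a toy Python emulation of the operators described above. It is not PostgreSQL and not the app's code: the tokenizer is a simplified stand-in for the simple configuration (lowercase, split on non-alphanumerics), all names are hypothetical, and it assumes OR binds more loosely than the implicit AND between terms, as in a tsquery.

```python
import re

WORD_RE = re.compile(r"[a-z0-9]+")
# A query part is an optionally negated quoted phrase, or a bare word.
QUERY_PART_RE = re.compile(r'-?"[^"]*"|\S+')


def tokenize(text):
    # Rough approximation of the simple configuration: no stemming,
    # no stopword removal, just lowercased alphanumeric runs.
    return WORD_RE.findall(text.lower())


def contains_phrase(tokens, phrase):
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def web_search_matches(title, query):
    tokens = tokenize(title)
    # Split the query into OR-separated groups of ANDed terms.
    groups = [[]]
    for part in QUERY_PART_RE.findall(query):
        if part.lower() == "or":
            groups.append([])
        else:
            groups[-1].append(part)

    def term_matches(part):
        words = tokenize(part)
        if not words:
            return True  # nothing searchable in this part
        found = contains_phrase(tokens, words)
        return not found if part.startswith("-") else found

    return any(
        group and all(term_matches(part) for part in group)
        for group in groups
    )
```

With this sketch, `web_search_matches("Neural networks explained", "neural network")` is false because "network" and "networks" are different tokens, while the query `neural network or neural networks` matches, which is the singular/plural caveat noted above.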
Exact match, case insensitive uses a PostgreSQL regular expression to match the contents of each search term within word boundaries:
title ~* ('\y' || search_term || '\y')
Note that this makes use of a trigram index, as opposed to full text search, which uses a full text GIN index.
Exact match, case sensitive is the same as exact match, case insensitive, but uses the ~ operator instead of the ~* operator.
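The two exact match styles can be approximated in Python, where \b plays roughly the role that \y plays in PostgreSQL regular expressions. This is a sketch, not the app's query: the function name is hypothetical, and unlike the SQL expression shown above it escapes the search term, so regex metacharacters in the term are treated literally.

```python
import re


def exact_match(title, search_term, case_sensitive=False):
    """Return True if search_term appears in title within word boundaries."""
    # Assumption: escape the term so it is matched as literal text.
    pattern = r"\b" + re.escape(search_term) + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, title, flags) is not None
```

For instance, `exact_match("Show HN: Go generics", "go")` matches, `exact_match("Google announces", "go")` does not because "go" is not followed by a word boundary, and `exact_match("Show HN: Go generics", "go", case_sensitive=True)` does not because the case differs.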