Todd W. Schneider

The TechCrunch Bubble Index: Parsing Headlines to Quantify Startup Hype


TechCrunch has established itself as a leading resource for startup-related news, so I thought it would be fun to analyze every TechCrunch headline to see what we might learn about the startup funding environment over the past few years. Without further ado, I present the TechCrunch Bubble Index, or as I like to call it, the TCBI:

What is the TCBI?

The TCBI measures the number of headlines on TechCrunch over the past 90 days that specifically relate to startups raising money. I defined a “startup fundraise” as one where the amount raised was at least $100,000 and less than $150 million. A higher TCBI means more TechCrunch stories about startups raising money, which might broadly indicate a vibrant fundraising environment. For example, a TCBI of 209 on November 16, 2014, means that there were 209 TechCrunch headlines about startup fundraises between August 19 and November 16, or 2.3 per day.
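To make the definition concrete, here is a minimal Python sketch of the calculation. It assumes each fundraise headline has already been reduced to a (publication date, dollar amount) pair; the function name, the constants, and the sample data are all made up for illustration, and the code on GitHub may well do this differently.

```python
from datetime import date, timedelta

MIN_AMOUNT = 100_000        # inclusive floor from the definition above
MAX_AMOUNT = 150_000_000    # exclusive cap from the definition above
WINDOW_DAYS = 90


def tcbi(fundraises, as_of):
    """Count qualifying fundraise headlines in the 90 days ending on as_of.

    fundraises: iterable of (published_date, amount_in_dollars) pairs.
    """
    window_start = as_of - timedelta(days=WINDOW_DAYS - 1)
    return sum(
        1
        for published, amount in fundraises
        if window_start <= published <= as_of
        and MIN_AMOUNT <= amount < MAX_AMOUNT
    )


# Made-up sample data, purely for illustration
sample = [
    (date(2014, 8, 18), 5_000_000),    # one day before the window opens
    (date(2014, 8, 19), 2_000_000),    # first day of the window
    (date(2014, 10, 1), 50_000),       # below the $100,000 floor
    (date(2014, 10, 2), 150_000_000),  # at the $150 million cap, so excluded
    (date(2014, 11, 16), 12_000_000),  # last day of the window
]

print(tcbi(sample, date(2014, 11, 16)))  # 2
```

The window is inclusive on both ends, which is what makes August 19 through November 16 come out to exactly 90 days.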

The data

I wrote a basic scraper to grab every TechCrunch headline dating back to mid-2005, then wrote a series of somewhat convoluted regular expressions to extract relevant information from each headline: was the story about a fundraise? If so, how much was raised? Is the company filing for an IPO, acquiring another company, or maybe shutting down entirely? The scraper parses TechCrunch’s RSS feed every hour, so the above graph should continue to update even after I’ve published this post. As of November 2014, there were about 135,000 articles total, just over 5,000 of which were about startup fundraises. The code is available on GitHub.
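As a rough idea of what this kind of headline parsing looks like, here is a simplified Python sketch. It is not the set of regular expressions from the actual project: the pattern, the list of fundraising verbs, the function name, and the sample headlines are all assumptions made for illustration, and real headlines are messier than this handles.

```python
import re

# Hypothetical pattern: a fundraising verb, followed later in the headline
# by a dollar amount such as "$5M", "$1.2 Million", or "$750K".
FUNDRAISE_RE = re.compile(
    r"\b(?:raises?|raised|lands?|secures?|closes?)\b.*?"
    r"\$(?P<number>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>k|m|b|thousand|million|billion)\b",
    re.IGNORECASE,
)

MULTIPLIERS = {
    "k": 1_000, "thousand": 1_000,
    "m": 1_000_000, "million": 1_000_000,
    "b": 1_000_000_000, "billion": 1_000_000_000,
}


def parse_fundraise_amount(headline):
    """Return the amount raised in whole dollars, or None if the headline
    does not look like a fundraise."""
    match = FUNDRAISE_RE.search(headline)
    if match is None:
        return None
    multiplier = MULTIPLIERS[match.group("unit").lower()]
    return round(float(match.group("number")) * multiplier)


# Made-up headlines, purely for illustration
print(parse_fundraise_amount("Acme Raises $5M Series A To Build Widgets"))    # 5000000
print(parse_fundraise_amount("Widgetly Lands $1.2 Million In Seed Funding"))  # 1200000
print(parse_fundraise_amount("BigCo Acquires Widgetly For $19 Billion"))      # None
```

Requiring a fundraising verb before the dollar amount is what keeps acquisition headlines like the last one from being counted as fundraises; IPO filings, acquisitions, and shutdowns would each need their own patterns.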


The TCBI’s list of caveats is longer than the list of Ashton Kutcher’s seed investments (42 TechCrunch headlines mention him by name), but nevertheless it’s still interesting to look at some trends. Nobody will be surprised to learn that the number of TechCrunch headlines about startups raising money has broadly increased since 2006:


There’s at least one fairly obvious follow-up question, though: how has the total number of TechCrunch articles changed over that time period? It turns out that the total number of TechCrunch stories published per 90-day window has actually declined since 2011:


TechCrunch posts about more than just fundraises, but we can use these two graphs together to calculate the percentage of all TechCrunch stories that relate to startups raising money. That percentage was as low as 1% in 2009, but increased to as high as 9% before settling down to around 7% today:


Just because TechCrunch is posting more stories about fundraises, both in total and as a percentage, doesn’t mean that the startup funding environment is necessarily more favorable. It might well be that TechCrunch’s editorial staff has determined that fundraising stories generate the most traffic, and so over time they’ve started covering a larger swath of the fundraising landscape.

I don’t know anything about TechCrunch’s traffic data, but dollars to donuts I’d bet that fundraising stories get good traffic numbers, and the larger the amount raised, the more pageviews. I think back to Martin Scorsese’s character from Quiz Show when he explains the popularity of rigged game shows:

See, the audience didn’t tune in to watch some amazing display of intellectual ability. They just wanted to watch the money.

Speaking of money, although the TCBI is based on the number of fundraises, we can also look at the total amount raised:

[Graph: dollars raised]

So wait, did the funding bubble burst in the spring of 2014?

In the spring of 2014, investors pumped more than $5 billion into startups (as reported on TechCrunch) over a 90-day period. More recently, in the fall of 2014, that number has declined by almost 40%, to just over $3 billion. The earlier TCBI graph showed a similar decline, from a high of 346 in April 2014 to a value of 209 as I write this. In fact, the TCBI is now at its lowest value since June 2012, and the percentage of all TechCrunch articles that are about startups raising money has declined from 9% to 7% in 2014 alone.

That doesn’t necessarily mean that it’s harder for startups to raise money today than it was six months ago. It could be that TechCrunch has consciously decided to report on fewer fundraises, though my uninformed guess is that’s not true. It could be that more startups raise in “stealth mode” without announcing to the press, which would cause the TCBI to decline. It’s also possible that it is simply getting harder to raise money!

I bucketed each fundraise article based on the amount raised to see if there are any trends within investment rounds (seed, series A, etc.):


All of the buckets are down from their peaks, but the bucket between $2 million and $10 million, which roughly corresponds to series A rounds, has shown the smallest decline relative to the other buckets.
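For reference, bucketing by amount raised can be sketched in a few lines of Python. Only the $2 million to $10 million bucket is taken from the text above; the other boundaries, the labels, and the sample amounts are assumptions chosen for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Only the $2M-$10M bucket comes from the post; the $50M boundary and all
# of the labels are hypothetical.
BOUNDARIES = [2_000_000, 10_000_000, 50_000_000]
LABELS = ["under $2M", "$2M to $10M", "$10M to $50M", "$50M and up"]


def bucket_for(amount):
    """Return the label of the bucket containing amount.

    Each bucket includes its lower bound and excludes its upper bound.
    """
    return LABELS[bisect_right(BOUNDARIES, amount)]


# Made-up amounts, purely for illustration
amounts = [500_000, 2_000_000, 7_500_000, 10_000_000, 80_000_000]
counts = Counter(bucket_for(amount) for amount in amounts)

print(counts["$2M to $10M"])  # 2
```

Running the same count over each 90-day window would give one time series per bucket, which is the shape of data a chart like the one described here needs.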

Of course, raising money isn’t the only thing that matters to startups, even in the salacious world of the tech media. We can take a look at the number of TechCrunch stories about acquisitions, which shows a fairly similar pattern to the TCBI, peaking in early 2014 and declining a bit since then:


And on a more somber note, TechCrunch posts the occasional story about a company shutting down, though there are far fewer of those, at least for now:


“It’s like TechCrunch for Chimpanzees”

You know you’ve made it in the tech world when people start calling other startups “the [your startup] for [plural noun]”. TechCrunch certainly contributes to this trend, and I couldn’t resist parsing out some X for Y formulations to find common values of X and Y. The most common pairing was “Instagram for Video”, with a total of eight headlines, followed by “Netflix for Books” and “Pinterest for Men”, with three apiece. Here are some other good ones:

Airbnb for 27 things

Airbnb for Dogs, Airbnb for Creative Work and Meeting Spaces, Airbnb for Storage, Airbnb for Women’s Closets, Airbnb for Elite Universities, Airbnb for Storage, Airbnb for Pets, Airbnb for Home-Cooked Meals, Airbnb for The 1%, Airbnb for Private Jets, Airbnb for Boats, Airbnb for Pets, Airbnb for University Students, Airbnb for Boats, Airbnb for Takeout, Airbnb for Shared Office Space, Airbnb for Hostel Hoppers, Airbnb for Event Spaces, Airbnb for Travel Experiences, Airbnb for Car Ride-Sharing, Airbnb for Planes, Trains, and Automobiles, Airbnb for Workspace, Airbnb for Office Space, Airbnb for Cars, Airbnb for Tutoring, AirBnB for Experiences, AirBnB for Car Rentals

Uber for 17 things

Uber for House Painting, Uber for Weed, Uber for Flowers, Uber for Beauty, Uber for Anything, Uber for Bike Repair, Uber for Laundry, Uber for Gift Giving, Uber for Flowers, Uber for Medical Transport, Uber for Dog Walking, Uber for Massage, Uber for Private Jet Travel, Uber for Car Test Drives, Uber for Maids, Uber for Carwashes, Uber for The Courier Industry

LinkedIn for 14 things

LinkedIn for Medical Professionals, LinkedIn for Musicians, LinkedIn for Creatives, LinkedIn for The Military, LinkedIn for Creative Professionals, LinkedIn for Gamers, LinkedIn for MDs, LinkedIn for College Students, LinkedIn for The Gay Community, LinkedIn for Athletes, LinkedIn for Physicians, LinkedIn for Actors, Musicians, and Models, LinkedIn for Scientists, LinkedIn for Blue-Collar Workers
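Here is a rough Python sketch of how X for Y formulations like these could be pulled out of title-case headlines. The pattern, the stop list of verbs, and the sample headlines (with fictional startup names) are all assumptions for illustration; it is not the parsing code from the project, and it will miss plenty of real-world phrasings.

```python
import re
from collections import Counter

# Hypothetical stop list: title-case verbs that usually end the "Y" phrase.
STOP_WORDS = ("Raises", "Launches", "Lands", "Gets", "Wants", "Is")
STOP = "|".join(STOP_WORDS)

# "The X For Y": X is a single capitalized word, Y is a run of capitalized
# words that ends at punctuation, a lowercase word, or a stop word.
X_FOR_Y_RE = re.compile(
    r"\b[Tt]he\s+(?P<x>[A-Z][A-Za-z]+)\s+[Ff]or\s+"
    rf"(?P<y>(?!(?:{STOP})\b)[A-Z][\w%]*"
    rf"(?:\s+(?!(?:{STOP})\b)[A-Z][\w%]*)*)"
)


def extract_x_for_y(headline):
    """Return an (x, y) tuple, or None if no "the X for Y" phrase is found."""
    match = X_FOR_Y_RE.search(headline)
    if match is None:
        return None
    return match.group("x"), match.group("y")


# Made-up headlines with fictional startups, purely for illustration
headlines = [
    "Clipster Wants To Be The Instagram For Video",
    "Meet Barkstay, The Airbnb For Dogs",
    "Petalgo, The Uber For Flowers Raises $2M",
    "Skyhop Is The Uber For Private Jet Travel",
    "Gadget Maker Announces New Phone",
]

pairs = [pair for pair in map(extract_x_for_y, headlines) if pair is not None]
x_counts = Counter(x for x, _ in pairs)

print(pairs[3])                 # ('Uber', 'Private Jet Travel')
print(x_counts.most_common(1))  # [('Uber', 2)]
```

Counting the pairs themselves, rather than just the X values, is what would surface the most common pairings. Note that a Y like “The 1%” would be cut short by this pattern, since it only follows capitalized words.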

Code on GitHub, API endpoint

Again, the code to scrape TechCrunch’s historical headlines, parse the RSS feed for new stories, and extract data via regular expressions is available on GitHub. You can also fetch the time series of TCBI values by making a GET request to