Data updates from the taxi and ridehailing trip records on the City of Chicago’s open data portal. The code to calculate monthly aggregates is on GitHub
There are notes throughout the page with relevant definitions and caveats, plus more info at the bottom of the page
Data available for taxis and ridehailing apps
Average time, distance, and fare
Both taxi and ridehailing datasets contain some records with either missing or obviously incorrect data, e.g. a $1,000 fare for a 1-mile trip. I’ve attempted to remove these bad records, and the graphs in this section are based only on trips with “clean” time, distance, and fare data. The specific logic used to filter trips is available here on GitHub
Total farebox calculations assume that the bad records have the same average fare as the clean records
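As a rough illustration of that assumption, the estimate amounts to multiplying the average clean fare by the total number of trips, clean and bad combined. The function name and the numbers below are hypothetical, not taken from the actual code on GitHub:

```python
def estimate_total_farebox(clean_fare_sum: float, clean_trips: int, total_trips: int) -> float:
    """Scale the average clean fare up to all trips (clean + bad).

    Assumes bad records have the same average fare as clean records.
    """
    if clean_trips <= 0:
        raise ValueError("need at least one clean trip")
    if total_trips < clean_trips:
        raise ValueError("total_trips must include the clean trips")
    average_clean_fare = clean_fare_sum / clean_trips
    return average_clean_fare * total_trips


# Hypothetical month: 1,000 trips, of which 950 are clean and sum to $9,500
estimate = estimate_total_farebox(clean_fare_sum=9_500.0, clean_trips=950, total_trips=1_000)
print(estimate)  # 10000.0
```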
Each individual ridehailing fare is rounded to the nearest multiple of $2.50 in the raw data, so averages and aggregates should be considered estimates
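To see why rounding makes the averages estimates, here is a minimal sketch that rounds a few made-up fares to the nearest $2.50 and compares the averages. Fares are in integer cents to avoid floating-point noise. The tie-breaking rule (half rounds up) is an assumption, since the raw data does not say how exact ties are handled:

```python
INCREMENT_CENTS = 250  # $2.50


def round_fare_cents(fare_cents: int) -> int:
    """Round a fare in cents to the nearest multiple of $2.50 (ties round up, by assumption)."""
    return ((fare_cents + INCREMENT_CENTS // 2) // INCREMENT_CENTS) * INCREMENT_CENTS


# Hypothetical true fares: $8.30, $11.90, $14.60
true_fares = [830, 1190, 1460]
rounded_fares = [round_fare_cents(f) for f in true_fares]  # [750, 1250, 1500]

true_average = sum(true_fares) / len(true_fares)
rounded_average = sum(rounded_fares) / len(rounded_fares)

# Each fare moves by at most $1.25, so the average is off by at most $1.25 too
assert abs(rounded_average - true_average) <= INCREMENT_CENTS / 2
```

In practice the individual rounding errors tend to partly cancel out across many trips, but the true average cannot be recovered exactly from the rounded data.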
Tipping behavior
Tip data is not available for taxi fares paid with cash, so taxi averages are based on credit card fares only
Ridehailing by shared status
Solo trips are when the rider did not authorize a shared trip. Shared trips are when the rider authorized a shared trip, and at some point during the trip they shared the car with another customer. Unmatched share requests are when the rider authorized a shared trip, but from the time they got into the car to when they got out, they never shared the car with another customer
As an example, if Alice, Bob, and Charlie each requested a shared ride, and a single driver serviced their requests in this order:
picked up A
picked up B
dropped off A
dropped off B
picked up C
dropped off C
that would count as 3 trips: 2 shared and 1 unmatched share request
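The same example can be worked through in code. This is only a sketch of the definition above, not how the city or the ridehailing companies actually label trips. The event format and function name are made up for illustration, and every rider is assumed to have authorized sharing:

```python
def classify_shared_requests(events):
    """Label each rider's trip as 'shared' or 'unmatched share request'.

    events: ordered list of (action, rider) tuples, where action is
    'pickup' or 'dropoff'. All riders are assumed to have authorized sharing.
    """
    in_car = set()
    rode_with_someone = {}
    for action, rider in events:
        if action == "pickup":
            # The new rider shares if anyone is already in the car,
            # and everyone already in the car now shares too
            rode_with_someone[rider] = bool(in_car)
            for other in in_car:
                rode_with_someone[other] = True
            in_car.add(rider)
        elif action == "dropoff":
            in_car.remove(rider)
        else:
            raise ValueError(f"unknown action: {action}")
    return {
        rider: "shared" if shared else "unmatched share request"
        for rider, shared in rode_with_someone.items()
    }


events = [
    ("pickup", "A"), ("pickup", "B"), ("dropoff", "A"),
    ("dropoff", "B"), ("pickup", "C"), ("dropoff", "C"),
]
print(classify_shared_requests(events))
# {'A': 'shared', 'B': 'shared', 'C': 'unmatched share request'}
```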
Pickups by geography
Each geography bucket is a group of community areas. Distances are measured from the center of each area to the center of the Loop. For example, the “within 2 miles of the Loop” bucket includes the Loop, Near North Side, Near South Side, and Near West Side. See here for a map of all of the community areas and bucket definitions
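A minimal sketch of the center-to-center distance check, assuming great-circle (haversine) distance. The page does not say which distance formula is used, and the coordinates below are rough illustrative values, not the official community area centers:

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_within_two_miles(area_center, loop_center):
    """True if an area's center is within 2 miles of the Loop's center."""
    return haversine_miles(*area_center, *loop_center) <= 2.0


# Rough, illustrative coordinates only (assumptions, not official centroids)
LOOP_CENTER = (41.88, -87.63)
NEAR_NORTH_SIDE_CENTER = (41.90, -87.63)

print(is_within_two_miles(NEAR_NORTH_SIDE_CENTER, LOOP_CENTER))  # True
```

The other buckets would follow the same pattern with different distance cutoffs, which are shown on the linked map rather than listed here.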
Near North Side to Lake View, weekdays 4:00 PM–8:00 PM
The City of Chicago originally published the taxi dataset in 2016, paused it in 2017 due to data consistency issues, and resumed publication in 2019. Even after the fix, taxi trips are likely undercounted between November 2014 and December 2015. See these posts for more info:
2016 release | 2019 update
The code to collect and process the raw data from the city’s website is available on GitHub
As of 2019, there are three licensed ridehailing apps in Chicago: Uber, Lyft, and Via. Note that the city refers to them collectively as “Transportation Network Providers”. The Chicago ridehailing dataset does not identify which company provided each trip
Taxi data updates monthly; ridehailing data updates quarterly
The city published an overview of some privacy-oriented measures it took when publishing both datasets
The graphs are built with Highcharts, and the underlying JSON API response is available here
Questions or issues: todd@toddwschneider.com
Some differences between the Chicago and New York datasets
Chicago provides a vehicle identifier for each taxi trip; New York does not
New York identifies which individual ridehailing company provided each trip; Chicago does not