Public dataset for multi-modal mobility

👋 Read the quickstart first. It has examples that can give you a quick overview.

Background information

  1. Evaluation procedure
  2. Evaluation metrics

Data characteristics

This dataset contains data from 3 artificial timelines. The timelines cover 15 separate modes, including recently popular modes such as e-scooter and e-bike. We had three main goals for the data collection.

Dwell time
Instead of focusing only on trips, we wanted to evaluate a timeline that included significant dwell time. Our calibration runs showed that Android appears to have built-in duty cycling, and including significant dwell time lets us capture the impact of this context-sensitive behavior. We therefore structured the timeline trips as round trips to libraries, with an intermediate dwell time of roughly 3x the mean travel time to the location.
Broad range of modes
Since we are creating artificial trips, we can structure them to maximize mode variety. To cover this space efficiently, we tried to ensure that no mode was repeated. The only exception is commuter rail, which we had to include twice because there were few other transit options to reach the chosen starting point.
Multi-modal transfers
Detecting multi-modal transfers is tricky because there isn't a clear signal similar to a trip end. We ensure that there are many transition examples by emphasizing multi-modal transfers while constructing our artificial trips.

The timelines are summarized below. The details are in the dataset (summary and filled).

| id | Description | Outgoing trip modes | Incoming trip modes | Travel time | Dwell time | Overall time |
|---|---|---|---|---|---|---|
| unimodal_trip_car_bike_mtv_la | Suburban round trip | car (suburban street) | bicycle | 40 mins | 1.5 hrs | 2 hrs |
| car_scooter_brex_san_jose | Downtown library | car (freeway) | escooter, bus rapid transit | 3 hrs | 3 hrs | 6 hrs |
| train_bus_ebike_mtv_ucb | Multi-modal trip across the bay | suburb walk, commuter train, subway, city bus, university walk | ebike, express bus, downtown walk, light rail, commuter train with tunnels, suburb walk | 6 hrs | 6 hrs | 12 hrs |

We currently have the following data from these timelines.

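| id | accuracy control | HAHFDC | HAMFDC | MAHFDC | MAMFDC | total runs | travel hrs | travel hrs for android + iOS | total hrs | total hrs for android + iOS |
|---|---|---|---|---|---|---|---|---|---|---|
| unimodal_trip_car_bike_mtv_la | 6 | 6 | 3 | 3 | 0 | 18 | ~12 | ~24 | 36 | 72 |
| car_scooter_brex_san_jose | 7 | 6 | 3 | 4 | 1 | 21 | 63 | 126 | 126 | 252 |
| train_bus_ebike_mtv_ucb | 12 | 6 | 6 | 6 | 6 | 36 | 216 | 432 | 432 | 864 |
| Total | | | | | | | | 582 | | 1188 |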
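
The hour columns above appear to follow from the run counts and the per-run durations in the timeline summary: travel hrs = total runs x travel time, total hrs = total runs x overall time, doubled for the two platforms. That relationship is inferred from the numbers rather than stated anywhere in this README, so treat the following as a minimal sanity-check sketch. The names `TIMELINES`, `N_PLATFORMS`, and `summarize` are made up for this illustration and are not part of the dataset or its tooling.

```python
# Run counts and per-run durations copied from the two tables above.
# Run order: accuracy control, HAHFDC, HAMFDC, MAHFDC, MAMFDC.
# Durations are in minutes so that the arithmetic stays exact.
TIMELINES = {
    "unimodal_trip_car_bike_mtv_la": {
        "runs": [6, 6, 3, 3, 0], "travel_min": 40, "overall_min": 120},
    "car_scooter_brex_san_jose": {
        "runs": [7, 6, 3, 4, 1], "travel_min": 180, "overall_min": 360},
    "train_bus_ebike_mtv_ucb": {
        "runs": [12, 6, 6, 6, 6], "travel_min": 360, "overall_min": 720},
}
N_PLATFORMS = 2  # android + iOS


def summarize(timelines, n_platforms=N_PLATFORMS):
    """Return per-timeline hour totals and the grand totals across timelines."""
    rows = {}
    for timeline_id, t in timelines.items():
        total_runs = sum(t["runs"])
        # Assumption: every run covers the full timeline once.
        travel_hrs = total_runs * t["travel_min"] / 60
        total_hrs = total_runs * t["overall_min"] / 60
        rows[timeline_id] = {
            "total_runs": total_runs,
            "travel_hrs": travel_hrs,
            "travel_hrs_both": travel_hrs * n_platforms,
            "total_hrs": total_hrs,
            "total_hrs_both": total_hrs * n_platforms,
        }
    grand = {
        "travel_hrs_both": sum(r["travel_hrs_both"] for r in rows.values()),
        "total_hrs_both": sum(r["total_hrs_both"] for r in rows.values()),
    }
    return rows, grand


rows, grand = summarize(TIMELINES)
for timeline_id, row in rows.items():
    print(timeline_id, row)
print("Total", grand)
```

Under these assumptions the sketch reproduces the per-timeline values in the table and the two totals, 582 travel hours and 1188 total hours across android + iOS.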


Since the data was collected over multiple months, there were small reroutes to some of the trajectories. Notably:

This will be automatically handled in the spec.


I would like to thank Harrison Liew and gennui raffill for their assistance in salvaging this data collection effort after the Lyft e-bike debacle. The third timeline includes a one-way ride from the UC Berkeley campus to the Transbay bus stop on an e-bike. I used Lyft bikeshare e-bikes for the first round of data collection, but then a couple of e-bikes caught on fire, and Lyft pulled all e-bikes from service. I was now stuck since using a regular bike would preclude comparisons with the first round.

Fortunately, Harrison and gennui stepped in and helped out by lending an e-bike and riding the bike back to campus to complete the round trip. Neither rain nor heat nor lost phones nor unexpected band practice stayed us from the slow collection of all four quadrants of the frequency/accuracy combinations. Lyft had still not put e-bikes back into service at this time, so this dataset would not have been possible without their help.

Please cite the following work when using this dataset: