Automated Synchronization of Driving Data: Video, Audio, Telemetry, and Accelerometer

This blog post presents a dataset and source code for the paper titled “Automated Synchronization of Driving Data Using Vibration and Steering Events” (get pdf here). If you find this post helpful in your work, please cite:

First, here’s a video visualizing the data we’re looking to synchronize (best viewed in 1080p HD):


The dataset used for this paper is collected and made available by MIT AgeLab. It is a small sample of the 1,000+ hours of multi-sensor driving datasets collected at AgeLab.

The zip archive containing the 2GB dataset can be downloaded here. This archive contains the following files:

  • Individual Sensor Data: Files like audio_energy.csv, data_*.csv, and video_*.csv contain information from each individual sensor: video, audio, IMU, GPS, and steering wheel position.
  • Video Data: Files video_*.mkv contain the video for front, dash, and face videos. Files video_*.csv contain timestamps for each frame in the video.
  • Synchronized Data: Files synced_data_fps*.csv contain the individual sensor data fused and sampled at the frame rate (fps) specified in the filename. The frame rates included are 30, 90, and 100 fps.



Lex Fridman*, MIT
Daniel E. Brown, MIT
William Angell, MIT
Irman Abdic, MIT
Bryan Reimer, MIT
Hae Young Noh, CMU


Source Code

Most of the code used for this paper is tied up in a larger system for processing and analyzing driving data. There is a lot of system-specific data management code (especially related to the database backend) that is not useful to anyone looking to synchronize their driving data based on vibration and steering events. For this reason, I’m providing just the core snippets of code for (1) computing the dense optical flow and (2) efficiently computing the cross correlation of data streams. First, here are some simple OpenCV video helper functions used in the code below:

Next, we compute the dense optical flow of a video file between a starting and an ending frame, save the average horizontal and vertical flows to a CSV file, and save a visualization of the flow to a video file. The reason we provide from_frame and to_frame is so that the function below can be called in parallel. This is important because dense optical flow is a computationally intensive task that is easily parallelizable (in the time domain). So here's the function:

The optical flow gives the steering and vibration events in the videos. The other sensors are used in their raw form. So the second step is to compute the synchronizing time offset between two time series. For this we use an FFT-based cross correlation function (see this post for a simple test). It's important that the two time series are sampled at exactly the same interval. If your data is not sampled in this way, it must first be interpolated at a regular interval. We compute the cross correlation in the following way:

The compute_shift function takes two time series and produces the shift associated with the maximum cross correlation. Small adjustments have to be made to this shift based on which pair of sensors is being synchronized. See the paper for the table that shows these values.