Automated Synchronization of Driving Data: Video, Audio, Telemetry, and Accelerometer

This blog post presents a dataset and source code for the paper titled “Automated Synchronization of Driving Data Using Vibration and Steering Events” (get pdf here). First, here’s a video visualizing the data we’re looking to synchronize (best viewed in 1080p HD):

Citation

If you find this paper/page helpful in your work, please cite:

@article{fridman2016automated,
  title={Automated synchronization of driving data using vibration and steering events},
  author={Fridman, Lex and Brown, Daniel E and Angell, William and Abdi{\'c}, Irman and Reimer, Bryan and Noh, Hae Young},
  journal={Pattern Recognition Letters},
  volume={75},
  pages={9--15},
  year={2016},
  publisher={Elsevier}
}

PS: You can find this paper on Google Scholar.

Dataset

The dataset used for this paper was collected and is made available by MIT AgeLab. It is a small sample of the 1,000+ hours of multi-sensor driving data collected at AgeLab.

The zip archive containing the 2GB dataset can be downloaded here. This archive contains the following files:

  • Individual Sensor Data: Files like `audio_energy.csv`, `data_*.csv`, and `video_*.csv` contain information related to each individual sensor: video, audio, IMU, GPS and steering wheel position.
  • Video Data: Files `video_*.mkv` contain the video for front, dash, and face videos. Files `video_*.csv` contain timestamps for each frame in the video.
  • Synchronized Data: The files `synced_data_fps*.csv` contain the individual sensor data fused and sampled at the frame rate (fps) specified in the filename. The included frame rates are 30, 90, and 100 fps. A minimal loading sketch follows this list.
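
If you just want to explore the synchronized CSV files, here is that loading sketch using pandas (the exact column names are not listed here, so inspect them after loading; the filename follows the pattern above):

import pandas as pd

# load the 30 fps synchronized stream (filename assumed from the pattern above)
df = pd.read_csv('synced_data_fps30.csv')

print df.shape          # number of synchronized samples x number of columns
print list(df.columns)  # inspect which sensor columns are available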


Authors

Lex Fridman*, MIT
Daniel E. Brown, MIT
William Angell, MIT
Irman Abdic, MIT
Bryan Reimer, MIT
Hae Young Noh, CMU

*Contact:

Source Code

Most of the code used for this paper is tied up in a larger system for processing and analyzing driving data. There is a lot of system-specific data management code (especially related to the database backend) that is not useful to anyone looking to synchronize their driving data based on vibration and steering events. For this reason, I’m providing just the core snippets of code for (1) computing the dense optical flow and (2) efficiently computing the cross correlation of data streams. First, here are some simple OpenCV video helper functions used in the code below:

import os
import csv
from collections import OrderedDict

import cv2
import numpy as np

# small helper used by cv2_frame_count below: treat a string argument as a file path
def is_str(x):
    return isinstance(x, basestring)

# return the total number of frames in a video file or VideoCapture object
def cv2_frame_count(cap_or_path):
    had_to_create_video_capture = False
    cap = cap_or_path # cap_or_path is used as the argument to make clear what is accepted as input

    if is_str(cap):
        had_to_create_video_capture = True
        assert os.path.isfile(cap)
        cap = cv2.VideoCapture(cap)
        assert cap.isOpened()

    assert type(cap).__module__ == 'cv2'
    frame_count = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    assert frame_count.is_integer()

    if had_to_create_video_capture:
        cap.release()

    return int(frame_count)

# return the index of the current frame (first frame is index 0)
def cv2_current_frame(cap):
    x = cap.get(cv2.CAP_PROP_POS_FRAMES)
    assert x.is_integer()
    return int(x)

# jump to frame index 'frame_id'
def cv2_goto_frame(cap, frame_id):
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_id)
    assert cv2_current_frame(cap) == frame_id

# remove the extension from the path and return
def without_ext(path): 
    return os.path.splitext(path)[0]

# save a list of dict's to a csv file
def save_dict_to_csv(filepath, rows):
    assert len(rows) > 0
    with open(filepath, 'wb') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerow(rows[0].keys())
        for row in rows:
            writer.writerow(row.values())
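
As a quick sanity check, these helpers can be exercised on one of the dataset videos like this (the filename is illustrative, not the exact name in the archive):

cap = cv2.VideoCapture('video_front.mkv')  # illustrative filename
assert cap.isOpened()

print 'frame count:', cv2_frame_count(cap)

cv2_goto_frame(cap, 100)   # jump to frame index 100
ret, img = cap.read()      # read frame 100; the position advances to 101
assert ret
assert cv2_current_frame(cap) == 101

cap.release()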

Next, we compute the dense optical flow of a video file between a starting and an ending frame, save the average horizontal and vertical flow per frame to a CSV file, and save a visualization of the flow to a video file. The reason we provide `from_frame` and `to_frame` is so that the function below can be called in parallel (see the dispatch sketch after the function). This matters because dense optical flow is computationally intensive but easily parallelizable in the time domain. So here's the function:

# in_video_path: path to input video file
# out_video_path: path to output video file
# from_frame: index of starting frame in input video
# to_frame: index of ending frame in input video
def compute_flow(in_video_path, out_video_path, from_frame, to_frame):
    vw = cv2.VideoWriter(out_video_path, cv2.VideoWriter_fourcc(*'XVID'), 30, (1280, 720))  # frame size must match the input video
    assert vw.isOpened()
    
    cap = cv2.VideoCapture(in_video_path)
    assert cap.isOpened()

    assert from_frame < to_frame
    assert to_frame < cv2_frame_count(cap)

    img_prev = None
    if from_frame >= 1:
        cv2_goto_frame(cap, from_frame-1)
        ret, img_prev = cap.read()
        assert ret
        img_prev = cv2.cvtColor(img_prev, cv2.COLOR_BGR2GRAY)
    else:
        assert from_frame == 0

    mag_sum_img = None

    rows = []

    frame_count = cv2_frame_count(cap)

    img_sample = None

    for frame_id in xrange(from_frame, to_frame+1):
        ret, img_color = cap.read()
        if ret and img_sample is None:
            img_sample = img_color.copy()

        if not ret:
            print "failed reading image: frame_id = {}, frame_count = {}, from: {}, to: {}".format(
                frame_id, frame_count, from_frame, to_frame)
            # reuse an earlier successfully read frame so the output video keeps its length
            img_color = img_sample.copy()

        # only tolerate read failures near the very end of the requested range
        assert ret or (to_frame - frame_id < 10)

        img = cv2.cvtColor(img_color, cv2.COLOR_BGR2GRAY)

        mag_x, mag_y = 0, 0

        if (img_prev is None) or (not ret):
            result = np.zeros_like(img_sample)
        else:
            hsv = np.zeros_like(img_color)
            hsv[...,1] = 255

            # implementation of the Farneback dense flow algorithm
            flow = cv2.calcOpticalFlowFarneback(img_prev, img,
                                                pyr_scale=0.5, levels=3, winsize=15, iterations=3,
                                                poly_n=5, poly_sigma=1.2, flags=0)

            mag_x, mag_y = np.mean(flow[...,0]), np.mean(flow[...,1])
            
            mag, ang = cv2.cartToPolar(flow[...,0], flow[...,1])

            # accumulate the flow magnitude across frames for the summary image saved at the end
            mag_sum_img = mag if mag_sum_img is None else mag_sum_img + mag

            hsv[...,0] = ang * 180 / np.pi / 2

            mag_normed = mag.copy()
            np.clip(mag_normed, a_min=0, a_max=50, out=mag_normed)
            mag_normed = mag_normed * (255.0 / 50)  # scale the clipped magnitudes into [0, 255]
            hsv[...,2] = mag_normed

            result = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

        row = OrderedDict()
        row['frame'] = frame_id
        row['x'] = mag_x
        row['y'] = mag_y

        rows.append(row)

        img_prev = img
        
        vw.write(result)
    
    # make the max accumulated optical flow always show up as 255 intensity
    if mag_sum_img is not None:
        mag_sum_img = cv2.normalize(mag_sum_img, None, 0, 255, cv2.NORM_MINMAX)
        cv2.imwrite(without_ext(out_video_path) + '.jpg', mag_sum_img.astype(np.uint8))

    # save the per-frame average flow values
    save_dict_to_csv(without_ext(out_video_path) + '.csv', rows)

    vw.release()
    cap.release()
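
Since `compute_flow` operates on an explicit frame range, a long video can be split into chunks and the chunks processed in parallel. Here is a minimal dispatch sketch using multiprocessing (the chunk size, worker count, and output naming are illustrative assumptions, not part of the original pipeline):

from multiprocessing import Pool

def compute_flow_chunk(args):
    in_path, out_path, from_frame, to_frame = args
    compute_flow(in_path, out_path, from_frame, to_frame)

def compute_flow_parallel(in_video_path, chunk_size=1000, workers=4):
    n = cv2_frame_count(in_video_path)
    jobs = []
    for a in xrange(0, n, chunk_size):
        b = min(a + chunk_size - 1, n - 1)  # to_frame must stay below the frame count
        if a >= b:
            continue  # skip a degenerate final chunk
        out_path = '{}_flow_{}_{}.avi'.format(without_ext(in_video_path), a, b)
        jobs.append((in_video_path, out_path, a, b))
    pool = Pool(workers)
    pool.map(compute_flow_chunk, jobs)
    pool.close()
    pool.join()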

The optical flow gives us the steering and vibration events in the videos; the other sensors are used in their raw form. The second step, then, is to compute the synchronizing time offset between two time series. For this we use an FFT-based cross correlation function (see this post for a simple test). It's important that the two time series are sampled at exactly the same interval; if your data is not sampled this way, it first has to be interpolated onto a regular grid (a resampling sketch appears at the end of this post). We compute the cross correlation in the following way:

import numpy as np
from numpy.fft import fft, ifft, fftshift

# circular cross correlation of two equal-length, evenly sampled signals via the FFT
def cross_correlation_using_fft(x, y):
    f1 = fft(x)
    f2 = fft(np.flipud(y))
    cc = np.real(ifft(f1 * f2))
    return fftshift(cc)

# shift < 0 means that y starts 'shift' time steps before x
# shift > 0 means that y starts 'shift' time steps after x
def compute_shift(x, y):
    assert len(x) == len(y)
    c = cross_correlation_using_fft(x, y)
    assert len(c) == len(x)
    zero_index = int(len(x) / 2) - 1
    shift = zero_index - np.argmax(c)
    return shift
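
As a quick check (in the spirit of the simple test linked above), `compute_shift` should recover a known offset between two identically sampled signals. A small synthetic example (the noise signal and the 25-sample delay are made up for illustration):

np.random.seed(0)
x = np.random.randn(1000)
y = np.roll(x, 25)          # y starts 25 time steps after x

print compute_shift(x, y)   # prints 25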

The `compute_shift` function takes two time series and produces the shift associated with the maximum cross correlation. Small adjustments then have to be made to this shift depending on which pair of sensors is being synchronized; see the paper for the table that lists these adjustments.
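
Since the cross correlation assumes a uniform sampling interval, sensor streams with irregular timestamps should first be interpolated onto a common grid before calling `compute_shift`. A minimal sketch using np.interp (the timestamp/value arrays and the 30 Hz target rate are assumptions for illustration):

# t_a, v_a and t_b, v_b: timestamps (in seconds) and values of two sensor streams
def resample_pair(t_a, v_a, t_b, v_b, fs=30.0):
    # common time grid covering the overlap of the two streams, sampled at fs Hz
    t0 = max(t_a[0], t_b[0])
    t1 = min(t_a[-1], t_b[-1])
    grid = np.arange(t0, t1, 1.0 / fs)

    # linear interpolation of each stream onto the shared grid
    a = np.interp(grid, t_a, v_a)
    b = np.interp(grid, t_b, v_b)
    return grid, a, b

# the resampled streams can then be passed to compute_shift(a, b);
# the returned sample shift converts back to seconds by dividing by fs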
