Category Archives: research

Detecting Road Surface Wetness from Audio with Recurrent Neural Networks

This blog post presents a dataset, video, and information for the paper titled “Detecting Road Surface Wetness from Audio: A Deep Learning Approach” (get PDF by clicking link). First, here’s a video about this work:


If you find this paper/page helpful in your work, please cite:

PS: You can find this paper on Google Scholar.


The dataset used for this paper is made available as a ZIP archive (click the link to download it). This is the dataset used to train and evaluate the LSTM model proposed in the paper.

The zip archive contains 3 dry trips and 3 wet trips, each in a separate directory. Each trip contains 3 synchronized data streams that line up at frame-level accuracy:

  • Audio: File audio_mono.wav is the audio of the trip and the main data stream used in the paper.
  • Telemetry: File synced_data_fps30.csv contains detailed information about the movement of the vehicle, sampled at a fixed rate of 30 Hz. This file is useful for lining up the speed of the vehicle with the audio of the tire’s interaction with the road.
  • Video: File video_front.mkv is the 30fps synchronized (to the other data streams) video of the forward roadway. This is useful for validation by visually confirming the approximate speed of the vehicle and wetness of the road at any moment in time.
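Because the three streams are synchronized, mapping between an audio sample index and the corresponding 30 Hz telemetry row is simple index arithmetic. A minimal sketch (the 44.1 kHz audio sample rate here is an assumption for illustration, not a documented property of the dataset):

```python
def audio_sample_to_telemetry_row(sample_index, audio_hz=44100, telemetry_hz=30):
    """Map an index into audio_mono.wav to the matching row of synced_data_fps30.csv."""
    return int(sample_index * telemetry_hz / audio_hz)

# One second into the audio corresponds to telemetry row 30.
print(audio_sample_to_telemetry_row(44100))  # 30
```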


Irman Abdic, MIT
Lex Fridman, MIT (contact author)
Erik Marchi, TUM
Daniel E. Brown, MIT
William Angell, MIT
Bryan Reimer, MIT
Björn Schuller, TUM and Imperial College London


FFCN: Fast Visualization of Human Peripheral Vision with Fully Convolutional Networks

This blog post presents information, demo, and source code (pending conference review process) for the paper titled “Fast Foveated Fully Convolutional Network Model for Human Peripheral Vision” (get pdf here). First, here’s a Javascript demo providing a visualization of foveations produced by the network. Click anywhere in the image to visualize the amount of information our model estimates the human eyes see when focusing on the point you clicked on:

Original Image — serves as input to FFCN for generating the foveated grid (right).
Foveated Image — mouse over the image (or tap it on mobile) to foveate.


If you find this paper/page helpful in your work, please cite:


Lex Fridman, MIT
Benedikt Jenik, MIT
Shaiyan Keshvari, MIT
Bryan Reimer, MIT
Christoph Zetzsche, MIT
Ruth Rosenholtz, MIT

Detecting Driver Frustration from Audio and Video in the Wild

This blog post accompanies a paper we presented at the 2016 International Joint Conference on Artificial Intelligence (IJCAI) in New York City. You can download the paper here. Let’s start with a motivating question:

Question: Which one of these two drivers appears frustrated?


The answer is counterintuitive, especially from a generic affective computing perspective, which would be much more likely to judge the driver on the left as the more frustrated one. In fact, on a scale of 1 to 10 (where 1 is least frustrated and 10 is most frustrated), the driver on the left self-reported a frustration level of 1, while the driver on the right self-reported a frustration level of 9. Watch the following video to understand how “frustration” in the context of driving and using a voice-based navigation system may differ from the more generic affective concept of “frustration.”


Irman Abdić, TUM, MIT
Lex Fridman, MIT (contact author)
Daniel McDuff, MIT
Erik Marchi, TUM
Bryan Reimer, MIT
Björn W. Schuller, Imperial College London, University of Passau

Automated Synchronization of Driving Data: Video, Audio, Telemetry, and Accelerometer

This blog post presents a dataset and source code for the paper titled “Automated Synchronization of Driving Data Using Vibration and Steering Events” (get pdf here). First, here’s a video visualizing the data we’re looking to synchronize (best viewed in 1080p HD):


If you find this paper/page helpful in your work, please cite:

PS: You can find this paper on Google Scholar.


The dataset used for this paper is collected and made available by MIT AgeLab. It is a small sample of the 1,000+ hours of multi-sensor driving data collected at AgeLab.

The zip archive containing the 2GB dataset can be downloaded here. This archive contains the following files:

  • Individual Sensor Data: Files like audio_energy.csv, data_*.csv, and video_*.csv contain information related to each individual sensor: video, audio, IMU, GPS and steering wheel position.
  • Video Data: Files video_*.mkv contain the video for front, dash, and face videos. Files video_*.csv contain timestamps for each frame in the video.
  • Synchronized Data: Files synced_data_fps*.csv contain the individual sensor data fused and sampled at the frame rate (fps) specified in the filename. The frame rates included are 30, 90, and 100 fps.



Lex Fridman*, MIT
Daniel E. Brown, MIT
William Angell, MIT
Irman Abdic, MIT
Bryan Reimer, MIT
Hae Young Noh, CMU

*Contact:

Fast Cross Correlation and Time Series Synchronization in Python

Computing the cross-correlation function is useful for finding the time-delay offset between two time series. Python provides numpy.correlate, but a much faster FFT-based implementation is possible. Check out the following paper for an application of this function:

Fridman, Brown, Angell, Abdic, Reimer, and Noh, “Automated Synchronization of Driving Data Using Vibration and Steering Events.”

We can test the above function by shifting the second series manually and seeing if the shift is accurately computed:
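A sketch of one common FFT-based approach, together with the manual-shift test (the function name and details here are mine, not necessarily the code used in the paper; it assumes equal-length series and uses circular correlation):

```python
import numpy as np

def find_delay(a, b):
    """Estimate how many samples b is delayed relative to a (equal-length series)."""
    n = len(a)
    fa = np.fft.rfft(a, n)
    fb = np.fft.rfft(b, n)
    # Circular cross-correlation via the FFT: O(n log n) instead of O(n^2).
    c = np.fft.irfft(fa * np.conj(fb), n)
    lag = int(np.argmax(c))
    if lag > n // 2:   # peaks past the midpoint correspond to negative lags
        lag -= n
    return -lag

# Sanity check: shift a noise signal by 25 samples and recover the delay.
rng = np.random.default_rng(0)
series = rng.standard_normal(1000)
delayed = np.roll(series, 25)
print(find_delay(series, delayed))  # 25
```

Swapping the arguments reports the delay with the opposite sign, which is a quick way to confirm the convention being used.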

Python: Robust Conversion of Time String to Seconds with Missing Values


Python has the strptime function that can be used in the following way to convert from something like “00:00:5.37” to 5.37.
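For example (a sketch, since the original snippet is not shown on this page):

```python
from datetime import datetime

# strptime parses the fields; we then reduce them to a single seconds value.
t = datetime.strptime("00:00:5.37", "%H:%M:%S.%f")
seconds = t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6
print(round(seconds, 2))  # 5.37
```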

The problem with the above is that it requires a very specific string format. A function that converts a time string should be able to handle “00:00:5.37”, “00:5.37”, “5.37”, “5”, etc. The above code does not handle all those cases, but the following code does just that…


Regular expressions to the rescue:
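A reconstruction of the idea (the exact original function is not shown on this page, so the function name and regex details are mine):

```python
import re

# Matches [[HH:]MM:]SS[.fff]; the hours group only participates when a second
# colon follows, so "00:5.37" parses as minutes:seconds, not hours:minutes.
_TIME_RE = re.compile(r'^(?:(?P<h>\d+):(?=\d+:))?(?:(?P<m>\d+):)?(?P<s>\d+(?:\.\d+)?)$')

def time_to_seconds(time_str):
    """Convert "00:00:5.37", "00:5.37", "5.37", or "5" to seconds as a float."""
    match = _TIME_RE.match(time_str.strip())
    if match is None:
        raise ValueError('unrecognized time string: %r' % time_str)
    return (3600 * int(match.group('h') or 0)
            + 60 * int(match.group('m') or 0)
            + float(match.group('s')))
```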


The code to test the function:

The correct output of the above is:

Another Approach:

If you were inclined to do the above in as few lines of code as possible, the approach from here is a good starting point. You just need to add handling of microseconds.
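That compact approach might look like this (a sketch: float() on the seconds field is what handles the fractional part):

```python
def time_to_seconds_short(time_str):
    # Fields right-to-left are seconds, minutes, hours, weighted by 60**i;
    # float() keeps the fractional part of the seconds field.
    return sum(float(f) * 60 ** i
               for i, f in enumerate(reversed(time_str.split(':'))))
```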

Reverse the Conversion:

Reversing the above conversion is much easier:
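For example, with string formatting (a sketch; the field widths are chosen to produce a zero-padded “00:00:05.37” style):

```python
def seconds_to_time(seconds):
    """Convert 5.37 back into a "00:00:05.37" style string."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return '%02d:%02d:%05.2f' % (hours, minutes, secs)

print(seconds_to_time(5.37))    # 00:00:05.37
print(seconds_to_time(3723.0))  # 01:02:03.00
```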

Convert Transparent Background of Image to Black (or Any Color)

Quick ImageMagick command to convert the transparent part of an image to a specific color (black in this case):
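A standard invocation looks like this (the input/output filenames are placeholders):

```shell
# Composite the image over a black background, then flatten away transparency.
convert input.png -background black -flatten output.png
```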

Setting the -flatten flag is important because otherwise the -background flag will have no effect.

For example, if we take the .png image on the left and run the above command, the output will be the image on the right.


Best WAMP for Web Development (2013 Edition)

I just wanted to put out there one guy’s opinion on the best WAMP package in 2013. I’ve had the chance to use all of the following, and here’s my rating from best (1) to worst (5). I should say first, though, that all of them are very good and relatively easy to set up:

  1. EasyPHP
  2. WampServer
  3. Zend Server
  4. Uniform Server (aka UniServer)
  5. XAMPP

People on StackOverflow recommended UniServer and Zend Server, but I found UniServer difficult to use and configure, and Zend Server too bulky for old-school, framework-free PHP development. When I want to set up a web server, I’ll usually configure everything myself as opposed to using one of these WAMPs. But for simple development it’s nice to have something that takes under a minute to install and set up.

EasyPHP was the last one I tried, because the name was frankly confusing (at the time, though it makes perfect sense now), but it turned out to be the easiest and most intuitive WAMP I’ve ever used. Basically, the reason I love it is that it’s super lightweight and auto-restarts every time you change a config file (httpd.conf, php.ini, etc.).

This is just one guy’s opinion, but I wanted to put it out there in case it might help a fellow web developer.

Convert String to Qt Keycode in C++

I came across the need to convert a string representing a key on the keyboard to a keycode enum like Qt::Key (or anything else). Example conversions would be:

As you can see, the above includes not just alphanumeric keys but also modifiers and special keys. Qt has this parsing functionality in QKeySequence’s static fromString function:

You might ask why I need this conversion. Well, I have a data file generated by GhostMouse. It’s a log of what I type via the keyboard. Here’s an example of me typing “ It ”:

{SPACE down}
{Delay 0.08}
{SPACE up}
{Delay 2.25}
{SHIFT down}
{Delay 0.11}
{i down}
{Delay 0.02}
{SHIFT up}
{Delay 0.03}
{i up}
{Delay 0.05}
{t down}
{Delay 0.08}
{t up}
{Delay 0.05}
{SPACE down}
{Delay 0.12}
{SPACE up}

Unfortunately, QKeySequence does not recognize a “Ctrl” key by itself as a key. So, in that case I add a non-modifier key to the sequence and then subtract it from the resulting code. Here’s the complete function that should get the job done: