
Detecting Driver Frustration from Audio and Video in the Wild

This blog post accompanies a paper we presented at the 2016 International Joint Conference on Artificial Intelligence (IJCAI) in New York City. You can download the paper here. Let’s start with a motivating question:

Question: Which one of these 2 drivers appears frustrated?

[Image: the two drivers, side by side]

The answer is counterintuitive, especially from a generic affective computing perspective, which would be much more likely to see the driver on the left as the more frustrated one. In fact, on a scale of 1 to 10 (where 1 is least frustrated and 10 is most frustrated), the driver on the left self-reported a frustration level of 1, while the driver on the right self-reported a frustration level of 9. Watch the following video to understand how “frustration” in the context of driving and using a voice-based navigation system may differ from the more generic affective concept of “frustration.”

Authors

Irman Abdić, TUM, MIT
Lex Fridman, MIT (contact author: fridman@mit.edu)
Daniel McDuff, MIT
Erik Marchi, TUM
Bryan Reimer, MIT
Björn W. Schuller, Imperial College London, University of Passau

Automated Synchronization of Driving Data: Video, Audio, Telemetry, and Accelerometer

This blog post presents a dataset and source code for the paper titled “Automated Synchronization of Driving Data Using Vibration and Steering Events” (get pdf here). If you find this post helpful in your work, please cite:

First, here’s a video visualizing the data we’re looking to synchronize (best viewed in 1080p HD):

Dataset

The dataset used for this paper was collected and made available by MIT AgeLab. It is a small sample of the 1,000+ hours of multi-sensor driving datasets collected at AgeLab.

The zip archive containing the 2GB dataset can be downloaded here. This archive contains the following files:

  • Individual Sensor Data: Files like audio_energy.csv, data_*.csv, and video_*.csv contain information related to each individual sensor: video, audio, IMU, GPS and steering wheel position.
  • Video Data: Files video_*.mkv contain the video for front, dash, and face videos. Files video_*.csv contain timestamps for each frame in the video.
  • Synchronized Data: The files synced_data_fps*.csv contain the individual sensor data fused and sampled at the frame rate (fps) specified in the filename. The frame rates included are 30, 90, and 100 fps.


Authors

Lex Fridman*, MIT
Daniel E. Brown, MIT
William Angell, MIT
Irman Abdic, MIT
Bryan Reimer, MIT
Hae Young Noh, CMU

*Contact: fridman@mit.edu

Source Code

Most of the code used for this paper is tied up in a larger system for processing and analyzing driving data. There is a lot of system-specific data management code (especially related to the database backend) that is not useful to anyone looking to synchronize their driving data based on vibration and steering events. For this reason, I’m providing just the core snippets of code for (1) computing the dense optical flow and (2) efficiently computing the cross correlation of data streams. First, here are some simple OpenCV video helper functions used in the code below:

Next, we compute the dense optical flow of a video file between starting and ending frames, save the average horizontal and vertical flows to a CSV file, and save a visualization of the flow to a video file. The reason we provide from_frame and to_frame is so that the function can be called in parallel. This is important because dense optical flow is a computationally intensive task that is easily parallelizable (in the time domain). So here’s the function:
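The optical flow listing itself is not reproduced here. To illustrate only the parallelization idea, here is a minimal sketch of how a video could be split into (from_frame, to_frame) ranges, one per worker. The helper name split_frame_ranges and the convention that a worker reads frames from_frame through to_frame inclusive are assumptions for this example, not necessarily the semantics of the original function.

```python
def split_frame_ranges(total_frames, n_chunks):
    """Split a video into (from_frame, to_frame) ranges for parallel workers.

    Optical flow is computed between consecutive frames, so a video with
    total_frames frames has total_frames - 1 frame pairs. Each returned
    range covers the pairs (i, i + 1) for from_frame <= i < to_frame, which
    means neighbouring ranges share exactly one boundary frame.
    """
    n_pairs = total_frames - 1
    if n_pairs < 1:
        raise ValueError("need at least two frames to compute optical flow")
    if n_chunks < 1:
        raise ValueError("n_chunks must be positive")

    n_chunks = min(n_chunks, n_pairs)      # never create empty ranges
    base, extra = divmod(n_pairs, n_chunks)

    ranges = []
    start = 0
    for i in range(n_chunks):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + length))
        start += length
    return ranges


if __name__ == "__main__":
    # Each tuple could be handed to one worker process.
    for from_frame, to_frame in split_frame_ranges(1000, 4):
        print(from_frame, to_frame)
```

Each range can then be passed to a separate process (for example with multiprocessing.Pool), and the per-range CSV outputs concatenated in order.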

The optical flow gives the steering and vibration events in the videos. The other sensors are used in their raw form. So the second step is to compute the synchronizing time offset between two time series. For this we use an FFT-based cross correlation function (see this post for a simple test). It’s important that the two time series are sampled at exactly the same interval. If your data is not sampled in this way, it has to be first interpolated at a regular interval. We compute the cross correlation in the following way:
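The regular-sampling requirement mentioned above can be met by resampling each stream before it is correlated. Below is a minimal sketch using linear interpolation in plain Python. The function name resample_uniform is hypothetical, and it assumes the timestamps are strictly increasing; it is not the code used in the paper.

```python
from bisect import bisect_right


def resample_uniform(times, values, dt):
    """Linearly interpolate (times, values) onto a regular grid with step dt.

    The grid starts at times[0] and stops at the last grid point that does
    not exceed times[-1]. Assumes times is strictly increasing.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) samples")
    if dt <= 0:
        raise ValueError("dt must be positive")

    # Small epsilon guards against losing the last point to rounding.
    n_steps = int((times[-1] - times[0]) / dt + 1e-9)

    out_times, out_values = [], []
    for i in range(n_steps + 1):
        t = times[0] + i * dt
        # Index of the sample to the right of t, clamped to a valid segment.
        j = min(max(bisect_right(times, t), 1), len(times) - 1)
        t0, t1 = times[j - 1], times[j]
        frac = (t - t0) / (t1 - t0)
        out_times.append(t)
        out_values.append(values[j - 1] + frac * (values[j] - values[j - 1]))
    return out_times, out_values


if __name__ == "__main__":
    grid, vals = resample_uniform([0.0, 1.0, 3.0], [0.0, 10.0, 30.0], 0.5)
    print(list(zip(grid, vals)))
```

Both streams should be resampled with the same dt so that a shift measured in samples converts directly to a shift in seconds.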

The compute_shift function takes two time series and produces the shift associated with the maximum cross correlation. Small adjustments have to be made to this shift based on which pair of sensors is being synchronized. See the paper for the table that shows these values:

Fast Cross Correlation and Time Series Synchronization in Python

Computing the cross-correlation function is useful for finding the time-delay offset between two time series. Python has the numpy.correlate function. But there is a much faster FFT-based implementation. Check out the following paper for an application of this function:

We can test the above function by shifting the second series manually and seeing if the shift is accurately computed:
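The original listings are not reproduced here. As a stand-in, the following is a minimal, dependency-free sketch of an FFT-based cross-correlation together with the manual-shift test described above. The helper _fft and this particular signature and sign convention for compute_shift are assumptions for the example; in practice the hand-written FFT would be replaced by numpy.fft or similar.

```python
import cmath


def _fft(a, inverse=False):
    """Recursive radix-2 FFT. len(a) must be a power of two.

    The inverse transform is left unnormalized (no division by len(a)).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = _fft(a[0::2], inverse)
    odd = _fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def compute_shift(x, y):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means y is a delayed copy of x by that many samples.
    """
    if not x or not y:
        raise ValueError("both series must be non-empty")

    # Zero-pad to a power of two that avoids circular wrap-around.
    size = 1
    while size < len(x) + len(y) - 1:
        size *= 2
    fx = _fft([complex(v) for v in x] + [0j] * (size - len(x)))
    fy = _fft([complex(v) for v in y] + [0j] * (size - len(y)))

    # conj(FX) * FY is the transform of c[k] = sum_n x[n] * y[n + k].
    # The missing 1/size factor does not change where the maximum is.
    cc = _fft([a.conjugate() * b for a, b in zip(fx, fy)], inverse=True)

    # Negative lags are stored at the end of the circular buffer.
    lags = range(-(len(x) - 1), len(y))
    return max(lags, key=lambda lag: cc[lag % size].real)


if __name__ == "__main__":
    x = [0.0] * 20 + [1.0, 3.0, 7.0, 3.0, 1.0] + [0.0] * 20
    y = [0.0] * 5 + x[:-5]          # y is x delayed by 5 samples
    assert compute_shift(x, y) == 5
    assert compute_shift(y, x) == -5
```

For real sensor data it usually helps to subtract each series' mean before correlating, so that a constant offset does not dominate the peak.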

Python: Robust Conversion of Time String to Seconds with Missing Values

Problem:

Python has the strptime function that can be used in the following way to convert from something like “00:00:5.37” to 5.37.
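The original snippet is not reproduced here. A minimal sketch of the strptime approach might look like the following; the function name strptime_to_seconds and the format string "%H:%M:%S.%f" are assumptions for this example.

```python
from datetime import datetime


def strptime_to_seconds(time_str):
    """Convert a string such as "00:00:5.37" to seconds using strptime."""
    t = datetime.strptime(time_str, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


if __name__ == "__main__":
    print(strptime_to_seconds("00:00:5.37"))
    try:
        strptime_to_seconds("5.37")   # does not match the fixed format
    except ValueError as err:
        print("failed:", err)
```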

The problem with the above is that it requires a very specific string format. A function that converts a time string should be able to handle “00:00:5.37”, “00:5.37”, “5.37”, “5”, etc. The above code does not handle all those cases, but the following code does just that:

Solution:

Regular expressions to the rescue:
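The original listing is not reproduced here. As a minimal sketch of the regular-expression idea, the function below makes the hours and minutes fields optional and returns None for missing or unparseable values. The name time_str_to_seconds and the choice of None as the "missing" result are assumptions for this example.

```python
import re

# Optional "H:" and "M:" prefixes, then seconds with an optional fraction.
_TIME_RE = re.compile(r"^\s*(?:(?:(\d+):)?(\d+):)?(\d+(?:\.\d*)?)\s*$")


def time_str_to_seconds(time_str):
    """Convert "H:M:S.f", "M:S.f", "S.f" or "S" to seconds (float).

    Returns None for missing values (None, empty strings, non-strings)
    and for strings that do not look like a time.
    """
    if not isinstance(time_str, str):
        return None
    match = _TIME_RE.match(time_str)
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)


if __name__ == "__main__":
    for s in ["00:00:5.37", "00:5.37", "5.37", "5", "", None, "n/a"]:
        print(repr(s), "->", time_str_to_seconds(s))
```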

Test:

The code to test the function:

The correct output of the above is:

Another Approach:

If you were inclined to do the above in as few lines of code as possible, the approach from here is a good starting point. You just need to add handling of microseconds.

Reverse the Conversion:

Reversing the above conversion is much easier:
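The original snippet is not reproduced here. One possible sketch of the reverse direction is shown below; the name seconds_to_time_str and the fixed "HH:MM:SS.ff" output format with two decimal places are assumptions for this example.

```python
def seconds_to_time_str(seconds):
    """Format a non-negative number of seconds as "HH:MM:SS.ff".

    Works in whole hundredths of a second so that rounding can never
    produce a seconds field of "60.00".
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    total_cs = int(round(seconds * 100))          # hundredths of a second
    hours, rem = divmod(total_cs, 3600 * 100)
    minutes, rem = divmod(rem, 60 * 100)
    secs, centis = divmod(rem, 100)
    return "%02d:%02d:%02d.%02d" % (hours, minutes, secs, centis)


if __name__ == "__main__":
    print(seconds_to_time_str(5.37))
    print(seconds_to_time_str(3723.5))
```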

Convert Transparent Background of Image to Black (or Any Color)

Quick ImageMagick command to convert the transparent part of an image to a specific color (black in this case):

Setting the -flatten flag is important because otherwise the -background flag will not have any effect.
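The command itself is not reproduced here. As a sketch, the same two flags can be passed to ImageMagick from Python. The helper names and file names below are hypothetical, and the binary name "convert" is an assumption (ImageMagick 7 installs typically use "magick" instead).

```python
import subprocess


def build_flatten_command(src, dst, color="black", binary="convert"):
    """Build the ImageMagick argument list that fills transparency with color."""
    # -background sets the fill color; -flatten composites the image onto it.
    return [binary, src, "-background", color, "-flatten", dst]


def flatten_transparency(src, dst, color="black"):
    """Run ImageMagick; raises CalledProcessError if the command fails."""
    subprocess.run(build_flatten_command(src, dst, color), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so no files are needed.
    print(" ".join(build_flatten_command("input.png", "output.png")))
```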

For example, if we take the .png image on the left and run the above command, the output will be the image on the right.

[Images: the original .png with a transparent background (left) and the output of the command (right)]

Best WAMP for Web Development (2013 Edition)

I just wanted to put out there one guy’s opinion on the best WAMP package in 2013. I’ve had the chance to use all of the following, and here’s my rating from best (1) to worst (5). I should say first, though, that all of them are very good and relatively easy to set up:

  1. EasyPHP
  2. WampServer
  3. Zend Server
  4. Uniform Server (aka UniServer)
  5. XAMPP

People on StackOverflow recommended UniServer and Zend Server, but I found UniServer difficult to use and configure, and Zend Server too bulky for old-school, framework-free PHP development. When I want to set up a web server, I’ll usually configure everything myself as opposed to using one of these WAMPs. But for simple development it’s nice to have something that takes under a minute to install and set up.

EasyPHP was the last thing I tried, because the name was frankly confusing (at the time, though it makes perfect sense now), but it turned out to be the easiest and most intuitive WAMP I’ve ever used. Basically, the reason I love it is that it is super lightweight and auto-restarts every time you change a config file (httpd.conf, php.ini, etc.).

This is just one guy’s opinion, but I wanted to put it out there in case it might help a fellow web developer.

Convert String to Qt Keycode in C++

I came across the need to convert a string representing a key on the keyboard to a keycode enum like Qt::Key (or anything else). Example conversions would be:

As you can see, the above includes not just alphanumeric keys but also modifiers and special keys. Qt has this parsing functionality in QKeySequence’s fromString static function:

You might ask why I need this conversion. Well, I have a data file generated by GhostMouse. It’s a log of what I type via the keyboard. Here’s an example of me typing “ It ”:

{SPACE down}
{Delay 0.08}
{SPACE up}
{Delay 2.25}
{SHIFT down}
{Delay 0.11}
{i down}
{Delay 0.02}
{SHIFT up}
{Delay 0.03}
{i up}
{Delay 0.05}
{t down}
{Delay 0.08}
{t up}
{Delay 0.05}
{SPACE down}
{Delay 0.12}
{SPACE up}
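Before any key names are converted, a log like the one above has to be split into key and delay events. Here is a minimal Python sketch of that step. It handles only the two line shapes shown above ({KEY down|up} and {Delay seconds}); the function name and the tuple layout are assumptions for this example, and real GhostMouse files may contain other entries.

```python
import re

# One "{NAME ARG}" entry per line, e.g. "{SPACE down}" or "{Delay 0.08}".
_EVENT_RE = re.compile(r"^\{(\S+)\s+(\S+)\}$")


def parse_ghostmouse_log(lines):
    """Parse log lines into ("down"|"up", key) and ("delay", seconds) tuples."""
    events = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        match = _EVENT_RE.match(line)
        if match is None:
            raise ValueError("unrecognized log line: %r" % raw)
        name, arg = match.groups()
        if name == "Delay":
            events.append(("delay", float(arg)))
        elif arg in ("down", "up"):
            events.append((arg, name))    # name is the key string, e.g. "SHIFT"
        else:
            raise ValueError("unrecognized log line: %r" % raw)
    return events


if __name__ == "__main__":
    sample = ["{SHIFT down}", "{Delay 0.11}", "{i down}", "{Delay 0.02}", "{SHIFT up}"]
    for event in parse_ghostmouse_log(sample):
        print(event)
```

The key strings produced here ("SPACE", "SHIFT", "i", and so on) are the inputs that would then be mapped to keycodes.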

Unfortunately, QKeySequence does not acknowledge a “Ctrl” key by itself as a key. So, I add a non-modifier key to the sequence and then subtract it from the code in that case. Here’s the complete function that should get the job done:

Include Full BibTeX Entry Inside Text

This isn’t rocket science, but you can insert BibTeX entries as you go, instead of dumping all the references at the end in a “References” section. It’s useful when writing many chaotic pages of notes as you crawl through dozens of publications on a particular topic you’re researching, or when making a slide presentation in Beamer.

Another way is to use the biblatex package. Note that, at least in Ubuntu, biblatex doesn’t come even with the full install of texlive-all and needs to be installed separately from the repository as a package of the same name.

WampServer vs XAMPP

This is not an earth-shattering post. Moreover, it’s not an objective one; that is, it is based on my experience and my experience only.

WAMP is a bundle of Apache, MySQL, and PHP (though often also Perl and/or Python) for Windows. These are the things you need to run a dynamic website (in my case, for development and testing) on your computer in Windows. The idea is that a WAMP bundle makes the process of downloading, installing, and configuring the individual components easy and quick.

The two most popular WAMP tools are XAMPP and WampServer. I used XAMPP for a couple of years, but it recently started disconnecting me from the internet for some unknown reason (that is probably my fault). I tried to fix it and couldn’t, especially since the conditions under which the problem arose were hard to pin down precisely.

So I installed WampServer instead last week, and I have to say that the basic tasks of turning everything on and off, tweaking the configuration, enabling modules, changing versions of individual components, etc. are a lot easier in WampServer. By “easier” I mean more intuitive. The little menu that pops up from the WampServer icon in the taskbar has all the options you need and none you don’t.

There may be people out there who have had a different experience, but I know I myself was wondering whether the grass is greener on the other side, and in this case it was!