Author Archives: Lex Fridman

Detecting Road Surface Wetness from Audio with Recurrent Neural Networks

This blog post presents a dataset, video, and information for the paper titled “Detecting Road Surface Wetness from Audio: A Deep Learning Approach” (get PDF by clicking link). First, here’s a video about this work:

Citation

If you find this paper/page helpful in your work, please cite:

PS: You can find this paper on Google Scholar.

Dataset

The dataset used in the paper to train and evaluate the proposed LSTM model is available as a ZIP archive (click the link to download it).

The ZIP archive contains 3 dry trips and 3 wet trips, each in a separate directory. Each trip contains 3 data streams that are synchronized to frame-level accuracy:

  • Audio: File audio_mono.wav is the audio of the trip and the main data stream used in the paper.
  • Telemetry: File synced_data_fps30.csv contains information about the movement of the vehicle, sampled at a fixed rate of 30 Hz. This file is useful for lining up the speed of the vehicle with the audio of the tire’s interaction with the road.
  • Video: File video_front.mkv is the 30fps synchronized (to the other data streams) video of the forward roadway. This is useful for validation by visually confirming the approximate speed of the vehicle and wetness of the road at any moment in time.
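Since the telemetry and video run at 30 Hz, each frame corresponds to a fixed slice of audio samples. Here is a minimal sketch of that mapping. It assumes that all three streams start at the same instant and that row i of the CSV is frame i; the function names and the 44100 Hz sample rate in the example are illustrative, not taken from the dataset.

```python
import wave

FPS = 30  # telemetry and video rate stated in the dataset description


def frame_to_sample_range(frame_index, sample_rate, fps=FPS):
    """Half-open range [start, end) of audio samples covered by one frame."""
    start = frame_index * sample_rate // fps
    end = (frame_index + 1) * sample_rate // fps
    return start, end


def read_audio_for_frame(wav_path, frame_index, fps=FPS):
    """Return the raw PCM bytes of the audio that belong to one 30 Hz frame."""
    with wave.open(wav_path, "rb") as wav:
        start, end = frame_to_sample_range(frame_index, wav.getframerate(), fps)
        wav.setpos(start)                  # jump to the first sample of the frame
        return wav.readframes(end - start)


# Illustrative sample rate (assumption): 44100 Hz gives 1470 samples per frame.
print(frame_to_sample_range(0, 44100))   # (0, 1470)
print(frame_to_sample_range(90, 44100))  # (132300, 133770)
```

With this mapping, row i of synced_data_fps30.csv can be paired with the audio returned by read_audio_for_frame("audio_mono.wav", i).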

Authors

Irman Abdic, MIT
Lex Fridman, MIT (contact author: fridman@mit.edu)
Erik Marchi, TUM
Daniel E. Brown, MIT
William Angell, MIT
Bryan Reimer, MIT
Björn Schuller, TUM and Imperial College London

 

FFCN: Fast Visualization of Human Peripheral Vision with Fully Convolutional Networks

This blog post presents information, a demo, and source code (pending conference review process) for the paper titled “Fast Foveated Fully Convolutional Network Model for Human Peripheral Vision” (get PDF here). First, here’s a JavaScript demo providing a visualization of foveations produced by the network. Click anywhere in the image to visualize the amount of information our model estimates the human eyes see when focusing on the point you clicked on:

Original Image
Serves as input to FFCN for generating foveated grid (right).
Foveated Image
Mouse-over image (or click on image on mobile) to foveate.

Citation

If you find this paper/page helpful in your work, please cite:

Authors

Lex Fridman, MIT
Benedikt Jenik, MIT
Shaiyan Keshvari, MIT
Bryan Reimer, MIT
Christoph Zetzsche, MIT
Ruth Rosenholtz, MIT

Detecting Driver Frustration from Audio and Video in the Wild

This blog post accompanies a paper we presented at the 2016 International Joint Conference on Artificial Intelligence (IJCAI) in New York City. You can download the paper here. Let’s start with a motivating question:

Question: Which one of these 2 drivers appears frustrated?

[Image: two drivers, left and right]

The answer is counterintuitive, especially from the generic affective computing perspective that would be much more likely to see the driver on the left as the more frustrated one. In fact, on a scale of 1 to 10 (where 1 is least frustrated and 10 is most frustrated), the driver on the left self-reported a frustration level of 1, while the driver on the right self-reported a frustration level of 9. Watch the following video to understand how “frustration” in the context of driving and using a voice-based navigation system may be different than the more generic affective concept of “frustration.”

Authors

Irman Abdić, TUM, MIT
Lex Fridman, MIT (contact author: fridman@mit.edu)
Daniel McDuff, MIT
Erik Marchi, TUM
Bryan Reimer, MIT
Björn W. Schuller, Imperial College London, University of Passau

Automated Synchronization of Driving Data: Video, Audio, Telemetry, and Accelerometer

This blog post presents a dataset and source code for the paper titled “Automated Synchronization of Driving Data Using Vibration and Steering Events” (get pdf here). First, here’s a video visualizing the data we’re looking to synchronize (best viewed in 1080p HD):

Citation

If you find this paper/page helpful in your work, please cite:

PS: You can find this paper on Google Scholar.

Dataset

The dataset used for this paper was collected and made available by MIT AgeLab. It is a small sample of the 1,000+ hours of multi-sensor driving datasets collected at AgeLab.

The zip archive containing the 2GB dataset can be downloaded here. This archive contains the following files:

  • Individual Sensor Data: Files like audio_energy.csv, data_*.csv, and video_*.csv contain information related to each individual sensor: video, audio, IMU, GPS and steering wheel position.
  • Video Data: Files video_*.mkv contain the front, dash, and face videos. Files video_*.csv contain timestamps for each frame in the video.
  • Synchronized Data: Files synced_data_fps*.csv contain the individual sensor data fused and sampled at the frame rate (fps) specified in the filename. The frame rates included are 30, 90, and 100 fps.


Authors

Lex Fridman*, MIT
Daniel E. Brown, MIT
William Angell, MIT
Irman Abdic, MIT
Bryan Reimer, MIT
Hae Young Noh, CMU

*Contact: fridman@mit.edu

Fast Cross Correlation and Time Series Synchronization in Python

Computing the cross-correlation function is useful for finding the time-delay offset between two time series. NumPy has the numpy.correlate function, but an FFT-based implementation is much faster for long series. Check out the following paper for an application of this function:

Lex Fridman et al., “Automated Synchronization of Driving Data Using Vibration and Steering Events.”

We can test the above function by shifting the second series manually and seeing if the shift is accurately computed:
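Here is a minimal sketch of an FFT-based cross-correlation together with the shift test described above. The function names and the sign convention of the returned shift are my own assumptions for illustration, not necessarily those used in the paper. The series are zero-padded so the circular correlation computed by the FFT equals the linear one returned by numpy.correlate in "full" mode.

```python
import numpy as np


def cross_correlation_fft(x, y):
    """Full linear cross-correlation of two real 1-D series via FFT.

    Entry i holds sum_n x[n + k] * y[n] for lag k = i - (len(y) - 1),
    the same layout as numpy.correlate(x, y, mode="full").
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) + len(y) - 1          # zero-pad so wrap-around cannot occur
    spectrum = np.fft.rfft(x, n) * np.conj(np.fft.rfft(y, n))
    cc = np.fft.irfft(spectrum, n)   # lags 0..len(x)-1, negative lags wrapped to the end
    return np.roll(cc, len(y) - 1)   # reorder to lags -(len(y)-1)..len(x)-1


def compute_shift(x, y):
    """Lag k that maximizes sum_n x[n + k] * y[n].

    k < 0 means y is a delayed copy of x; k > 0 means x is a delayed copy of y.
    """
    cc = cross_correlation_fft(x, y)
    return int(np.argmax(cc)) - (len(y) - 1)


# Shift the second series manually and check that the shift is recovered.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = np.roll(x, 25)          # y is x delayed by 25 samples (with wrap-around)
print(compute_shift(x, y))  # -25
print(compute_shift(y, x))  # 25
```

The direct method costs on the order of n² multiplications for series of length n, while the FFT route costs on the order of n log n, which is where the speedup on long recordings comes from.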

Python: Robust Conversion of Time String to Seconds with Missing Values

Problem:

Python has the strptime function that can be used in the following way to convert from something like “00:00:5.37” to 5.37.
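A minimal sketch of the strptime approach, assuming the format string "%H:%M:%S.%f" (the helper name is mine):

```python
from datetime import datetime


def strptime_to_seconds(text):
    # Only accepts the exact layout hours:minutes:seconds.fraction
    t = datetime.strptime(text, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


print(strptime_to_seconds("00:00:5.37"))  # 5.37
```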

The problem with the above is that it requires a very specific string format. A function that converts a time string should be able to handle “00:00:5.37”, “00:5.37”, “5.37”, “5”, etc. The above code does not handle all those cases, but the following code does just that…

Solution:

Regular expressions to the rescue:
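One way this could look, as a sketch. The pattern, the function name, and the choice to return None for a missing or malformed value are assumptions on my part:

```python
import re

# Optional "hours:" and "minutes:" prefixes, then seconds with an optional fraction.
TIME_PATTERN = re.compile(
    r"^\s*(?:(?:(?P<hours>\d+):)?(?P<minutes>\d+):)?"
    r"(?P<seconds>\d+(?:\.\d*)?|\.\d+)\s*$"
)


def time_string_to_seconds(text):
    """Convert 'H:M:S.f', 'M:S.f', 'S.f' or 'S' to seconds; None if unparseable."""
    if text is None:
        return None
    match = TIME_PATTERN.match(str(text))
    if match is None:
        return None  # missing or malformed value
    hours = int(match.group("hours") or 0)      # absent groups count as zero
    minutes = int(match.group("minutes") or 0)
    return hours * 3600 + minutes * 60 + float(match.group("seconds"))
```

Returning None rather than raising keeps a long column of time strings convertible in one pass, with the missing entries easy to filter afterwards.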

Test:

The code to test the function, along with its correct output:
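A small test loop, sketched with a compact regex-based converter restated inline so the snippet runs on its own. The helper name and the None-for-missing convention are assumptions; the expected output is shown in the trailing comments.

```python
import re


def to_seconds(text):
    # Compact regex converter: optional hours and minutes, then seconds.
    m = re.match(r"^\s*(?:(?:(\d+):)?(\d+):)?(\d+(?:\.\d*)?)\s*$", str(text))
    if m is None:
        return None  # missing or malformed value
    hours, minutes, seconds = m.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)


for text in ["00:00:5.37", "00:5.37", "5.37", "5", "1:02:03.5", "", "n/a"]:
    print(repr(text), "->", to_seconds(text))

# '00:00:5.37' -> 5.37
# '00:5.37' -> 5.37
# '5.37' -> 5.37
# '5' -> 5.0
# '1:02:03.5' -> 3723.5
# '' -> None
# 'n/a' -> None
```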

Another Approach:

If you were inclined to do the above in as few lines of code as possible, the approach from here is a good starting point. You just need to add handling of microseconds.
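The linked approach is not reproduced here. As an assumption about what a compact version might look like, here is a split-and-sum sketch in which float() (instead of int()) is what keeps the fractional seconds:

```python
def time_string_to_seconds_compact(text):
    # Rightmost field is seconds, then minutes, then hours: weights 1, 60, 3600.
    parts = reversed(text.split(":"))
    return sum(float(part) * 60 ** i for i, part in enumerate(parts))


print(time_string_to_seconds_compact("1:02:03.5"))  # 3723.5
```

Unlike the regular expression version, this one raises ValueError on an empty or malformed string, so missing values need to be filtered out first.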

Reverse the Conversion:

Reversing the above conversion is much easier:
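A minimal sketch of the reverse direction, assuming non-negative input and an output layout of HH:MM:SS.ff (both are my choices for illustration). Working in whole hundredths of a second avoids a rounded value such as "60.00" showing up in the seconds field.

```python
def seconds_to_time_string(seconds):
    """Format non-negative seconds as 'HH:MM:SS.ff' (hundredths of a second)."""
    centis = int(round(seconds * 100))      # total hundredths, rounded once
    whole, centis = divmod(centis, 100)
    minutes, secs = divmod(whole, 60)
    hours, minutes = divmod(minutes, 60)
    return "%02d:%02d:%02d.%02d" % (hours, minutes, secs, centis)


print(seconds_to_time_string(5.37))    # 00:00:05.37
print(seconds_to_time_string(3723.5))  # 01:02:03.50
```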

Convert Transparent Background of Image to Black (or Any Color)

Quick ImageMagick command to convert the transparent part of an image to a specific color (black in this case):
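The command presumably has the form `convert input.png -background black -flatten output.png`. Here is a small Python sketch that builds and runs it. The file names and helper names are placeholders of mine, and it assumes ImageMagick is installed and on the PATH.

```python
import subprocess


def flatten_command(src, dst, color="black"):
    """Build the ImageMagick argv that paints transparent pixels with `color`."""
    # -background sets the fill color; -flatten composites the image onto it.
    return ["convert", src, "-background", color, "-flatten", dst]


def fill_transparency(src, dst, color="black"):
    # Requires ImageMagick on PATH; on ImageMagick 7 the entry point is `magick`.
    subprocess.run(flatten_command(src, dst, color), check=True)


print(" ".join(flatten_command("input.png", "output.png")))
# convert input.png -background black -flatten output.png
```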

Setting the -flatten flag is important because otherwise the -background flag will not have any effect.

For example, if we take the .png image on the left and run the above command, the output will be the image on the right.

[Images: steering wheel before (left) and after (right) the conversion]

My Experience Training BJJ at Google

My time at Google has been great on many levels. There are a lot of perks (delicious free food, nap pods, game rooms, etc.), but the best part is the work. It may sound strange, but I love doing research and I love programming. Those two things combine perfectly in my position here. There are, of course, lots of gamers working for Google; I always hear them talking about the new gaming accessories they get at Armchair Empire.

As has been known for a while now, Google has a jiu jitsu class. I finally got around to attending it. It is run by Milton Bastos, a good teacher and a good competitor. Check out his academy.

The beautiful part about jiu jitsu that I’ve noticed throughout my journey is that once you put on the gi, you shed the identity that you carry in the outside world. Your position at Google or anywhere else doesn’t matter. The only thing that matters is your technique. There is something very honest and simple about that.

I got to roll with a few very good people (purple, brown, and black belts). The brown belt girl and the lighter black belt in particular threw a few interesting techniques at me that kept it fun and different. I like unorthodox games. It’s like a new puzzle that needs to be solved.

Now, IBJJF needs to organize a Silicon Valley Open: a tournament between the employees of all the companies in the area. Instead of Alliance, Atos, and Gracie Barra the teams would be Google, Facebook, and Microsoft.

Goodbye Philadelphia

[Photo: Philadelphia]

I’ve lived in the same one bedroom apartment for most of my twenties, during my undergrad and grad studies. If you’re reading this, you may have visited there on occasion. But if you have not, I’d say it’s a cross between a library and a cave. A dozen bookcases line the walls, full of books and countless trinkets: two staplers, a postcard from a high school friend who I never really got to know, a mug from France or Germany or Italy, who knows… and hundreds of other things that gathered dust and watched as I made mistakes of all kinds but mostly with cooking.

All those things are now gone. I got rid of 99% of my possessions, and moved out of the apartment. Most of what I own now fits into a carry-on suitcase. All I’m left with are the phone numbers of people I love and the chaos of ideas rumbling around in my head, waiting to spill out. Yesterday, as I walked around on this year’s first snow, I couldn’t help but miss every little thing about life in Philadelphia before I even left. Everyone is still here, within reach, but I already miss them.

I miss the people I’ve worked with in academia: long hours chasing deadlines, enthusiastically tossing around ideas like kids building a LEGO castle without the instructions. I miss the people I’ve trained judo and jiu jitsu with: blood, sweat, and tears spilled on the mat over a pajama game that somehow forced me and everyone else to confront fears, weaknesses, and the absurdly delusional ramblings of the ego. I miss the friend with whom I traveled across the country: the Neal Cassady of my life “on the road.” I miss my mom, my dad, my brother. I miss playing music at shady bars: with a great guitar in hand (could be the best? A Gibson Les Paul, every time), singing songs in front of people who were too drunk to care about anything except a good Hendrix cover: “Hey Joe, where you going with that gun in your hand?” I miss the long runs on Kelly Drive. I miss the late night diners: the grey faces, the burnt coffee, and the feeling that nothing matters and everything is beautiful. I miss the long bus rides north, alone. I miss the people. I miss the conversations. I miss being younger… and stupider.

Goodbye Philly… for now. I’ll be back.

“What is that feeling when you’re driving away from people and they recede on the plain till you see their specks dispersing? – it’s the too-huge world vaulting us, and it’s good-bye. But we lean forward to the next crazy venture beneath the skies.”
– Jack Kerouac, On the Road

The photo up top was taken by my brother. I love you bro.