Lex Fridman

Lex Fridman (pronounced: Freedman)
Host of the Lex Fridman Podcast
Research Scientist, MIT, 2015 - current (2025)
Laboratory for Information and Decision Systems (LIDS)
Research topics: Human-AI interaction, robotics, and machine learning.

Podcast: Lex Fridman Podcast
Research: lex.mit.edu
Lectures: deeplearning.mit.edu
Socials: X, YouTube, LinkedIn, Instagram, TikTok, Facebook, Reddit, Telegram.
Contact me: Please see the Contact Page.

Beyond the above activities, I also enjoy:
- Playing guitar & piano (link is a video of me playing Comfortably Numb by Pink Floyd)
- Training & competing in jiu jitsu & judo (link is a video of me receiving my jiu jitsu black belt)

Research & Publications (Google Scholar)

MIT Advanced Vehicle Technology Study: Large-Scale Naturalistic Study of Human Interaction with Automation
Summary: Large-scale real-world AI-assisted driving data collection study to understand how human-AI interaction in driving can be safe and enjoyable. The emphasis is on computer vision based analysis of driver behavior in the context of automation use.
DeepTraffic: Reinforcement Learning System for Multi-Agent Dense Traffic Navigation
Summary: Traffic simulation and optimization with deep reinforcement learning. Primary goal is to make the hands-on study of deep RL accessible to thousands of students, educators, and researchers.
Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions
Summary: Framework for providing human supervision of a black box AI system that makes life-critical decisions. We demonstrate this approach on two applications: (1) image classification and (2) real-world data of AI-assisted steering in Tesla vehicles.
Active Authentication on Mobile Devices
Summary: An approach for verifying the identity of a smartphone user with four biometric modalities. We evaluate the approach by collecting real-world behavioral biometrics data from smartphones of 200 subjects over a period of at least 30 days.
Cognitive Load Estimation in the Wild
Summary: Winner of the CHI 2018 Honorable Mention Award. We propose two novel vision-based methods for cognitive load estimation and evaluate them on a large-scale dataset collected under real-world driving conditions.
Learning Human Identity From Motion Patterns
Summary: Dense Clockwork RNNs learn shift-invariant representations from smartphone IMU data for passive biometric authentication, fixing temporal aliasing in CWRNNs while modeling multi-scale kinematics. Achieves 20% EER on 1500-user dataset of natural prehensile movements captured in the wild.
What Can Be Predicted from 6 Seconds of Driver Glances?
Summary: Winner of the CHI 2017 Best Paper Award. We consider a dataset of real-world, on-road driving to explore the predictive power of driver glances.
Driver Gaze Region Estimation without Use of Eye Movement
Summary: We propose a simplification of the general gaze estimation task by framing it as a gaze region estimation task in the driving context, thereby making it amenable to machine learning approaches. We go on to describe and evaluate one such learning-based approach.
CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild
Summary: Unification of cognitive load estimation and eye region analysis (landmark/pupil/blink detection) in a single deep learning framework using shared feature extraction with task-specific heads. Introduces Localized Feature Tracking to model cognitive load from tracked eye region features over time and Mask-Localized Regressor for sub-pixel precise keypoint detection, achieving 66.58% accuracy on real-world driving cognitive load classification while running at 38+ FPS with joint eye analysis.
A Fast Foveated Fully Convolutional Network Model for Human Peripheral Vision
Summary: Generative neural network is trained to simulate human peripheral vision degradation 21,000x faster than existing behaviorally-validated texture synthesis models (4.2 hours → 0.7 seconds per image), enabling real-time visualization of what observers see when fixating different locations. The network learns to replicate crowding and acuity loss effects from the Texture Tiling Model while preserving statistical accuracy for HCI design applications.
Semi-Automated Annotation of Discrete States in Large Video Datasets
Summary: Semi-automated video annotation framework reduces per-frame labeling to detecting state transitions, modeled with a Hidden Markov Model. On 16M driver-gaze frames, it cuts manual work by up to 84× while maintaining 91–99% accuracy.
Driver Frustration Detection from Audio and Video in the Wild
Summary: A method for detecting driver frustration from both video and audio streams captured during the driver's interaction with an in-vehicle voice-based navigation system. An interesting observation: smiles are more common in unsatisfied vs satisfied interactions.
Crowdsourced Assessment of External Vehicle-to-Pedestrian Displays
Summary: 30 external vehicle-to-pedestrian display concepts for autonomous vehicles were evaluated. Simple, minimalist displays performed best.
Owl and Lizard: Patterns of Head Pose and Eye Pose in Driver Gaze Classification
Summary: Monocular driver gaze classification achieves 94.6% accuracy using head+eye pose vs 89.2% with head pose alone on 6-region classification. "Owlness" metric (dh/(dh+dp)) reveals users with eye-dominant gaze strategies benefit most from pupil detection, while head-movers show minimal improvement.
Automated Synchronization of Driving Data Using Vibration and Steering Events
Summary: A method for automated synchronization of vehicle sensors using accelerometer, telemetry, audio, and dense optical flow from three video sensors.
Decision Fusion for Multimodal Active Authentication
Summary: Continuous authentication via behavioral biometrics fuses 10 SVM-based sensors (mouse dynamics, keystroke timing, stylometry with varying window sizes, domain visit patterns) using Chair-Varshney optimal decision rule on 19-user office dataset. Achieves FAR=0.00122/FRR=0.00218, with stylometry contributing more error reduction than web browsing patterns in the multimodal fusion.
Observations on Sum User Rate for Cellular Downlink
Summary: Cellular downlink performance is analyzed using an expected spatial capacity metric based on SINR-driven user association. Results show counter-intuitive effects: clustered transmitter placement and shared channel use can yield higher sum rates than evenly spaced deployments or slot scheduling.
Multi-modal Decision Fusion for Continuous Authentication
Summary: Decision fusion of 12 behavioral biometric sensors (keystroke dynamics, mouse movement, stylometry) for continuous authentication. The system is robust to partial spoofing.
Path Planning for Network Performance
Summary: Decentralized A* search computes Pareto-optimal paths for MANET nodes balancing minimum-time navigation against six network performance metrics (connected components, link density, multicommodity flow), achieving up to 5x performance improvement with zero travel time penalty. Nodes incorporate real-time network feedback to dynamically adjust trajectories, enabling significant connectivity gains in sparse networks where traditional formation control fails.
OMAN: A Mobile Ad Hoc Network Design System
Summary: OMAN integrates cross-layer resource allocation for mobile ad hoc networks into a unified optimization framework, solving power control under channel uncertainty, scheduling with directional antennas, and relay node movement planning simultaneously. The system provides both API and GUI interfaces for jointly optimizing network resources across PHY, MAC, and mobility layers rather than treating each layer's optimization problems in isolation.
Cross-Layer Multicommodity Capacity Expansion on Ad Hoc Wireless Networks of Cognitive Radios
Summary: Joint optimization of power, constellation size, scheduling, and flow across PHY/MAC/NET layers in cognitive radio networks achieves higher throughput than modular layer-by-layer design. Cross-layer resource allocation yields 20-140% performance gains over conventional layered approaches.
Communication-Based Motion Planning
Summary: Mobile agents navigating obstacle-laden terrain optimize movement timing along predetermined paths to minimize network disconnections, formulated as minimizing average strongly connected components. Cooperative uniform-cost search achieves optimality at O(βn²·2^(nT_max)) while distributed noncooperative planning scales to O(cβn³·2^T_max), achieving near-optimal 80-128% connectivity improvements.
On the Joint Impact of Bias and Power Control on Downlink Spectral Efficiency in Cellular Networks
Summary: Cell biasing and downlink power control are jointly optimized to improve cellular network spectral efficiency. Joint control shows significant improvements in mean-variance and throughput-fairness tradeoffs over using either control alone.
Robust Optimal Power Control for Ad Hoc Networks
Summary: Robust power control for ad hoc networks minimizes total transmit power while penalizing expected SINR violations under uncertain channels (fading, shadowing, noise). Achieves better feasibility-optimality tradeoff than deterministic methods using outdated channel state information.
Distributed Path Planning for Connectivity Under Uncertainty by Ant Colony Optimization
Summary: Distributed path planning via ant colony optimization minimizes time-averaged connected components under incomplete knowledge of jamming zones, using pheromone tables updated by utility functions combining distance-to-goal, nearest-neighbor, and learned no-comm zone probability estimates.
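
Illustrative sketch: the Chair-Varshney decision rule mentioned in the summary of Decision Fusion for Multimodal Active Authentication above combines binary accept/reject decisions from several sensors, weighting each by how reliable it is. The Python sketch below shows one common form of that rule, expressed with each sensor's false accept rate (FAR) and false reject rate (FRR). It is not the code from the paper: the function name, the equal prior, and the error rates in the usage note are assumptions chosen for illustration.

```python
import math


def chair_varshney_fuse(decisions, far, frr, prior_genuine=0.5):
    """Fuse binary sensor decisions with the Chair-Varshney log-likelihood rule.

    decisions: list of bools, True if that sensor accepted the user.
    far, frr:  per-sensor false accept / false reject rates, each in (0, 1).
    prior_genuine: assumed prior probability that the user is genuine.
    Returns (accept, llr), where llr is the fused log-likelihood ratio.
    """
    # Start from the prior log-odds of "genuine" versus "impostor".
    llr = math.log(prior_genuine / (1.0 - prior_genuine))
    for accepted, fa, fr in zip(decisions, far, frr):
        if accepted:
            # P(accept | genuine) / P(accept | impostor)
            llr += math.log((1.0 - fr) / fa)
        else:
            # P(reject | genuine) / P(reject | impostor)
            llr += math.log(fr / (1.0 - fa))
    return llr > 0.0, llr
```

For example, with hypothetical rates `far = frr = [0.01, 0.3, 0.3]`, the call `chair_varshney_fuse([True, False, False], far, frr)` accepts the user: a single accept from the reliable first sensor outweighs two rejects from the noisier sensors, which a simple majority vote would not capture.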