Transcript for Mark Zuckerberg: First Interview in the Metaverse | Lex Fridman Podcast #398

This is a transcript of Lex Fridman Podcast #398 with Mark Zuckerberg. Please note that the transcript is human generated, and may have errors.

Lex Fridman (00:00:00) The following is a conversation with Mark Zuckerberg, inside the Metaverse. Mark and I are hundreds of miles apart from each other in physical space, but it feels like we’re in the same room because we appear to each other as photorealistic Codec Avatars in 3D with spatial audio. This technology is incredible and I think it’s the future of how human beings connect to each other in a deeply meaningful way on the internet. These avatars can capture many of the nuances of facial expressions that we humans use to communicate emotion to each other. Now, I just need to work on upgrading my emotion expressing capabilities of the underlying human. This is the Lex Fridman Podcast. And now, dear friends, here’s Mark Zuckerberg. This is so great. Lighting change? Wow.


Mark Zuckerberg (00:01:16) Yeah, we can put the light anywhere.
Lex Fridman (00:01:19) And it doesn’t feel awkward to be really close to you.
Mark Zuckerberg (00:01:22) No, it does. I actually moved you back a few feet before you got into the headset. You were right here.
Lex Fridman (00:01:27) I don’t know if people can see this, but this is incredible. The realism here is just incredible. Where am I? Where are you, Mark? Where are we?
Mark Zuckerberg (00:01:39) You’re in Austin, right?
Lex Fridman (00:01:41) No. I mean this place. We’re shrouded by darkness with ultra realistic face, and it just feels like we’re in the same room. This is really the most incredible thing I’ve ever seen. And sorry to be in your personal space. We have done jujitsu before.
Mark Zuckerberg (00:01:58) Yeah. I was commenting to the team before that I feel like we’ve choked each other from further distances than it feels like we are right now.
Lex Fridman (00:02:08) This is just really incredible. I don’t know how to describe it with words. It really feels like we’re in the same room.
Mark Zuckerberg (00:02:17) Yeah.
Lex Fridman (00:02:17) It feels like the future. This is truly, truly incredible. I just wanted to take it in. I’m still getting used to it. It’s you, it’s really you, but you’re not here with me. You’re there wearing a headset and I’m wearing a headset. It’s really, really incredible. Can you describe what it takes currently for us to appear so photo realistic to each other?
Mark Zuckerberg (00:02:44) Yeah. So, for background, we both did these scans for this research project that we have at Meta called Codec Avatars. And the idea is that instead of our avatars being cartoony and instead of actually transmitting a video, what it does is we’ve scanned ourselves and a lot of different expressions, and we’ve built a computer model of each of our faces and bodies and the different expressions that we make and collapsed that into a codec that then, when you have the headset on your head, it sees your face, it sees your expression, and it can basically send an encoded version of what you’re supposed to look like over the wire. So, in addition to being photorealistic, it’s also actually much more bandwidth efficient than transmitting a full video or especially a 3D immersive video of a whole scene like this.
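As a rough illustration of the bandwidth point above, the sketch below compares sending a small per-frame expression code with sending raw video frames. Every number in it is an assumption chosen for illustration (a 256-value code, 4 bytes per value, 30 frames per second, uncompressed 1080p video); none of them come from the conversation or from Meta's actual system, and real video is compressed, so the true gap would be smaller than this naive comparison suggests.

```python
from dataclasses import dataclass


@dataclass
class StreamSpec:
    """Size of one frame of a stream and how often frames are sent."""
    bytes_per_frame: int
    frames_per_second: int

    def bits_per_second(self) -> int:
        return self.bytes_per_frame * self.frames_per_second * 8


def raw_video_spec(width: int, height: int,
                   bytes_per_pixel: int = 3, fps: int = 30) -> StreamSpec:
    # Hypothetical baseline: uncompressed RGB frames.
    return StreamSpec(width * height * bytes_per_pixel, fps)


def expression_code_spec(code_length: int,
                         bytes_per_value: int = 4, fps: int = 30) -> StreamSpec:
    # Hypothetical encoded stream: one short vector describing the
    # current expression, decoded into a face on the receiving side.
    return StreamSpec(code_length * bytes_per_value, fps)


video = raw_video_spec(1920, 1080)   # assumed resolution
codes = expression_code_spec(256)    # assumed code length
ratio = video.bits_per_second() / codes.bits_per_second()

print(f"raw video:        {video.bits_per_second():,} bits/s")
print(f"expression codes: {codes.bits_per_second():,} bits/s")
print(f"ratio:            {ratio:.0f}x")
```

Under these assumed numbers the raw frames need about 6,000 times more bits per second than the codes. The point is only the order of magnitude: a short code that the receiver decodes with a prebuilt model is far smaller than the pixels it produces.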
Lex Fridman (00:03:47) And it captures everything. To me, the subtleties of the human face, even the flaws, that’s all amazing. It makes it so much more immersive. It makes you realize that perfection isn’t the thing that leads to immersion. It’s the little subtle flaws like freckles and variations in color and just…
Mark Zuckerberg (00:04:11) Wrinkles.
Lex Fridman (00:04:12) … all stuff about noses.
Mark Zuckerberg (00:04:12) Asymmetry.
Lex Fridman (00:04:14) Yeah, asymmetry, and just the corners of the eyes, what your eyes do when you smile, all that kind of stuff.
Mark Zuckerberg (00:04:20) Eyes are a huge part of it.
Lex Fridman (00:04:22) It’s just incredible.
Mark Zuckerberg (00:04:23) There’s all these studies that most of communication, even when people are speaking, is not actually the words that they’re saying. It’s the expression and all that. And we try to capture that with the classical expressive avatar system that we have. That’s the more cartoon designed one. You can put those expressions on those faces as well. But there’s obviously a certain realism that comes with delivering this photo realistic experience that, I don’t know, I just think it’s really magical. This gets to the core of what the vision around virtual and augmented reality is, of delivering a sense of presence as if you’re there together no matter where you actually are in the world. This experience I think is a good embodiment of that, where we’re in two completely different states halfway across the country, and it looks like you’re just sitting right in front of me. It’s pretty wild.
Lex Fridman (00:05:17) Yeah. I’m almost getting emotional. It feels like a totally fundamentally new experience. For me to have these kinds of conversations with loved ones, it would just change everything. Maybe just to elaborate, I went to Pittsburgh and went through the whole scanning procedure, which has so much incredible technology, software and hardware, going on, but it is a lengthy process. So what’s your vision for the future of this in terms of making this more accessible to people?
Mark Zuckerberg (00:05:50) It starts off with a small number of people doing these very detailed scans. That’s the version that you did and that I did, and before there were a lot of people who we’ve done a scan for, we probably need to over collect expressions when we’re doing the scanning because we haven’t figured out how much we can reduce that down to a really streamlined process and extrapolate from the scans that have already been done. But the goal, and we have a project that’s working on this already, is just to do a very quick scan with your cell phone where you just take your phone, wave it in front of your face for a couple of minutes, say a few sentences, make a bunch of expressions, but, overall, have the whole process just be two to three minutes and then produce something that’s of the quality of what we have right now.
(00:06:44) So I think that that’s one of the big challenges that remains, and right now we have the ability to do the scans if you have hours to sit for one. And with today’s technology, you’re using a Meta headset that exists. It’s a product that’s for sale now. You can drive these with that, but the production of these scans in a very efficient way is one of the last pieces that we still need to really nail. And then, obviously, there’s all the experiences around it. Right now we’re sitting in a dark room, which is familiar for your podcast, but I think part of the vision for this over time is not just having this be a video call. That’s fine, it’s cool, it feels like it’s immersive, but you can do a video call on your phone.
(00:07:35) The thing that you can do in the Metaverse that is different from what you can do on a phone is doing stuff where you’re physically there together and participating in things together. And we could play games like this. We could have meetings like this in the future. Once you get mixed reality and augmented reality, we could have Codec Avatars like this and go into a meeting and have some people physically there and have some people show up in this photorealistic form superimposed on the physical environment. Stuff like that is going to be super powerful. So we’ve got to still build out all those applications and the use cases around it. But I don’t know, I think it’s going to be a pretty wild next few years around this.
Lex Fridman (00:08:17) I’m actually almost at a loss for words. This is just so incredible. This is truly incredible. I hope that people watching this can get a glimpse of how incredible it is. It really feels like we’re in the same room. I guess there’s an uncanny valley that seems to have been crossed here. It looks like you.
Mark Zuckerberg (00:08:38) There’s still a bunch of tuning that I think we’ll want to do where different people emote to different extents, so I think one of the big questions is, when you smile, how wide is your smile? And how wide do you want your smile to be? And I think getting that to be tuned on a per person basis is going to be one of the things that we’re going to need to figure out. It’s like, to what extent do you want to give people control over that? Some people might prefer a version of themselves that’s more emotive in their avatar than their actual faces. So, for example, I always get a lot of critique and shit for having a relatively stiff expression. I might feel pretty happy, but just make a pretty small smile.
(00:09:31) So maybe, for me, it’s like I’d want to have my avatar really be able to better express how I’m feeling than how I can do physically. So I think that there’s a question about how you want to tune that, but, overall, yeah, we want to start from the baseline of capturing how people actually emote and express themselves. And I think the initial version of this has been pretty impressive. And like you said, I do think we’re beyond the uncanny valley here where it does feel like you. It doesn’t feel weird or anything like that.
Lex Fridman (00:10:05) That’s going to be the meme that the two most monotone people are in the Metaverse together, but I think that actually makes it more difficult. The amazing thing here is that the subtleties of the expression of the eyes, people say I’m monotone and emotionless, but I’m not. It’s just maybe my expression of emotion is more subtle, usually, with the eyes. And that’s one of the things I’ve noticed is just how expressive the subtle movement of the corners of the eyes are in terms of displaying happiness or boredom or all that stuff.
Mark Zuckerberg (00:10:39) I am curious to see, just because I’ve never done one of these before, I’ve never done a podcast as one of these Codec Avatars, and I’m curious to see what people think of it. Because one of the issues that we’ve had in some of the VR and mixed reality work is it tends to feel a lot more profound when you’re in it than the 2D videos capturing the experience. So I think that this one, because it’s photorealistic, may look as amazing in 2D for people watching it as it feels, I think, to be in it. But we’ve certainly had this issue where a lot of the other things, it’s like you feel the sense of immersion when you’re in it that, that doesn’t quite translate to a 2D screen. But I don’t know, I’m curious to see what people think.
Lex Fridman (00:11:21) Yeah, I’m curious to see if people could see that my heart is actually beating fast now. This is super interesting that such intimacy of conversation could be achieved remotely. I don’t do remote podcasts for this reason, and this breaks all of that. This feels like just an incredible transition to something else, the different communication. It breaks all barriers, like geographic physical barriers. You mentioned, do you have a sense of timeline in terms of how many difficult things have to be solved to make this more accessible to like scanning with a smartphone?
Mark Zuckerberg (00:12:02) Yeah. I think we’ll probably roll this out progressively over time. So it’s not going to be like we roll it out and one day everyone has a Codec Avatar. We want to get more people scanned and into the system, and then we want to start integrating it into each one of our apps, making it so that I think that for a lot of the work style things, productivity, I think that this is going to make a ton of sense. In a lot of game environments, this could be fine, but games tend to have their own style where you almost want to fit more with the aesthetic style of the game. But I think for doing meetings, one of the things that we get a lot of feedback on Workrooms where people are pretty blown away by the experience and this feeling that you can be remote but feel like you’re physically there around a table with people, but then we get some feedback that people have a hard time with the fact that the avatars are so expressive and don’t feel as realistic in that environment.
(00:12:58) So I think something like this could make a very big difference for those remote meetings. And especially with Quest 3 coming out, which is going to be the first mainstream mixed reality product where you’re really taking digital expressions of either a person or objects and overlaying them on the physical world, I think the ability to do remote meetings and things like that where you’re just remote hang sessions with friends, I think that that’s going to be very exciting. So rolling it out over the next few years, it’s not ready to be a mainstream product yet, but we’ll keep tuning it and keep getting more scans in there and rolling it out and into more of the features. But, yeah, definitely in the next few years you’ll be seeing a bunch more experiences like this.
Lex Fridman (00:13:44) Yeah, I would love to see some celebrities scanned and some non-celebrities and just more people to experience this. I would love to see that. My mind blown. I’m literally at a loss for words because it’s very difficult to just convey how incredible this is, how I feel the emotion, how I feel the presence, how I feel the subtleties of the emotion in terms of work meetings or in terms of podcasts. This is awesome. I don’t even need your arms or legs.
Mark Zuckerberg (00:14:17) Well, we got to get that. That’s its own challenge.
Lex Fridman (00:14:22) Okay.
Mark Zuckerberg (00:14:22) And part of the question is also, so you have the scan, then it takes a certain amount of compute to go drive that, both for the sensors on the headset and then rendering it. So one of the things that we’re working through is, what is the level of fidelity that is optimal? You could do the full body in a codec and that can be quite intensive, but one of the things that we’re thinking about is, all right, maybe you can stitch a somewhat lower fidelity version of your body and have the major movements, but your face is really the thing that we have the most resolution on in terms of being able to read and express emotions. Like you said, if you move your eyebrows a millimeter, that really changes the expression and what you’re emoting whereas moving your arm like an inch probably doesn’t matter quite as much. So, yes, I think that we do want to get all of that into here, and that’ll be some of the work over the next period as well.

Quest 3

Lex Fridman (00:15:27) So you mentioned Quest 3. That’s coming out. I’ve gotten a chance to try that too. That’s awesome. How’d you pull off the mix? So it’s not just virtual reality, it’s mixed reality.
Mark Zuckerberg (00:15:37) I think it’s going to be the first mainstream mixed reality device. Obviously, we shipped Quest Pro last year, but it was $1,500. And part of what I’m super proud of is we try to innovate not just on pushing the state-of-the-art and delivering new capabilities, but making it so it can be available to everyone. And we have this, and it’s coming out, it’s $500, and in some ways, I think the mixed reality is actually better in Quest 3 than what we’re using right now in Quest Pro. And I’m really proud of the team for being able to deliver that kind of an innovation and get it out. But some of this is just software you tune over time and get to be better. Part of it is you put together a product and you figure out, what are the bottlenecks in terms of making it a good experience?
(00:16:26) So we got the resolution for the mixed reality cameras and sensors to be multiple times better in Quest 3, and we just figured that, that made a very big difference when we saw the experience that we were able to put together for Quest Pro. And part of it is also that Qualcomm just came out with their next generation chip set for VR and MR that we worked with them on, on a custom version of it. But that was available this year for Quest 3 and it wasn’t available in Quest Pro. So in a way, Quest 3, even though it’s not the Pro product, actually has a stronger chip set in it than the Pro line at a third of the cost.
(00:17:00) So I’m really excited to get this in people’s hands. It does all the VR stuff that Quest 2 and the others have done too. It does it better because the display is better and the chip is better. So you’ll get better graphics. It’s 40% thinner, so it’s more comfortable as well. But the MR is really the big capability shift. And part of what’s exciting about the whole space right now is this isn’t like smartphones, where companies put out a new smartphone every year, and you can almost barely tell the difference between that and the one the year before it.
(00:17:36) For this, each time we put out a new headset, it has a major new capability. And the big one now is mixed reality. The ability to basically take digital representations of people or objects and superimpose them on the world. And basically, there’s a, one version of this is you’re going to have these augments or holograms and experiences that you can bring into your living room or a meeting space or office.
(00:18:06) Another thing that I just think is going to be a much simpler innovation, is that there are a lot of VR experiences today that don’t need to be fully immersive. And if you’re playing a shooter game or you’re doing a fitness experience, sometimes people get worried about swinging their arms around, like, am I going to hit a lamp or something and am I going to run into something? So having that in mixed reality, actually, it’s just a lot more comfortable for people. You kind of still get the immersion and the 3D experience and you can have an experience that just wouldn’t be possible in the physical world alone. But by being anchored to and being able to see the physical world around you, it just feels so much safer and more secure. And I think a lot of people are really going to enjoy that too. So yeah, I’m really excited to see how people use it. But yeah, Quest 3 coming out later this fall.
Lex Fridman (00:18:53) And I got to experience it with other people sitting around and there’s a lot of furniture. And so you get to see that furniture, you get to see those people, and you get to see those people enjoy the ridiculousness of you swinging your arms. I mean, presumably they’re friends of yours, even if they make fun of you, there’s a lot of love behind that and I got to experience that. So that’s a really fundamentally different experience than just pure VR with zombies coming out of walls and-
Mark Zuckerberg (00:19:20) Yeah, it’s like someone shooting at you and you hide behind your real couch in order to duck the fire. Yeah.
Lex Fridman (00:19:26) It’s incredible how it’s all integrated, but also subtle stuff, like in a room with no windows, you can add windows to it and you can look outside as the zombies run towards you, but it’s still a nice view outside. And so that’s pulled off by having cameras on the outside of the headset that do the pass through. That technology is incredible to do that on a small headset.
Mark Zuckerberg (00:19:50) Yeah, and it’s not just the cameras. You basically, you need multiple cameras to capture the different angles and sort of the three-dimensional space, and then it’s a pretty complex compute problem, an AI problem to map that to your perspective because the cameras aren’t exactly where your eyes are because no two people’s eyes are, you’re not going to be in exactly the same place. You need to get that to line up and then do that basically in real time and then generate something that kind of feels natural and then superimpose whatever digital objects you want to put there.
(00:20:24) So yeah, it’s a very interesting technical challenge and I think we’ll continue tuning this for the years to come as well. But I’m pretty excited to get this out because I think Quest 3 is going to be the first device like this that millions of people are going to get that’s mixed reality. And it’s only when you have millions of people using something that you start getting the whole developer community really starting to experiment and build stuff because now there are going to be people who actually use it. So I think we got some of that flywheel going with Quest Pro, but I think it’ll really get accelerated once Quest 3 gets out there. So yeah, I’m pretty excited about this one.
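The passthrough problem described above, where the cameras are not exactly where the eyes are, can be sketched with a simple pinhole camera model. This is a minimal illustration and not how any headset actually implements it: it assumes a single camera, a known depth for each pixel, no rotation between camera and eye, and made-up values for the focal length and the camera-to-eye offset. All function names are hypothetical.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project(point: Point3, focal_length: float) -> Pixel:
    """Pinhole projection of a 3D point onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer")
    return (focal_length * x / z, focal_length * y / z)


def unproject(pixel: Pixel, depth: float, focal_length: float) -> Point3:
    """Recover the 3D point seen at `pixel`, given its depth."""
    u, v = pixel
    return (u * depth / focal_length, v * depth / focal_length, depth)


def reproject_to_eye(pixel: Pixel, depth: float, focal_length: float,
                     eye_offset: Point3) -> Pixel:
    """Where a camera pixel should appear from the eye's position.

    `eye_offset` is the eye's position in the camera's coordinate
    frame (assumed: same orientation, translation only).
    """
    cam_point = unproject(pixel, depth, focal_length)
    eye_point = tuple(c - o for c, o in zip(cam_point, eye_offset))
    return project(eye_point, focal_length)


FOCAL = 500.0             # assumed focal length in pixels
OFFSET = (0.03, 0.0, 0.0)  # assumed 3 cm sideways camera-to-eye offset

near = reproject_to_eye((100.0, 50.0), 0.5, FOCAL, OFFSET)
far = reproject_to_eye((100.0, 50.0), 5.0, FOCAL, OFFSET)
print("object at 0.5 m:", near)
print("object at 5.0 m:", far)
```

With these assumed values, the same camera pixel shifts by about 30 pixels for an object half a meter away and by about 3 pixels for one five meters away. Because the correction depends on depth, it has to be estimated for every pixel and redone every frame, which gives a sense of why this is described as a real-time compute problem.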
Lex Fridman (00:21:01) Plus there’s hand tracking, so you don’t need to have a control, so the cameras aren’t just in the pass through of the entire physical reality around you. It’s also tracking the details of your hands in order to use that for gesture recognition, this kind of stuff.
Mark Zuckerberg (00:21:17) Yeah, we’ve been able to get way further on hand recognition in a shorter period of time than I expected, so that’s been pretty cool. I don’t know, did you see the demo experience that we built around?
Lex Fridman (00:21:29) Piano?
Mark Zuckerberg (00:21:30) Yeah, the piano. Learning to play piano.
Lex Fridman (00:21:32) Yeah, it’s incredible. You’re basically playing piano on a table, and that’s without any controller. And how well it matches physical reality with no latency, and it’s tracking your hands with no latency and it’s tracking all the people around you with no latency. Integrating physical reality and digital reality, obviously that connects exactly to this Codec Avatar, which is in parallel allows us to have ultra realistic copies of ourselves in this mixed reality.
(00:22:06) So it’s all converging towards an incredible digital experience in the Metaverse. To me, obviously I love the intimacy of conversation, so even this is awesome, but do you have other ideas of what this unlocks, of something like Codec Avatar unlocks in terms of applications, in terms of things we’re able to do?
Mark Zuckerberg (00:22:28) Well, there’s what you can do with avatars overall, in terms of superimposing digital objects on the physical world, and then there’s psychologically, what does having photorealistic do? So I think we’re moving towards a world where we’re going to have something that looks like normal glasses, where you can see the physical world, but you’ll also see holograms. And in that world, I think that there are going to be, not too far off, maybe by the end of this decade, we’ll be living in a world where there are as many holograms when you walk into a room as there are physical objects. And it really raises this interesting question about what are… A lot of people have this phrase where they call the physical world the real world.
(00:23:19) And I kind of think increasingly, the physical world is super important, but I actually think the real world is the combination of the physical world and the digital worlds coming together. But until this technology, they were sort of separate. It’s like you access the digital world through a screen and maybe it’s a small screen that you carry around or it’s a bigger screen when you sit down at your desk and strap in for a long session, but they’re kind of fundamentally divorced and disconnected. And I think part of what this technology is going to do is bring those together into a single coherent experience of what the modern real world is, which is, it’s got to be physical because we’re physical beings. So the physical world is always going to be super important.
(00:24:01) But increasingly, I think a lot of the things that we kind of think of can be digital holograms. I mean, any screen that you have can be a hologram, any media, in any book, art. It can basically be just as effective as a hologram, as a physical object. Any game that you’re playing, a board game or any kind of physical game, cards, ping pong, things like that, they’re often a lot better as holograms. Because you could just snap your fingers and instantiate them and have them show up. It’s like you have a ping pong table show up in your living room, but then you can snap your fingers and have it be gone. So that’s super powerful. So I think that it’s actually an amazing thought experiment of like how many physical things we have today that could actually be better as interactive holograms.
(00:24:52) But then beyond that, I think the most important thing obviously is people. So the ability to have these mixed hangouts, whether they’re social or meetings where you show up to a conference room, you’re wearing glasses or a headset in the very near term, but hopefully by over the next five years, glasses or so. And you’re there physically. Some people are there physically, but other people are just there as holograms and it feels like it’s them who are right there.
(00:25:23) And also by the way, another thing that I think is going to be fascinating about being able to blend together the digital and physical worlds in this way, is we’re also going to be able to embody AIs as well. So I think you’ll also have meetings in the future where you’re basically, maybe you’re sitting there physically and then you have a couple of other people who are there as holograms, and then you have Bob, the AI, who’s an engineer on your team who’s helping with things, and he can now be embodied as a realistic avatar as well, and just join the meeting in that way. So I think that that’s going to be pretty compelling as well.
(00:26:03) Okay, so what can you do with photorealistic avatars compared to the more expressive ones that we have today? Well, I think a lot of this actually comes down to acceptance of the technology. And because all of the stuff that we’re doing, I mean, the motion of your eyebrows, the motion of your eyes, the cheeks and all of that, there’s actually no reason why you couldn’t do that on an expressive avatar too. I mean, it wouldn’t look exactly like you, but you can make a cartoon version of yourself and still have it be almost as expressive.
(00:26:38) But I do think that there’s this bridge between the current state of most of our interactions in the physical world and where we’re getting in the future with this kind of hybrid, physical and digital world, where I think it’s going to be a lot easier for people to take some of these experiences seriously with the photorealistic avatars to start. And then I’m actually really curious to see where it goes longer term. I could see a world where people stick to the photorealistic and maybe they modify them to make them a little bit more interesting, but maybe fundamentally, we like photorealistic things.
(00:27:14) But I can also see a world that once people get used to the photorealistic avatars and they get used to these experiences, that I actually think that there could be a world where people actually prefer being able to express themselves in kind of non, ways that aren’t so tied to their physical reality. And so that’s one of the things that I’m really curious about. And I don’t know, in a bunch of our internal experiments on this, one of the things that I thought was psychologically pretty interesting is, people have no issues blending photorealistic stuff and not.
(00:27:50) So for this specific scene that we’re in now, we happen to sort of be in a dark room. I think part of that aesthetic decision I think was based on the way you like to do your podcast, but we’ve done experiences like this, where you have a cartoony background, but photorealistic people who you’re talking to, and people just seem to just think that that is completely normal. It doesn’t bother you, it doesn’t feel like it’s weird.
(00:28:21) Another thing that we have experimented with, is basically you have a photorealistic avatar that you’re talking to, and then right next to them you have an expressive kind of cartoon avatar. And that actually is pretty normal too. It’s not that weird to basically be interacting with different people in different modes like that. So I’m not sure, I think it’ll be an interesting question, to what extent these photorealistic avatars are a key part of just transitioning from being comfortable in the physical world to this kind of new, modern, real world that includes both the digital and physical, or if this is the long-term way that it stays? I think that there are going to be uses for both the expressive and the photorealistic over time. I just don’t know what the balance is going to be.
Lex Fridman (00:29:08) Yeah. It’s a really good, interesting philosophical question, but to me, in the short term, the photorealistic is amazing. To where I would prefer, you said the workroom, but on a beach with a beer, just to see a buddy of mine remotely on a chair next to me, drinking a beer. I mean that, as realistic as possible, is an incredible experience. So I don’t want any fake hats on him. I don’t want any, just chilling with a friend, drinking beer, looking at the ocean, while not being in the same place together. I mean, that experience is just, it’s a fundamentally, it’s just a high quality experience of friendship. Whatever we seek in friendship, it seems to be present there in the same kind of realism I’m seeing right now. This is totally a game changer. So to me, this is, I can see myself sticking with this for a long time.
Mark Zuckerberg (00:30:01) Yeah, and I mean it’s also, it’s novel. And it’s also a technological feat, right? It’s like being able to pull this off, it’s pretty impressive, and I think to some degree, it’s just this kind of awesome experience.

Nature of reality

Lex Fridman (00:30:15) But I’m already, sorry to interrupt, I’m already forgetting that you’re not real. This really-
Mark Zuckerberg (00:30:22) Well, I am real.
Mark Zuckerberg (00:30:23) It’s novel.
Mark Zuckerberg (00:30:24) This is just an avatar version of me.
Lex Fridman (00:30:26) That’s a deep philosophical question. Yes.
Mark Zuckerberg (00:30:29) But here’s some of the… So I put this on this morning and I was like, “All right.” It’s like, okay, my hair is a little shorter in this than my physical hair is right now. I probably need to go get a haircut. And I actually, I did happen to shave this morning, but if I hadn’t, I could still have this photorealistic avatar that is more cleanly shaven, even if I’m a few days in, physically. So I do think that there are going to start to be these subtle questions that seep in where the avatar is realistic in the sense of, this is kind of what you looked like at the time of capture, but it’s not necessarily temporally accurate to exactly what you look like in this moment. And I think that there are going to end up being a bunch of questions that come from that over time, that I think are going to be fascinating too.
Lex Fridman (00:31:22) You mean just the nature of identity of who we are? You know how people do summer beach body? Where people will be, for the scan, they’ll try to lose some weight and look their best and sexiest with the nice hair and everything like that. It does raise the question of if a lot of people interacting with the digital version of ourselves, who are we really? Are we the entity driving the avatar or are we the avatar?
Mark Zuckerberg (00:31:52) Well, I mean, I think our physical bodies also fluctuate and change over time too. So I think there’s a similar question of which version of that are we? And it’s an interesting identity question because all right, it’s like, I don’t know, it’s like weight fluctuates or things like that. I think most people don’t tend to think of themselves as the… Well, I don’t know. It’s an interesting psychological question. Maybe a lot of people do think about themselves as the kind of worst version, but I think a lot of people probably think about themselves as the best version.
(00:32:26) And then it’s like what you are on a day-to-day basis doesn’t necessarily map to either of those. Yeah, there will definitely be a bunch of social scientists and folks will have to, and psychologists, really, there’s going to be a lot to understand about how our perception of ourselves and others has shifted from this.
Lex Fridman (00:32:51) Well, this might be a bit of a complicated and a dark question, but one of the first feelings I had experiencing this is I would love to talk to loved ones. And the next question I have is I would love to talk to people who are no longer here that are loved ones. So if you look into the future, is that something you think about? People who pass away, but they can still exist in the metaverse, you could still have, talk to your father, talk to your grandfather and grandmother and a mother once they pass away. The power of that experience is one of the first things my mind jumped to because it’s like, this is so real.
Mark Zuckerberg (00:33:30) Yeah, I think that there are a lot of norms and things that people have to figure out around that. There’s probably some balance, where if someone has lost a loved one and is grieving, there may be ways in which being able to interact or relive certain memories could be helpful. But then there’s also probably an extent to which it could become unhealthy. And I mean, I’m not an expert in that, so I think we’d have to study that and understand it in more detail.
(00:34:00) We have a fair amount of experience with how to handle death and identity, and people’s digital content through social media already, unfortunately. Unfortunately people who use our services die every day and their families often want to have access to their profiles and we have whole protocols that we go through where there are certain parts of it that we try to memorialize so that way the family can get access to it so that the account doesn’t just go away immediately. But then there are other things that are important, private things that person has. We’re not going to give the family access to someone’s messages, for example.

AI in the Metaverse

(00:34:42) So I think that there’s some best practices, I think from the current digital world that will carry over. But I think that this will enable some different things. Another version of this is how this intersects with AIs because one of the things that we’re really focused on is we want the world to evolve in a way where there isn’t a single AI super intelligence, but where a lot of people are empowered by having AI tools to do their jobs and make their lives better.
(00:35:19) And if you’re a creator and if you run a podcast like you do, then you have a big community of people who are super interested to talk to you. I know you’d love to cultivate that community and you interact with them online outside of the podcast as well. But I mean, there’s way more demand both to interact with you, and I’m sure you’d love to interact with the community more, but you just are limited by the number of hours in the day.
(00:35:46) So at some point, I think making it so that you could build an AI version of yourself that could interact with people not after you die, but while you’re here to help people fulfill this desire to interact with you and your desire to build a community. And there’s a lot of interesting questions around that, and obviously, it’s not just in the metaverse. I think we’d want to make that work across all the messaging platforms, WhatsApp, and Messenger, and Instagram Direct. But there’s certainly a version of that where if you could have an avatar version of yourself in the metaverse that people can interact with, and you could define that sort of an AI version where people know that they’re interacting with an AI, that it’s not the physical version of you, but maybe that AI, even if they know it’s an AI, is the next best thing because they’re probably not going to necessarily all get to interact with you directly.
(00:36:45) I think that could be a really compelling experience. There’s a lot of things that we need to get right about it that we’re not ready to release the version that a creator can build a version of themselves yet, but we’re starting to experiment with it in terms of releasing a number of AIs that people can interact with in different ways. I think that that is also just going to be a very powerful set of capabilities that people have over time.
Lex Fridman (00:37:13) So you’ve made major strides in developing these early AI personalities with the idea where you can talk to them across the Meta apps and have interesting, unique kind of conversations. Can you describe your vision there and these early strides and what are some technical challenges there?
Mark Zuckerberg (00:37:34) Yeah. So a lot of the vision comes from this idea that… I don’t think we necessarily want there to be one big super intelligence. We want to empower everyone to both have more fun, accomplish their business goals, just everything that they’re trying to do. We don’t tend to have one person that we work with on everything, and I don’t think in the future we’re going to have one AI that we work with. I think you’re going to want a variety of these. So there are a bunch of different uses. Some will be more assistant oriented. There’s a sort of the plain and simple one that we are building is called just Meta AI. It’s simple. You can chat with it in any of your threads. It doesn’t have a face.
(00:38:22) It’s just more vanilla and neutral and factual, but it can help you with a bunch of stuff. Then there are a bunch of cases that are more business oriented. So let’s say you want to contact a small business. Similarly, that business probably doesn’t want to have to staff someone to man the phones, and you probably don’t want to wait on the phone to talk to someone. But having someone who you can just talk to in a natural way who can help you if you’re having an issue with a product or if you want to make a reservation or if you want to buy something online, having the ability to do that and have a natural conversation rather than navigate some website or have to call someone and wait on hold, I think is going to be really good both for the businesses and for normal people who want to interact with businesses.
(00:39:11) So I think stuff like that makes sense. Then there are going to be a bunch of use cases that I think are just fun. So I think people are going to… I think there will be AIs that can tell jokes, so you can put them into a chat thread with friends. I think a lot of this, because we’re like a social company. I mean we’re fundamentally around helping people connect in different ways. Part of what I’m excited about is how do you enable these kind of AIs to facilitate connection between two people or more, put them in a group chat, make the group chat more interesting around whatever your interests are, sports, fashion, trivia.
Lex Fridman (00:39:51) Video games. I love the idea of playing. I think you mentioned Baldur’s Gate, an incredible game. Just having an AI that you play together with. I mean, that seems like a small thing, but it could deeply enrich the gaming experience.
Mark Zuckerberg (00:40:08) I do think that AI will make the NPCs a lot better in games too. So that’s a separate thing that I’m pretty excited about. I mean, one of the AIs that we’ve built that just in our internal testing people have loved the most is a text-based adventure, like a dungeon master.
Lex Fridman (00:40:30) Nice.
Mark Zuckerberg (00:40:31) I think, part of what has been fun, and we talked about this a bit, but we’ve gotten some real cultural figures to play a bunch of these folks and be the embodiment in the avatar of them. So Snoop Dogg is the dungeon master, which I think is just hilarious.
Lex Fridman (00:40:48) Yes. In terms of the next steps of, you mentioned Snoop, to create a Snoop AI, so basically an AI personality replica, a copy… Or not a copy, maybe inspired by Snoop, what are some of the technical challenges of that? What does that experience look like for Snoop to be able to create that AI?
Mark Zuckerberg (00:41:11) So starting off, creating new personas is easier because it doesn’t need to stick exactly to what that physical person would want, how they’d want to be represented. It’s like it’s just a new character that we created. So Snoop in that case, he’s basically an actor. He’s playing the Dungeon Master, but it’s not Snoop Dogg, it’s whoever the dungeon master is. If you want to actually make it so that you have an AI embodying a real creator, there’s a whole set of things that you need to do to make sure that that AI is not going to say things that the creator doesn’t want and that the AI is going to know things and be able to represent things in the way that the creator would want, the way that the creator would know.
(00:42:06) So I think that it’s less of a question around having the avatar express them. I mean that I think where it’s like, well, we have our V1 of that that we’ll release soon after Connect. But that’ll get better over time. But a lot of this is really just about continuing to make the models for these AIs that they’re just more and more, I don’t know, you could say reliable or predictable in terms of what they’ll communicate so that way when you want to create the Lex assistant AI that your community can talk to. You don’t program them like normal computers, you’re training them. They’re AI models, not normal computer programs, but you want to get it to be predictable enough so that way you can set some parameters for it.
(00:42:59) And even if it isn’t perfect all the time, you want it to generally be able to stay within those bounds. So that’s a lot of what I think we need to nail for the creators, and that’s why that one is actually a much harder problem, I think, than starting with new characters that you’re creating from scratch. So that one I think will probably start releasing sometime next year. Not this year, but experimenting with existing characters and the assistant, and games, and a bunch of different personalities and experimenting with some small businesses. I think that that stuff we’ll be ready to do this year. And we’re rolling it out basically right after Connect.
Lex Fridman (00:43:42) Yeah. I’m deeply entertained by the possibility of me sitting down with myself and saying, “Hey, man, you need to stop the dad jokes or whatever.”
Mark Zuckerberg (00:43:52) I think the idea of a podcast between you and AI assistant Lex podcast.
Lex Fridman (00:43:59) I mean, there is just even the experience of an avatar, being able to freeze yourself like basically first mimic yourself, so everything you do, you get to see yourself do it. That’s a surreal experience. That feels like if I was an ape looking in a mirror for the first time, realizing, “Oh, that’s you.” But then freezing that and being able to look around like I’m looking at you, I don’t know how to put it into words, but it just feels like a fundamentally new experience. I’m seeing maybe color for the first time. I’m experiencing a new way of seeing the world for the first time because it’s physical reality, but it’s digital. And realizing that that’s possible, it’s blowing my mind. It’s just really exciting.
(00:44:50) I lived most of my life before the internet and experiencing the internet and experiencing voice communication, video communication. You think like, “Well, there’s a ceiling to this.” But this is making me feel like there might not be. There might be that blend of physical reality and digital reality. That’s actually what the future is.
Mark Zuckerberg (00:45:12) Yeah, I think so.
Lex Fridman (00:45:13) It’s a weird experience. It feels like the early days of a totally new way of living, and there’s a lot of people that kind of complain, “Well, the internet, that’s not reality. You need to turn all that off and go in nature.” But this feels like this will make those people happy. I feel like, because it feels real, the flaws and everything.
Mark Zuckerberg (00:45:37) Yeah. Well, I mean, a big part of how we’re trying to design these new computing products is that they should be physical, but I think that’s a big part of the issue with computers and TVs and even phones is like, “Yeah, maybe you can interact with them in different places.” But they’re fundamentally like you’re sitting, you’re still. I mean, people are just not meant to be that way. I mean, I think you and I have this shared passion for sports and martial arts and doing stuff like that. We’re just moving around. It’s so much of what makes us people is like, you move around. You’re not just like a brain in a tank. It’s where the human experience is a physical one.
(00:46:17) So it’s not just about having the immersive expression of the digital world, it’s about being able to really natively bring that together. I do really think that the real world is this mix of the physical and the digital. There’s too much digital at this point for it to just be siloed to a small screen, but the physical is too important. So you don’t want to just sit down all day long at a desk. I do think that this is the future. This is, I think the kind of philosophical way that I would want the world to work in the future is a much more coherently blended physical and digital world.
Lex Fridman (00:46:56) There might be some difficult philosophical and ethical questions we have to figure out as a society. Maybe you can comment on this. So the metaverse seems to enable, sort of unlock a lot of experiences that we don’t have in the physical world. And the question is what is and isn’t allowed in the metaverse? In video games, we allow all kinds of crazy stuff. And in physical reality, a lot of that is illegal. So where’s that line? Where’s that gray area between video game and physical reality? Do you have a sense of that?
Mark Zuckerberg (00:47:37) I mean, there are content policies and things like that in terms of what people are allowed to create, but I mean, a lot of the rules around physical… I think we try to have a society that is as free as possible, meaning that people can do as much of what they want unless you’re going to do damage to other people and infringe on their rights. And the idea of damage is somewhat different in a digital environment.
(00:48:02) I mean, when I get into some world with my friends, the first thing we start doing is shooting each other, which obviously we would not do in the physical world because you’d hurt each other. But in a game, it’s just fun. And even in the lobby of a game, it’s not even bearing on the game, it’s just kind of a funny sort of humorous thing to do. So it’s like, is that problematic? I don’t think so because fundamentally you’re not causing harm in that world. So I think that part of the question that I think we need to figure out is what are the ways where things could have been harmful in the physical world that we’ll now be freed from that? And therefore there should be fewer restrictions in the digital world.
(00:48:48) And then there might be new ways in which there could be harm in the digital world that weren’t the case before. So there’s more anonymity. It’s when you show up to a restaurant or something, it’s like all the norms where you pay the bill at the end. It’s because you have one identity. And if you stiff them, then life is a repeat game and that’s not going to work out well for you. But in a digital world where you can be anonymous and show up in different ways, I think the incentive to act like a good citizen can be a lot less, and that causes a lot of issues and toxic behavior. So that needs to get sorted out.
(00:49:28) So I think in terms of what is allowed, I think you want to just look at what are the damages, but then there’s also other things that are not related to harm, less about what should be allowed and more about what will be possible that are more about the laws of physics. It’s like if you wanted to travel to see me in person, you’d have to get on a plane, and that would take a few hours to get here. Whereas we could just jump in a conference room and put on these headsets and we’re basically teleported into a space where it feels like we’re together.
(00:50:04) So that’s a very novel experience that it breaks down some things that previously would’ve defied the laws of physics for what it would take to get together. And I think that that will create a lot of new opportunities. One of the things that I’m curious about is there are all these debates right now about remote work or people being together. I think this gets us a lot closer to being able to work physically in different places, but actually have it feel like we’re together. So I think that the dream is that people will one day be able to just work wherever they want, but we’ll have all the same opportunities because you’ll be able to feel like you’re physically together. I think we’re not there today with just video conferencing and the basic technologies that we have, but I think part of the idea is that with something like this, over time, you could get closer to that and that would open up a lot of opportunities, right?
(00:51:00) Because then people could live physically where they want while still being able to get the benefits of being physically or feeling like you’re together with people at work, all the ways that that helps to build more culture and build better relationships and build trust, which I think are real issues if you’re not seeing people in person ever. So yeah, I don’t know. I think it’s very hard from first principles to think about all the implications of a technology like this and all the good and the things that you need to mitigate. So you try to do your best to envision what things are going to be like and accentuate the things that are going to be awesome and hopefully mitigate some of the downside things. But the reality is that we’re going to be building this out one year at a time. It’s going to take a while, so we’re going to just get to see how it evolves and what developers and different folks do with it.

Large language models

Lex Fridman (00:51:52) If you could comment, this might be a bit of a very specific technical question, but Llama 2 is incredible. You’ve released it recently. There’s already been a lot of exciting developments around it. What’s your sense about its release and is there a Llama 3 in the future?
Mark Zuckerberg (00:52:15) Yeah, I mean, I think on the last podcast that we did together, we were talking about the debate that we were having around open sourcing Llama 2. And I’m glad that we did. I think at this point the value of open sourcing a foundation model like Llama 2 is significantly greater than the risks in my view. I mean, we spent a lot of time, took a very rigorous assessment of that and red teaming it. But I’m very glad that we released Llama 2. I think the reception has been… It’s just been really exciting to see how excited people have been about it. It’s gotten way more downloads and usage than I would’ve even expected, and I was pretty optimistic about it. So that’s been great.
(00:53:05) Llama 3, I mean, there’s always another model that we’re training. So for right now, we trained Llama 2 and we released it as an open source model. And right now the priority is building that into a bunch of the consumer products, all the different AIs and a bunch of different products that we’re basically building as consumer products. Because Llama 2 by itself, it’s not a consumer product, right? It’s more of a piece of infrastructure that people could build things with.
(00:53:36) So that’s been the big priority, is continuing to fine tune and just get Llama 2 and the branches that we’ve built off of it ready for consumer products that hopefully hundreds of millions of people will enjoy using, and billions one day. But yeah, I mean we’re also working on the future foundation models. I don’t have anything new or news on that. I don’t know exactly when it’s going to be ready. I think just like we had a debate around Llama 2 and open sourcing it, I think we’ll need to have a similar debate and process to red team this and make sure that this is safe. And my hope is that we’ll be able to open source this next version when it’s ready too. But we’re not close to doing that this month. I mean, that’s a thing that we’re still somewhat early and working on.
Lex Fridman (00:54:37) Well, in general, thank you so much for open sourcing Llama 2 and for being transparent about all the exciting developments around AI. I feel like that’s contributing to a really awesome conversation about where we go with AI. And obviously, it’s really interesting to see all the same kind of technology integrated into these personalized AI systems with the AI personas, which I think when you put in people’s hands and they get to have conversations with these AI personas, you get to see interesting failure cases where the things are dumb or they go into weird directions. And we get to learn as a society together what’s too far, what’s interesting, what’s fun, how much personalization is good, how much generic is good. And we get to learn all of this. And you probably don’t know this yourself. We have to all figure it out by using it, right?
Mark Zuckerberg (00:55:31) Yeah, I mean part of what we’re trying to do with the initial AIs launch is having a diversity of different use cases just so that people can try different things because I don’t know what’s going to work. I mean, are people going to like playing in the text-based adventure games or are they going to like having a comedian who can add jokes to threads or are they going to want to interact with historical figures? We made one of Jane Austen and one of Marcus Aurelius, and I’m curious to see how that goes.
Lex Fridman (00:56:03) I’m excited for both. As a big fan, I’m excited for both. I have conversations with them. And I am also excited to see, the internet, I don’t know if you heard, can get kind of weird and I applaud them for it. So-
Mark Zuckerberg (00:56:18) I’ve heard that, yeah.
Lex Fridman (00:56:19) Yeah. So it’d be nice to see how weird they take it, what kind of memes are generated from this. And I think all of it is, especially in these early stages of development as we progress towards AGI, it’s good to learn by playing with those systems and interacting with them at a large scale like you said.
Mark Zuckerberg (00:56:38) Yeah, totally. I mean, that’s why we’re starting out with a set. And then we’re also working on this platform that we call AI Studio that’s going to make it so that over time anyone will be able to create one of these AIs almost like they create any other UGC content across the platform. So I’m excited about that. I think that to some degree we’re not going to see the full potential of this until you just have the full creativity of the whole community being able to build stuff, but there’s a lot of stuff that we need to get. So I’m excited to take this in stages. I don’t think anyone out there is really doing what we’re doing here. I think that there are people who are doing fictional or consumer-oriented character type stuff, but the extent to which we’re building it out with the avatars and expressiveness and making it so that they can interact across all of the different apps and they’ll have profiles and we’ll be able to engage people on Instagram and Facebook, I think it’s going to be really fun.

Future of humanity

Lex Fridman (00:57:49) Well, we’re talking about AI, but I’m still blown away this entire time that I’m talking to Mark Zuckerberg. And you’re not here, but you feel like you’re here. I’ve done quite a few intimate conversations with people alone in a room, and this feels like that. So I keep forgetting for long stretches of time that we’re not in the same room. And for me to imagine a future where I can with a snap of a finger do that with anyone in my life, the way we can just call right now and have this kind of shallow 2D experience, to have this experience like we’re sitting next to each other is like… I don’t think we can even imagine how that changes things where you can immediately have intimate one-on-one conversations with anyone. In a way we might not even predict, it could change civilization.
Mark Zuckerberg (00:58:44) Well, I mean this is a lot of the thesis behind the whole Metaverse, is giving people the ability to feel like you’re present with someone. I mean, this is the main thing I talk about all the time, but I do think that there’s a lot to process about it. I mean, from my perspective, I’m definitely here. We’re just not physically in the same place. You’re not talking to an AI. So I think the thing that’s novel is the ability to convey through technology a sense of almost physical presence. So the thing that is not physically real is us being in the same physical place, but everything else is. And I think that that gets to this somewhat philosophical question about what is the nature of the modern real world? And I just think that it really is this combination of physical world and the presence that we feel, but also being able to combine that with this increasingly rich and powerful and capable digital world that we have and all of the innovation that’s getting created there.
(00:59:52) So I think it’s super exciting because I mean, the digital world is just increasing in its capability and our ability to do awesome things, but the physical world is so profound, and that’s a lot of what makes us human is that we’re physical beings. So I don’t think we want to run away from that and just spend all day on a screen. It’s one of the reasons why I care so much about helping to shape and accelerate these future computing platforms. I just think this is so powerful. And even though the current version of this is like you’re wearing a headset, I just think this is going to be by far the most human and social computing platform that has ever existed. And that’s what makes me excited.
Lex Fridman (01:00:36) Yeah, I think just to linger on this kind of changing nature of reality of what is real, maybe shifting it towards the sort of consciousness. So what is real is the subjective experience of a thing that makes it feel real versus necessarily being in the same physical space, because it feels like we’re in the same physical space. And that the conscious experience of it, that’s probably what is real. Not like that the space time, the physics of it. You’re basically breaking physics and focusing on the consciousness. That’s what’s real. Just whatever’s going on inside my head.
Mark Zuckerberg (01:01:17) But there are a lot of social and psychological things that go along with that experience that was previously only physical presence, right? I think that there’s an intimacy, a trust. There’s a level of communication because so much of communication is nonverbal and is based on expressions that you’re sharing with someone when you’re in this kind of environment. And before, those things would’ve only been possible had I gotten on a plane and flown to Austin and sat physically with you in the same place. So I think we’re basically short cutting those laws of physics and delivering the social and psychological benefits of being able to be present and feel like you’re there with another person, which I think are real benefits to anyone in the world.
(01:02:10) Like you said, I think that is going to be a very profound thing. A lot of that is that’s the promise of the Metaverse and why I think that that’s the next frontier for what we’re working on. I started working on social networks when they were primarily text, where the first version of Facebook, your profile, you had one photo and the rest of it was lists of things that you were interested in. And then we kind of went through the period where we were doing photos. And now we’re kind of in the period where most of the content is video, but there’s a clear trend where over time the way that we want to express ourselves and kind of get insight and content about the world around us gets increasingly just richer and more vivid.
(01:02:57) And I think the ability to be immersed and feel present with the people around you or the people who you care about is, from my perspective, clearly the next frontier. It just so happens that it’s incredibly technologically difficult. It requires building up these new computing platforms and completely new software stacks to deliver that, but I kind of feel like that’s what we’re here to do as a company.
Lex Fridman (01:03:21) Well, I really love the connection you have through conversation. And so for me, this photo realism is really, really exciting. I’m really excited for this future and thank you for building it. Thanks to you and thanks to the amazing Meta teams that I’ve met, the engineers and just everybody I’ve met here. Thank you for helping to build this future. And thank you, Mark, for talking to me inside the Metaverse. This is blowing my mind. I can’t quite express. I would love to measure my heart rate this whole time. It would be hilarious if you’re actually sitting on a beach right now.
Mark Zuckerberg (01:04:00) I’m not. I’m in a conference room.
Lex Fridman (01:04:02) Okay. Well, I’m at a beach and not wearing any pants. I’m really sorry about that for anyone else who’s watching me in physical space. Anyway, thank you so much for talking today. This really blew my mind. It’s one of the most incredible experiences in my life, so thank you for giving that to me.
Mark Zuckerberg (01:04:17) Awesome. Awesome. Glad you got to check it out. And it’s always fun to talk. All right, I’ll catch you soon. See you.
Lex Fridman (01:04:23) See you later. This is so, so amazing, man. This is so-