
The Science of Making Truthful AI


AZEEM AZHAR: Hi, I'm Azeem Azhar, founder of Exponential View and your host on the Exponential View podcast. When ChatGPT launched back in November 2022, it became the fastest-growing consumer product ever, and it catapulted artificial intelligence to the top of business priorities. It's a vivid reminder of the transformative potential of the technology. And like many of you, I've woven generative AI into the fabric of my daily work. It's indispensable for my research and analysis, and I know there's a sense of urgency out there. In my conversations with industry leaders, the common thread is that urgency. How do they bring clarity to this fast-moving, noisy arena? What is real and what isn't? What, in short, matters? If you follow my newsletter, Exponential View, you'll know that we've done a lot of work in the past year equipping our members to understand the strengths and limitations of this technology and how it might progress. We've helped them understand how they can apply it to their careers and to their teams and what it means for their organizations. And that's what we're going to do here on this podcast. Once a week, I'll bring you a conversation from the frontiers of AI to help you cut through that noise. We record each conversation in depth for 60 to 90 minutes, but you'll hear the most vital parts distilled for clarity and impact on this podcast. If you want to listen to the full unedited conversations as soon as they're available, head to exponentialview.co. In today's conversation, I speak with Richard Socher. He's an AI researcher and founder. As a scientist, his papers have been cited more than 150,000 times. He's also been a successful entrepreneur. He built one company, which was acquired by Salesforce, and then he became chief scientist at that behemoth software firm. And in 2020, he founded You.com. It's a chat search assistant, one of what could be a new category of consumer internet products. So, we discussed the research behind his startup and his journey to founding it, but I start our conversation with a more expansive question, one which he's qualified to explore: what is intelligence? We recorded the conversation for over an hour. My listeners here have an edited, shorter version. If you want to listen to the whole conversation, head over to www.exponentialview.co. Enjoy. Richard, welcome to the show.

RICHARD SOCHER: Thanks for having me, Azeem.

AZEEM AZHAR: Well, 168,000 citations. I mean, that’s quite something. I mean, is your family proud of that?

RICHARD SOCHER: Actually, it's funny that you say that, but yes, my dad especially is very proud of that. He even brought it up in his wedding speech at my wedding.

AZEEM AZHAR: Right, right. His wedding speech at your wedding?

RICHARD SOCHER: Correct. Yeah.

AZEEM AZHAR: Yeah. Wonderful, wonderful. Just to really put the pressure on your partner as to the expectations that they have to achieve, right?

RICHARD SOCHER: Mostly for fun, but he was also an academic for a few years and understands that that is not very common.

AZEEM AZHAR: That stems mostly, I guess, from the paper that in a way started this all off, which was more than a decade ago, the ImageNet paper with Fei-Fei Li, who's also, of course, been one of my guests. I've spoken to her a few times. That paper was the critical bit of kindling that, I suppose, kicked off the deep learning wave. Do you think that's right? If we look historically, is that a reasonable place to start?

RICHARD SOCHER: Yeah. So I think there's actually one event that happened even before ImageNet, and that was George Dahl and Geoff Hinton working on speech recognition and neural nets. There were still some probabilistic pre-training models in there, but that was sort of the first time where people said, wow, if we have more training data, speech recognition actually is best done with a neural network. And then the ImageNet wave came. Of course, ImageNet was the dataset, with Alex Krizhevsky, Hinton again, and Ilya Sutskever actually using that dataset and training a large convolutional neural net. That was the watershed moment, I think, for most people to understand, wow. It was enabled by having this dataset, so that was a necessary condition for the success, but of course, the model is absolutely crucial. And then when you look at my second most highly cited paper, it's a word vector paper, and word vectors were kind of the necessary ingredient to get natural language processing into the neural network field as well, because speech is somewhat straightforwardly put into a neural network and images are very easy to put into a neural network. Neural networks want numbers as inputs. Think of a function f of x equals x squared or something: x is the input and you get a number out, like x squared. A neural net is just a much more complex function, and the input is not just one number but often thousands or millions of numbers that get fed into the neural network, and then you get some output. And words aren't naturally a list of numbers, so having a word as a vector was a very crucial moment for that… And of course, there are other ways you can put words into vectors; word2vec is the other famous word vector method. But those two papers kind of helped everyone to start using neural nets for natural language processing too. And that's sort of where most of the rest of my citations come from.
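To make the words-as-numbers point concrete, here is a minimal sketch of how words become vectors that a neural net (which is just a function of numbers) can consume. The vectors, the tiny untrained network, and the function names are invented for illustration; nothing here comes from Richard's papers.

```python
import numpy as np

# Toy vocabulary: each word is mapped to a small dense vector of numbers.
# Real word vectors (e.g. GloVe, word2vec) have hundreds of dimensions and are
# learned from large corpora; these values are made up for illustration.
embeddings = {
    "movie":  np.array([0.2, 0.7, -0.1]),
    "great":  np.array([0.9, 0.1,  0.3]),
    "boring": np.array([-0.8, 0.2, 0.1]),
}

def sentence_to_input(words):
    """Turn a list of words into one fixed-length numeric vector
    (here simply the average of the word vectors)."""
    return np.mean([embeddings[w] for w in words], axis=0)

def tiny_network(x, W, b):
    """A neural net is 'just' a function of numbers: one linear layer plus a nonlinearity."""
    return np.tanh(W @ x + b)

x = sentence_to_input(["great", "movie"])
W = np.random.randn(2, 3)   # randomly initialised weights; training would adjust these
b = np.zeros(2)
print(tiny_network(x, W, b))
```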

AZEEM AZHAR: Yeah, I mean they’re pretty foundational. Now, there’s a key phrase that you said, which was if we had enough data… And we’re going to return to the question of data during our conversation, but maybe let’s zoom out a little bit. We talk a lot about the artificial intelligence wave, the artificial intelligence boom, .ai is the hottest domain name that you can find these days, but we often skirt over what we mean by the I in that, the intelligence. So what is intelligence?

RICHARD SOCHER: That is a great question. I'll try to keep it short because we could talk about that for hours. But I think there are different ways of looking at intelligence. And the most obvious one that comes naturally to a lot of people is you look at the brain and then we can look at what the brain can do. And one obvious thing is it can move, well, the brain itself not, but it can trigger movement in physical bodies. And so a lot of people look at robotics as an artificial version of the motor intelligence that we see in animals and humans. Then the brain also helps us understand visual input. So we look at computer vision in AI, and you look at visual intelligence, the visual cortex, which takes up quite a bit of the brain, and then we can look at natural language. And that, of course, I think is the most interesting manifestation of intelligence. And it is certainly the one that sets us apart the most from animals. Animals have, obviously in some cases, even better motor control. Chimpanzees have very fast motor control and great visual inputs, but natural language is kind of what we have the most of, much more sophisticated than any other animal as far as we know. Maybe there are some questions around whales, but it doesn't help that they don't have opposable thumbs to create writing and so on. And writing is the interesting thing about language: eventually, language enables you to have collective intelligence too, and historical intelligence and memory. You can now learn from lessons of other people you've never met or seen or could talk to, and no one would remember them, but you read their text from thousands of years ago. And language also connects thought. And then you can talk about visual inputs, you can talk about motor inputs and outputs. And so I think language is the biggest one. And then of course there are these even higher levels of intelligence like planning and logical reasoning and mathematical reasoning. And the field of AI has often made the mistake of thinking, well, if we solve those hardest bits, the rest will be easy. But it turns out it was actually the opposite, like logic, math and so on…

AZEEM AZHAR: Now is easier.

RICHARD SOCHER: … is already easier for a computer.

AZEEM AZHAR: Yeah. It’s interesting, but in your discussion of it there, it reminds me a little bit of Justice Potter Stewart discussing obscenity and obscene material, and he said, “I’ll know it when I see it.” And from an engineering perspective… Well, let’s say this. From a human perspective, from a perspective of us being sort of squishy biological things with feelings and consideration and empathy, it’s quite nice to have this idea that there are lots of different types of intelligence and we can’t quite put our finger on it. I mean, it makes the world more livable and we can be more accepting of others. It’s kind of a disaster from an engineering perspective though. It’s like build this thing, it kind of is a bit like this, but it might be a little bit like this, but you know what I mean? I mean, there’s any amount of terrible enterprise software that has been built from that kind of specification. So how do you match what you described with what an engineer needs to try to build an intelligence? Or is it that in reality when we take away this marketing gloss, what we’re building is kind of great software, but we’re not really building intelligences?

RICHARD SOCHER: It's a great question, and it also connects to the second bit of my previous answer, which is that I think most of the time, we looked at human intelligence and then we tried to replicate that in artificial intelligence. But of course, AI can do things that no human has ever evolved to be able to do. And so we can take scientific data, we can take protein sequences, we can take weather data from millions and millions of samples and then try to do better weather prediction than a human could ever do, better amino acid and protein prediction than any human could ever do. And so with AI, I think, and that's kind of where the analogy breaks, we can understand the concept of intelligence without having to replicate it exactly the way a human does. Humans have all this evolutionary baggage of zero-sum games, of having to hunt and mate, and there's always competition. AI doesn't have to evolve in that pattern. And what's interesting is that in recent years, and that started around 2010 with some of my neural network papers in natural language processing, but also in other places like computer vision, of course, earlier, and speech even before that, we don't ask the programmer anymore to try to replicate the brain, or even their own expertise about the particular problem. So an early example in NLP from my, I think, third or fourth most cited paper was sentiment analysis. In the past, sentiment analysis was done by having linguists and experts sit there and say, well, I know a lot of positive words, amazing, awesome. I know [inaudible 00:11:49].

AZEEM AZHAR: So sentiment analysis is essentially what marketing firms do to figure out whether a brand is going up in people's imagination or not. They see a tweet about a brand and they analyze it and they say, this person who said Pepsi cola is so bad, that's actually a positive thing because bad means good in that cohort. And so sentiment analysis was often seen as quite tricky and relied on cultural knowledge and expertise from linguists, in a sense.

RICHARD SOCHER: Exactly. And it's used in lots of places. People use it for algorithmic trading. In fact, there are fun stories where whenever Anne Hathaway starred in a movie, she won an Oscar, people said that they loved her acting, and then the stock for Berkshire Hathaway went up multiple times after Anne Hathaway movies came out. We call that entity disambiguation, just a different Hathaway.

AZEEM AZHAR: I have to put my hand up and take some responsibility for that because about 18 years ago when I was at Reuters, one of my teams developed and launched the first algorithmic newsfeed for hedge funds to automatically trade on sentiment signals and so on.

RICHARD SOCHER: You may have actually been in the thick of that one. Yeah, that's funny. Yeah, so with sentiment analysis, people used to say, oh, now there's negation, so it's not good. So that's a feature. And then the "AI," quote-unquote, would just weight those human-designed features. And then eventually they would realize, man, there's a lot of complexity here. This movie doesn't care about cleverness, wit or any other kind of intelligent humor. It's like, ah, well, there's a lot of positive words in there and there's some negation there, but the negation actually negates everything positive in that sentence. And so we created the largest sentiment dataset, and that allowed us to also show that a neural network outperformed every other traditional AI model, or machine learning model as we called it back then, by a good margin. And so that is what's very important. Now, as a developer, you don't think about your own skills and logic that much anymore. You think about what the neural network needs: how do I clean my data, label my data, think about distribution shift over time, because there are new things that come up that weren't in the data before. And so you then try to just give those as training data to the neural net and you let it figure out all the complexities and details and logic of that domain.
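For readers who want to see what those hand-designed features looked like, here is a minimal sketch of the lexicon-and-rules style of sentiment analysis Richard describes. The word lists, weights, and the naive negation rule are invented for illustration and come from no real system; the point is that a one-word negation rule cannot scope over a whole clause, which is exactly where this approach breaks down.

```python
# Hand-picked sentiment lexicons plus a crude negation rule: the kind of
# human-designed features the old approach would weight. All lists are made up.
POSITIVE = {"amazing", "awesome", "cleverness", "wit", "intelligent", "humor"}
NEGATIVE = {"bad", "boring", "terrible"}
NEGATORS = {"not", "doesn't", "never"}

def lexicon_sentiment(sentence: str) -> float:
    score, negate = 0.0, False
    for word in sentence.lower().replace(",", "").split():
        if word in NEGATORS:
            negate = True          # naive rule: flip only the next sentiment word
            continue
        if word in POSITIVE:
            score += -1.0 if negate else 1.0
            negate = False
        elif word in NEGATIVE:
            score += 1.0 if negate else -1.0
            negate = False
    return score

# The hard case from the conversation: the negation should scope over the whole
# clause, so this simple rule still returns a positive score for a negative review.
print(lexicon_sentiment(
    "This movie doesn't care about cleverness, wit or any other kind of intelligent humor"))
```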

AZEEM AZHAR: So, one of the challenges with that approach is that there are clearly things that humans do which are not reliant on their empirical experience of things that they've observed. So mathematical reasoning is one great example. I mean, if mathematicians were empiricists, we would still be at Pythagoras's theorem. We could say goodbye to topologies and manifolds and all these other things. But the other challenge is that we can see, in the natural world, behaviors which are what we'd call, in machine learning, zero-shot. If you see a baby ibex running vertically up a cliff away from a predator, it doesn't get a chance to run that picture in a million training cycles over 10 epochs or whatever it is. So how do we square that with this Potter Stewart-esque definition of intelligence, of knowing it when we see it, where it's not done from training and it's not done from thousands of trial runs?

RICHARD SOCHER: I think we're conflating two things here that are actually very interesting. I think it's clear that a lot of animals have what I would call biological, genetic training, and are basically born with a set of weights, which is fascinating. You can go even to, I think, horses, for instance: they plop out and they just start walking away after a few minutes, versus humans, who have to figure out a lot of stuff. And there are very cute pictures of babies trying to give a thumbs up, and they're like, I think this is it, and then you get the feedback from the parents and they're like, yes, thumbs up, those were the right fingers. And so I think there is actually a ton of training and learning, and then biology figured out a way to store that learning in a genetic sequence such that when that brain gets instantiated and develops in the womb, it already has a set of knowledge. And in some ways, we're doing a little bit of that in that some of our models are made to be able to ingest images, for instance, versus sound versus text. And so there's some sort of high-level architecture that makes it better, for instance, to deal with time sequences and different lengths of those time sequences. So there's a little bit of architecture learning also that humans are going through right now. And we actually tried in the field to do a little bit of automated architecture search for a while, but it never really fully took off. Humans are still better at finding the best neural network architecture than some other AI models.

AZEEM AZHAR: If we come to large language models, which is the state of where we are today, it’s what’s got people excited, it’s propelling these decabillion-dollar company valuations. Are you surprised with what large language models can seemingly achieve today compared to where you thought they might be, say, two or three years ago? I mean, have you been surprised to the upside or to the downside?

RICHARD SOCHER: A little bit, but maybe not quite as much as most other people. The bit where, of course, it's amazing is how much they can abstract away knowledge and do things. And we've seen inklings of that before, even in word vectors, where all of a sudden you could do the famous example of king minus man plus woman goes to queen. And that was like, oh, we never taught it that, but it kind of learned it just in the word vectors. And then we worked on contextual vectors that would put a whole sentence in there, and you had similar interesting patterns you could see there. And then, my dream had always been to build a single model for all of natural language processing. And so in 2018, we invented decaNLP, and part of that was that we invented prompt engineering. My TED Talk about that just got released. And so basically, with prompt engineering, the idea was we would have one model. Just to understand why that is so interesting: in the past, we would often say, all right, you want to do sentiment analysis, we train a sentiment analysis model. You want to do translation, we'll train a translation model. You want to do summarization, we'll do a summarization model. And we're like, well, what if you just had a piece of text and you just asked a question about that text, and that question could be what's the sentiment, what's the translation into German, what's the summary? And then you can basically train a single model. You pre-train the word vectors, the contextual vectors, the whole answer decoder. And that paper actually famously got rejected from ICLR very publicly, but motivated a few others, including folks at OpenAI who, at the time, were still mostly working on hand gesture recognition and Dota gameplay and so on. But in their GPT-2 paper, they cited Bryan McCann et al. and decaNLP, and said, look, these guys were able to build a single model for all of these different problems, and they called it not a question eventually, but a prompt. And so that 2018 paper motivated them to push also on that single model for all of NLP, but it also made us think, well, clearly we should build a better answer engine, because ultimately, when you use a search engine, what you actually want is to get an answer, not a list of 10 links.
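The king-minus-man-plus-woman example can be made concrete in a few lines. The sketch below uses tiny, hand-invented two-dimensional vectors (real learned vectors have hundreds of dimensions), so the values are purely illustrative; the mechanics, vector arithmetic plus a nearest-neighbour search by cosine similarity, are the same.

```python
import numpy as np

# Invented 2-dimensional "word vectors" (dimension 0 roughly royalty, dimension 1
# roughly gender), chosen by hand so the analogy works; real vectors are learned.
vectors = {
    "king":   np.array([0.9,  0.7]),
    "queen":  np.array([0.9, -0.7]),
    "man":    np.array([0.1,  0.7]),
    "woman":  np.array([0.1, -0.7]),
    "apple":  np.array([-0.5, 0.0]),
    "palace": np.array([0.8,  0.0]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman: plain arithmetic in vector space...
target = vectors["king"] - vectors["man"] + vectors["woman"]

# ...and the nearest remaining word by cosine similarity comes out as "queen".
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vectors[w], target))
print(best)  # queen
```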

AZEEM AZHAR: Not a list of blue links, right? Yeah. But I want to come to something that you said there, which I think is worth digging into. You said that, and I may be paraphrasing here, these networks and the more basic technologies like word vectors and embeddings were able to see relationships that weren't explicitly programmed in. And I think we've all had that experience in the sense that if you use one of the large language models, I mean, ChatGPT is a big one that people are most familiar with, you can get it to analogize, you can get it to draw analogies, in the same way that you can say, queen is to king as woman is to what? And it'll say, man. And I think that was the famous embeddings paper from a few years ago. And the question is whether that's actually new knowledge, or whether that isn't just data that is in the dataset. And I suppose the way to think about this is if I have 22 soccer players and I measure their heights, their heights are in the dataset, but the mean, the average of their heights, is not in the dataset. Do we then say that to extract the mean is to find new information that wasn't in the dataset, or is it just a mathematical property of the data? Because I kind of feel that some of the examples that we talk about are actually just mathematical properties. It's like, I don't know, a cosine similarity or a distance in this multidimensional space, and these two things approximate to each other. It's not new knowledge, it's literally a mathematical reality.

RICHARD SOCHER: So in some ways, yes, it is in the "data," quote-unquote. But more importantly, a big misconception that people have is that these models can only interpolate. I have a point X here, another point Y here, and all the model can do is find things on the line between X and Y. And then you can think of larger versions of that, which are eventually called convex hulls. And a lot of people think the model can only interpolate between all the things that it has seen before, but that's actually not true. One thing that's amazing about these distributed representations we have in natural language processing, these large neural nets, is that they can be on the hypercube of concepts. And what does that mean? For instance, I have images of black cats and I have images of yellow cars. Now, an image generation model will eventually be able to, say, create a yellow cat, even though it has never seen a yellow cat in the training data per se. And so it can actually merge new concepts. And now you have to be more creative and think of, oh, I want to see a yellow cat, which doesn't exist in nature. And so you have to creatively think of that and you don't have to do the execution anymore. So the models can extrapolate a little bit, but not too far out. And then of course, the most exciting stuff is where humans can't extrapolate at all, because we're not evolved to look at, for instance, protein sequences or millions and millions of weather samples from different weather stations to predict where the weather might go next. And so that is where these models can already outshine humans massively.

AZEEM AZHAR: So, your yellow cat analogy, I think, is quite helpful, especially for a non-technical audience. So if we think about this in non-technical terms: I had a discussion a few weeks ago with someone who has definitely jumped in with both feet into the large language model space; he's buying Nvidia GPUs like they're going out of fashion. And this person said to me, don't underestimate how far we can take large language models, with the suggestion that there are a couple more years at least of really rapid technical progress, which we've already seen, and maybe even that this is almost an end state of an architecture to get to great capabilities. What do you think? How far can we go with large language models to really get to the next paradigm-shift moment? Do we get there through scale, or do we get there through some radically new architectural approach?

RICHARD SOCHER: It's a great question, and I actually think that the biggest change for large language models will be their ability to program. So let me explain that. As you said, large language models just predict the next token given the previous set of tokens, which can include very complex things like prompts. And their biggest shortcoming is that they will hallucinate and they will make up stuff, especially if you ask a question around, for instance, a mathematical subject. If I gave a baby $5,000 at birth to invest in some no-fee stock index fund, and I assume some percentage of average annual returns, how much will they have by age 25? Now, a large language model will just be like, I've seen questions like this, and it'll just start writing a bunch of text. But it doesn't actually say, well, this requires me to think super carefully, do some real math and then give the answer. But you can actually force it. You can tell it, hey, if there's a complex mathematical question, how about you try to translate that question into computer code, in our case, in You.com, in the genius mode that we have, into Python code? And then you write that code, then you run the code, and then you look at the output of that code and give me an answer. And that insight that we can get them to write code, and the surprising fact that they will write code that compiles, that is absolutely perfect syntactically and often semantically also, that, I think, will give them so much more fuel for the next few years in terms of what they can do. Because if you think about it, once this model can write code and code runs, software runs the world, AI is eating software, then you realize these models can do things also: you can run APIs that will execute certain things in the real world. And so I think that's where we get a lot more juice. I think in terms of extra scale, it's unclear because, at some point, with [inaudible 00:26:55] whole internet, there's only so much more data that is very useful for the model to train on. But [inaudible 00:27:01].
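As an illustration of what that looks like, here is the kind of small program an LLM could be prompted to write and run instead of guessing at the arithmetic. The 7% average annual return and the horizons are assumptions made up for this sketch, not figures from the conversation or from You.com.

```python
# Compound a lump sum invested at birth: the sort of code an LLM can be told to
# write and execute rather than "hallucinating" the number.
def future_value(principal: float, annual_return: float, years: int) -> float:
    """Compound once per year: principal * (1 + r) ** years."""
    return principal * (1 + annual_return) ** years

principal = 5_000.0      # the $5,000 given at birth
annual_return = 0.07     # assumed average annual return of a no-fee index fund

for years in (18, 25, 65):
    print(f"After {years} years: ${future_value(principal, annual_return, years):,.2f}")
```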

AZEEM AZHAR: So that was a helpful sense of You.com, which is your, I'm going to call it AI personal assistant, that you are building. And just unpick for me, very briefly, that process by which you are able to create robust working code that doesn't have confabulations in it. Do you send it to a different system, effectively, to make sure that it's robust? So in a way, what the LLM is doing is kind of being like the conductor in a train station. Or are you able to do it in a single architecture?

RICHARD SOCHER: You're bringing up a great point, which is these large language models, some people think they're just magic and they're going to do everything, and some people think they're overhyped, but there are clearly amazing new capabilities in them. But what we found is that if you actually want to run them in production to give millions of users millions of answers a day, then you actually have to think of them more… And this is an analogy that my friend, Andrej Karpathy, came up with, which is a little bit broken in some ways, but it's very useful in general, which is to think of the LLM as the CPU of a computer. And the CPU is amazing and it's the core of the engine of a computer, but it actually needs RAM, it needs random access memory, and that is our context window. It needs a hard drive, which is our file embedding systems for retrieval-augmented generation, where you actually can refer back to facts. It needs an internet connection and a browser, and that is what we've built, a new index of the web that is meant to be consumed by LLMs. It needs all of these things around it. And it turns out in the end, when you ask a question on You.com, we actually run 10 different models. Several of them are large language models, and there is certainly one core large language model in there, but you need to have a lot of other capabilities. Just the ability to execute code is a whole Python interpreter that the LLM now has access to and gets to choose whether it wants to use or not, and that's what enables these amazing answers. And in some cases, you also have to switch CPUs completely. Maybe you can think of that as multi-core or something in that analogy, but for some questions, it doesn't make sense to use a massively large neural net because it's expensive and slower, and it makes sense for a simple question to just use a "smaller large language model," quote-unquote. And then the LLM kind of becomes the orchestrator of all of these different systems.
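To make the CPU-and-peripherals analogy a bit more tangible, here is a schematic sketch of an orchestration pattern like the one Richard describes: route simple questions to a small, cheap model and give a larger model access to tools such as web search and a Python interpreter. Every function name, the routing rule, and the prompt wording are hypothetical placeholders invented for this sketch; this is not You.com's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]

def answer(question: str, small_llm, large_llm, tools: list[Tool]) -> str:
    # Crude router: short, everyday questions go to the cheaper model.
    if len(question.split()) < 8 and "calculate" not in question.lower():
        return small_llm(question)
    # Otherwise the larger model gets the tool outputs folded into its prompt
    # (the "RAM", i.e. the context window, in the CPU analogy).
    context = "\n".join(f"[{t.name}] {t.run(question)}" for t in tools)
    prompt = f"Tool output:\n{context}\n\nQuestion: {question}\nAnswer with citations:"
    return large_llm(prompt)

# Usage with stub models and tools standing in for real ones.
small = lambda q: "(small model) quick answer to: " + q
large = lambda p: "(large model) reasoned answer based on:\n" + p
tools = [Tool("web_search", lambda q: "top snippets for: " + q),
         Tool("python", lambda q: "interpreter result for: " + q)]

print(answer("What's the capital of France?", small, large, tools))
print(answer("Calculate compound growth of $5,000 over 25 years", small, large, tools))
```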

AZEEM AZHAR: So, I think that's such a great analogy. I've seen Andrej Karpathy's paper or presentation on this question as well, and it's a really powerful model because what it essentially says is that we can make certain capabilities discrete because it's actually just too hard to generalize them. And I think about how do you get these things to run on people's phones? How do you get them to run on edge devices? And so what that means is that I think when people hear the word LLM, and the way it's been presented by certain journalists, the idea has been that it's a kind of full wrapping. But of course, in my simplistic world, an LLM is like a CPU, it's like the engine of a car. All cars need engines, but not all engines are cars.

RICHARD SOCHER: That’s right.

AZEEM AZHAR: And you need brakes and wheels and an axle and seats and other engineering things that cars have. I’m not a car engineer, so I don’t know, but I know there are other things. And so that’s part of, I guess, the presentation of where we are. But it also speaks, in my mind, to where we could get to with this technology, because ultimately, a car engine will never get you from New York to Philadelphia because you need wheels and you need a chassis and you need a bunch of other stuff. So when we look at the drawbacks of LLMs, one approach could be it’s the productization that sits around it. So, one of the things I found fascinating with You or You.com is it does sort of footnote and reference its answers. So help us just briefly, again, for a non-technical audience, help us understand what’s going on there. This isn’t just magic coming out of a big LLM, there’s some engineering happening. So how does it roughly work without giving away the 11 secret spices that go into your delicious batter?

RICHARD SOCHER: Yeah, you're 100% right. There's just a ton of engineering that's required to make it work accurately. The biggest problem that LLMs had, and this is something we ran into two years ago, in 2022, when we wanted to be the first, and ended up being the first, search engine that actually connects an LLM to the web. It's an idea that obviously has been copied hundreds of times in the last year by big and small folks. But the main problem of these LLMs is that they will hallucinate, right? They'll just predict the next tokens, and that might make stuff up. They cannot be trained every five minutes. It's not physically and computationally possible to train a large neural network or a large language model every five minutes, whenever a news article happens. So that's another problem. And then the third thing, and that is, I think, a general thing about generative AI, is that generative AI is only useful if the artifacts it produces are quick to verify but would take you a long time to create yourself. So generative AI for images, for instance, is very powerful because you can create an image and then look at it and say after a second or two, that's beautiful or not, but it would take you a very long time to create that image. And the same thing is true for large language models: when you get an answer, you want to verify whether that answer is actually correct or not, and you don't know where the answer came from. Is it some random blog? Say you need to know about some cancer treatment; you don't want some hippy-dippy blog to be the main source of that answer, you want to verify it comes from legitimate research sources, journals and so on. You need that verification. And those were the three main problems we solved by telling the large language model, hey, you can use the internet if you want for this question. Maybe someone just asks, write me a poem about love and paramotoring, and then it'll just write that, and you don't need citations in a poem. But if you ask a question about a recent news event or a complex health condition or some advice in school, then we can tell the LLM, hey, you could look up on the internet what the right answer might be, and then in the prompt, you can encourage it to use those answers. And then you have to build citation logic. And one thing we found actually is that the citation logic itself is also a hard AI problem. When do you use which resource for your facts? And in fact, there are some folks now that kind of copy that idea of having these citations, but they have fake citations. They say, oh, here's a fact, and then they add a citation behind it, and then you click into that citation and that website doesn't even mention that fact. So it's also an AI problem, actually: how do you correctly cite your sources? And that's where we're really pushing hard to get amazing answers that are factual, up to date and verifiable through these citations.
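Here is a minimal sketch of the retrieve-then-cite flow Richard outlines: fetch sources, number them, ask the model to ground each claim in a numbered source, and then check the citations it produced. The functions `search_web` and `call_llm` are hypothetical stand-ins for a retrieval index and a model call, not You.com's API, and the check shown is only the cheapest part of real citation verification.

```python
import re

def build_cited_prompt(question: str, sources: list[dict]) -> str:
    """Number the retrieved sources and instruct the model to cite them."""
    numbered = "\n".join(f"[{i + 1}] {s['title']}: {s['snippet']} ({s['url']})"
                         for i, s in enumerate(sources))
    return ("Answer the question using ONLY the sources below. "
            "After every factual sentence, add a citation marker like [1].\n\n"
            f"Sources:\n{numbered}\n\nQuestion: {question}\nAnswer:")

def invalid_citation_numbers(draft: str, sources: list[dict]) -> list[int]:
    """Return citation numbers in the draft that point at no retrieved source.
    (Checking that the cited page actually supports the claim is the harder,
    genuinely AI-hard part of the problem mentioned above.)"""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", draft)}
    return sorted(n for n in cited if not 1 <= n <= len(sources))

def answer_with_citations(question: str, search_web, call_llm):
    sources = search_web(question)                       # hypothetical retrieval call
    draft = call_llm(build_cited_prompt(question, sources))
    return draft, invalid_citation_numbers(draft, sources)
```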

AZEEM AZHAR: What you described to me, though, now starts to look a lot more like the traditional software industry, to be honest. I mean, an LLM is a little bit like a database. It behaves slightly differently to a traditional database because you buy it pre-configured with lots of information in it. It's stochastic. It's fuzzy rather than deterministic. So the things that would have you fire Oracle or MySQL or Informix, you wouldn't fire an LLM for, because that's what you want it to do. Part of the value it provides is that it doesn't give the same answer every time. It also has this distillation; it's got a compressed version of the whole of the internet inside. But what you described to me, in terms of building products, is something that looks very similar to building with databases or with a mobile framework. And I wonder, if that's the case, what you think the structure of the industry is going to look like. How different is it really going to be from how we've built enterprise or internet applications in the past? A lot of it is open source: the domain name system, Apache, 50% of databases or more are open source, MySQL and so on.

RICHARD SOCHER: [inaudible 00:36:20].

AZEEM AZHAR: And then a lot of the consumer front end and the JavaScript frameworks are open source, and then you've got proprietary systems, and the value actually comes from how those things get stacked together and where companies sit in the value chain and whether they can attach a network effect or something to it.

RICHARD SOCHER: You're picking up on a really good point, which is, I think, that there's a good chance that large language models, and maybe even AI, will be commoditized, will not be the big differentiator. And we are already seeing this with open source: Mixtral now outperforms Claude on various benchmarks, which is unbelievable, and it's open source, well, it's not fully open source. I think for AI, we need to actually redefine what open source means, and it's not just, here's a final trained model; it ideally includes the training data, all the hyperparameters and how to train that model with that data, the training code also, which most people don't make public, and a bunch of other stuff. So I think there's more to open source, but still, you can use the model and you can actually fine-tune it yourself. And that means that you might argue that the most exciting core is going to be commoditized, and the main differentiator is in standard company startup tech stuff like marketing, design, engineering, making it fast and beautiful and everything. And so I think there is something to that. Now, it is an exciting new capability. You couldn't have built this a few years ago before large language models came out, and it will disrupt, potentially, this trillion-dollar traditional search industry, and I put all my eggs into that basket. But indeed, to really win, you do just need to do a lot of standard startup things.

AZEEM AZHAR: So, I mean, a lot of startups succeed with mindshare and then securing developer buy-in. And so one could imagine, just in a hypothetical world, that one thing you might want to do if you're early out of the gate is say your technology is so powerful that it's dangerous, that it really needs government intervention, knowing that the government doesn't have the capability to assess that question or to do anything about it, but it will certainly create a lot of mindshare for you, so that you're on the cover of every paper for a year. I mean, that would be a really, really rational strategy. And then you'd build a big developer community around that. Am I being cynical if I say that?

RICHARD SOCHER: I mean, you’re not the only one. We call it regulatory capture in Silicon Valley, and that is certainly what a few of the really large players that have been able to secure a billion plus dollars in funding have been trying to do. But it seems like the world will not adhere to that. Open source models are out. And so it was a nice try, but hopefully won’t be that successful.

AZEEM AZHAR: I mean, just a brief answer on this one: are you somebody who carries, as I think a number of people in Silicon Valley do these days, a p(doom) in your head, or do you feel like that's not even a question that you should devote a cycle of your time to?

RICHARD SOCHER: I do not. At some point, of course, enough people have talked about it, and I love AI. I'm writing, on some weekends also, a book on AI. And so you, of course, have to answer some of these questions. And if you're a non-expert and you hear some of these experts talk about this, you get very scared. And I think at some point, the folks who think p(doom) exists and is very, very large, they're scared, and some person who has a mental health condition will find a gun and then say, well, I've got to murder some AI researchers, I guess, because who wants doom, right? These guys are the most evil people in the world, working on doom. I think it's a really dangerous conversation. And so let me maybe dive into it more than I would've loved to, but I think it's important. The interesting thing is, P stands for probability, and the most important thing you learn about probability is Bayes' theorem. So it already starts being a problem because these people want to look at p(doom), which is a prior, essentially. They're not looking at p(doom) conditioned on some data, so it should be a conditional probability. But these people don't look at actual data. They come up with really interesting, fun sci-fi scenarios, like Terminator. We have time travel, Terminator comes back, wants to destroy this and that, and oh, there are these micro nano robots that are somehow super intelligent and they will destroy everyone. Or this AI wants to somehow think of humans as this enemy-

AZEEM AZHAR: Very imaginative, Richard. Your book should be a sci-fi book, I think, and it would do very well.

RICHARD SOCHER: And funnily enough, some of the biggest proponents of p(doom) are former sci-fi authors, but it's not research. And when you actually double-click into it, and I did engage with several of those folks, I had a public conversation with Nick Bostrom in the German media, it was all in English, you can probably find it online, but I engaged with some of them. If you actually double-click into how it is really going to make humanity extinct, get rid of all of us, the scenarios that they actually come up with are hilarious. And then it's like, oh, but it can influence people, so they all murder each other. And I'm like, if the most intelligent people always ruled everything, we would have very different politics than we do. And so it's not like an intelligent person can just convince everyone to do something because they're so much more intelligent. And so I think it's a lot of cool sci-fi scenarios. They're fun. I would probably watch the action movie that comes out of it too, but we've got to keep it real and we can look at real problems. AI does have real problems. It will pick up biases, and humanity isn't super proud of all the biases in historical training data, that we're racist and sexist and so on. And as AI touches real lives, I'm not against regulating it. It makes sense to regulate self-driving car startups, so not every startup can just go up on the highway and see what happens. I don't want my AI neurosurgeon in the future to just try some reinforcement learning in my brain and see if it works out or not. I want that massively validated and regulated before it gets to people. But it doesn't make sense to try to regulate the basics of foundational model research. It's just absurd.

AZEEM AZHAR: Well, thanks for listening. What you heard was an excerpt of a much longer conversation. To hear the rest of it, go to exponentialview.co. Members of Exponential View and the community get access to the full recording as soon as it is available, and they’re invited to continue the conversation with me and other experts. I do hope you join us. In the meantime, you can follow me on LinkedIn, Threads and Substack for daily updates. Just search for Azeem, A-Z-E-E-M, or if you’re in the US and Canada, A-Z-E-E-M. Thanks.


