An introduction to generative AI with NVIDIA’s Mahan Salehi
Jon Prial: Today we have a treat for you. We're going to be talking about generative AI, but we are not jumping into that hype cycle. Although it's very cool, and I love reading all the new articles that show up every day about how and where generative AI is used, what jobs are lost, what jobs are created, and on and on. Now, we may get there in this podcast, but first we're going to step back and put this into perspective. This is for you, and today my guest, Mahan Salehi, and I are going to talk through some history and properly introduce generative AI. Content matters, but so does context. Now, Mahan is an AI and LLM product manager at NVIDIA. And because I think I know my audience: LLM stands for Large Language Model, and it's the kind of model that GenAI is built on. Many of you might have heard the term ChatGPT; that's one example. So let's get started. I'm Jon Prial, and welcome to the Impact podcast. The material and information presented in this podcast is for discussion and general informational purposes only, and is not intended to be and should not be construed as legal, business, tax, investment advice or other professional advice. The material and information does not constitute a recommendation, offer, solicitation or invitation to the sale of any securities, financial instruments, investments, or other services, including any securities of any investment fund or other entity managed or advised directly or indirectly by Georgian or any of its affiliates. The views and opinions expressed by any guests are their own and do not reflect the opinions of Georgian. Mahan, I'm glad to be speaking with you today. Look, a number of years ago, before every car manufacturer had delivered some type of self-driving technology, or maybe they were just talking about it, we had a guest speaker from NVIDIA at a Georgian conference, and the topic was machine learning and self-driving cars. And what blew me away was this video of a car driving along something that I don't believe I could call a road. It was through the woods on a dirt path, going around trees. It was really just astounding. So to step back, semantics matter. I call what I saw in that video machine learning, because in my mind it was just using the data it had and basically trying to determine what the road was and where the road was going. Do you think I'm right, or is that something we should have called artificial intelligence even then?
Mahan Salehi: That's a great question. I think that when we talk about deep learning versus AI and machine learning, the terminology can be very confusing. What's a subset of the other? With artificial intelligence, you hear a lot of folks talk about it as a system that can generate a response that's very human-like, just as good as a human or close to being as good as a human being. Now, if you want to dive into the technical details, you have different techniques for doing this: machine learning algorithms and deep learning algorithms. And the innovation that happened over the last couple of years, and this is what made things like self-driving possible, is that we went from simple rules-based models, mostly linear problems and linear functions that couldn't generalize well on data and couldn't do tasks they weren't trained to do, to deep learning and machine learning models that can extrapolate to data they might never have seen before. And so in the case of self-driving, you can teach a model to recognize images of the road and the forest and things around it. And if it sees enough data over time, it gets really, really good at recognizing these things, even if it hasn't seen that exact scenario or environment before. And that's something we hadn't seen two or even three decades ago with more traditional models, where they were focused on one specific application and couldn't understand things they'd never seen before.
Jon Prial: So when Netflix recommends a movie, that's simple. That's really just machine learning; it's not going too far. It's been years, but I love that you could feed images of cats and dogs to a model and it could parse them. And taking that to self-driving, machine learning alone could take an image and figure out that it's a tree on the side of the road versus a person. We couldn't program for that. But you're saying, "Take this to the next level and you've got deep learning now. It can begin to extrapolate from that." Now, you still have to give it guidance as to what to do, correct?
Mahan Salehi: Absolutely. And I think the key thing to understand about deep learning models is that a lot of the problems we see in the real world are really complex; they're non-linear relationships. And so with deep learning models, and you hear buzzwords like neural networks being thrown around, the interesting thing is the design of how they're created. You can think of it as almost analogous to a human brain, where you have neurons and synapses and the connections between those neurons. And the larger these models get, the more parameters or neurons you add, the better they are at looking at a bunch of data, building an internal understanding of what they're looking at, and then learning from that to generate the responses you're looking for. Recognizing images, although it seems simple to us now, was not a simple task 20 or 30 years ago. Because as soon as you start to mess around and, you know, show a picture of a forest with different types of trees the model hasn't seen before, or clouds in the background, that can mess everything up. And the whole point of deep learning models is that they get so intelligent that they can see the trees through the forest. If they see enough images, they're able to generalize really well and understand what they're looking at.
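To make that "neurons and connections" picture concrete, here is a minimal sketch of a tiny neural network in PyTorch. The layer sizes and input data are made up purely for illustration; the point is that the learned weights are the connections between neurons, and the non-linear activation is what lets the model capture the non-linear relationships Mahan describes.

```python
import torch
import torch.nn as nn

# Two stacked layers of "neurons". The weights between layers are the
# learnable connections; ReLU adds the non-linearity that lets the model
# fit relationships a purely linear rule cannot.
model = nn.Sequential(
    nn.Linear(4, 16),   # 4 input features -> 16 hidden neurons
    nn.ReLU(),          # non-linear activation
    nn.Linear(16, 2),   # 16 hidden neurons -> 2 output scores
)

x = torch.randn(1, 4)   # one example with 4 made-up features
print(model(x))         # the network's raw scores for that example
```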
Jon Prial: So, one narrow question, and then we're going to stay on this general piece. I've been totally fascinated since Google did AlphaGo and it beat the best Go player in the world. And my assumption is they just gave it a simple set of rules: where to put the stones and how to play Go. That's all it needed to know, and then it played itself, I don't know, 40 bazillion times. But then, when it beat that human, it had created a strategy that no one had ever seen before. I'm shocked by that, but I assume it's because the problem was narrow, there were rules, it was fenced in, and it was able to just kind of figure this out along the way. And in my sense, and I may be completely wrong here, this is like AI V2. So how would you categorize those things?
Mahan Salehi: I think that, especially when we talk about large language models, NLP, or natural language processing, is a great area to talk about this. Because we went from, again, rules-based models that could do very, very simple, rudimentary things, to deep learning models that could learn from a lot of data and do some of the things you're talking about, which is come up with new and innovative ways of solving problems that we might not have trained them to do. But there were still some limitations. When we talk about large language models, understanding the languages human beings speak and the patterns we talk in is a very complicated thing. So you could tell a deep learning model five years ago to translate something from English to French, and it could come up with a way of doing that that might be different from how a human being would translate. We might go one by one, word by word, and translate, whereas an AI model might look at the whole sentence and the words around it and say, "Okay, what does this word mean in relation to another?" What's new and interesting now, or I guess in V2 AI, is this generative capability. Because we're going from a place where we're understanding language and understanding data, to now not just understanding it, and doing a better job of understanding it, but generating it. And those generative capabilities are something we haven't seen before, and they're going to unlock a bunch of new use cases.
Jon Prial: That's great. Let's go through the two pieces, my view of generative AI. So we have text, and people think of ChatGPT, and we have images, Stable Diffusion or DALL-E. Help me understand, kind of restate this generative piece again, and talk to me like I'm 12 years old.
Mahan Salehi: Absolutely. So that's a great point. Generative AI, when we talk about large language models, is focused on language, but generative AI applies to many different things. We can generate text, we can generate music, we can generate art, images. The foundation of pretty much all these models is a specific type of model called a transformer. It was introduced in a paper by Google and University of Toronto researchers in 2017, which is really the foundation of everything we're talking about here today. You can think of it as a new type of algorithm, or a new type of model architecture, that's really good at understanding language-related tasks especially, but also any application where you have sequence data, whether that's images or audio or music or anything else. And it's able not just to understand what it's looking at, but to generate something out of that. And so when we talk about language models, we have things like GPT-3, we have ChatGPT. These things are really powerful because these models are very large. And the larger they get, in some cases, the better they are at doing lots of generic tasks with one model, doing multiple different things at once. But what they're also really good at is being customized further for a downstream application. So you can take these types of generative AI models and tune them to generate images of cats, or create one that's just designed to create Hollywood movies, and another one that's designed to talk only to the salespeople at a company, or to lawyers. And that's what's really interesting about them: they're not only general intelligence models that can do lots of different things really well, they're generalists in a way, but you can then take them and customize them into more expert, domain-specific models.
Jon Prial: And the T in GPT is Transformer. So then you're calling transformer really that next turn of the crank of technology.
Mahan Salehi: Exactly. And ChatGPT is very big now. Everyone thinks these types of models are brand new, but those of us working in this space have been developing these technologies for a while. Transformers were announced in 2017, and there are unique things about their architecture; we can talk about self-attention mechanisms and other things like positional encodings. But the idea is that they're really good at understanding language, human language especially, but also image data and other types of sequence data, and they can be trained on large corpora, mountains of data. And the more data you throw at these types of models, because they run really well on GPUs for parallel processing, the better and more powerful they are at doing really cool and innovative things that we haven't seen before, like generating a video of a cat moonwalking and doing crazy things like that.
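For the curious, here is a minimal sketch of the self-attention mechanism Mahan names, following the scaled dot-product formulation from the 2017 transformer paper. The shapes and random projection matrices are illustrative only; a real transformer stacks many of these layers with learned weights, multiple attention heads, and positional encodings.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Each position scores every other position in the sequence...
    scores = q @ k.T / (k.shape[-1] ** 0.5)
    weights = F.softmax(scores, dim=-1)   # ...then normalizes the scores
    return weights @ v                    # weighted mix of value vectors

seq_len, d_model, d_head = 5, 8, 8
x = torch.randn(seq_len, d_model)                    # 5 toy token embeddings
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # torch.Size([5, 8])
```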
Jon Prial: So let me talk about the breadth of that LLM, or the narrowness of it, and it really depends; I'll give you the example I'm thinking of. There was a fascinating article, a piece where a model was going to write something. I think it had an option of Star Trek, Shakespeare, and I forget what the third one was. So I was following the Star Trek one, nerd that I am. It gave the model a prompt for a Star Trek script and then showed you what came out after one pass. And it was nothing but random letters; it would be kind to call it gibberish. Then it ran a hundred passes, and it was still gibberish. But after hundreds and hundreds, it was showing you after a thousand, after whatever the number was, 10,000, all of a sudden you found words, but they were just words. Then you found words that related to each other, and then you found words that were very Star Trekky. So I guess my question to you is: did they feed it just Star Trek scripts, or did they feed it every text there is in the universe and say, "We should weight this towards Star Trek"? Because I know these neural networks obviously have weights. How did it get to Star Trek?
Mahan Salehi: Yeah, that's a great point. So the first thing to say about these models is that we call them foundational models. You take a transformer model and, and this is what happened with ChatGPT and many of the other popular models you've heard of, you train it on a lot of data that's publicly available on the internet: Wikipedia articles, Reddit posts, and so on. And this is done in an unsupervised way. What that means is we're going to just give you a bunch of data. We're not going to teach you anything. We're not going to guide you. We're not going to tell you, "Hey, neural network, this is how you parse through language and learn the relationships between words." You literally just give it a bunch of data and say, "Figure out for yourself how human beings talk to each other and what is the way in which we communicate."
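One common way that unsupervised setup works in practice, and an assumption on my part since the conversation doesn't name the exact objective, is next-token prediction: the "label" for every position is simply the next word in the raw text, so no human annotation is needed. A toy illustration:

```python
# Naive whitespace tokenization, purely for illustration; real LLMs use
# subword tokenizers and billions of documents.
text = "to be or not to be"
tokens = text.split()

# Each training pair (context, next token) comes from the data itself,
# not from a human teacher -- that's what makes it unsupervised.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs:
    print(f"given {context} -> predict {target!r}")
```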
Jon Prial: Because you talk about unsupervised, just to clarify. If I'm teaching a car how to drive, I have to give it rules, right? Stop at a red light. Is that supervised?
Mahan Salehi: That is supervised. The challenge with a lot of AI models is that you give them an input, and in the beginning stages, when the model doesn't know what it's doing, you show it what the output should be. It's similar to when you have a toddler and you're trying to teach it what's right and what's wrong. Every time it makes a mistake you say, "Hey, no, that wasn't correct. This is what the right answer should have been," and the toddler learns, "Okay, this is what I'm supposed to be doing." The AI model does the same thing, except it does a bunch of math to update its weights, its parameters, and then it figures out what it is and isn't supposed to say. That's supervised learning. With these language models, the model does this on its own; it's unsupervised. So I don't give it a bunch of Wikipedia articles and say, for example, "This article is about Shakespeare. This is the topic of the essay." It just figures out that the essay is talking about Shakespeare, who Shakespeare is, what it means to write a poem, how human beings compose poems. It figures that out all by itself, and that's what we call unsupervised learning. So that's the first step: we give it all the data we can find on the internet so it can learn the basic rules of grammar and human language and how human beings talk to each other. From there, you have a model that can do a decent job of understanding what words mean relative to each other. But then, like you said, you want to narrow down to some very specific application, like, "Hey, I want to be able to understand Star Trek jargon." The problem there is that a lot of the internet data I trained this model on maybe didn't have much Star Trek in it; maybe this is the first time it's seeing it. And what these models are really known for is still doing a good job of generalizing to new data they haven't seen, but the more specific you get, the harder it is to get good results. So in this case, with Star Trek, what I can do is take this model that already has a lot of good foundational knowledge about how human beings talk to each other, and then give it just a couple more data points to show it, "Okay, here's how Star Trek characters talk to each other." From there, it's able to very, very quickly understand what Star Trek is and the terminology used there. Think of it almost like teaching a high school student the basics of algebra and then teaching them to solve more complicated problems from there. They're going to be really good at learning quickly, because they've already built up the foundational knowledge they need about the rules of mathematics. That's what AI models are really good at: they learn from a lot of generalized data, and then they can dive deep into a very specific thing with only a couple more examples.
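Here is a minimal sketch of that supervised "toddler" loop in PyTorch: show the model an input, compare its guess to the correct answer, and nudge the weights. The tiny model and synthetic data are placeholders purely for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                        # a one-layer "toddler"
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(32, 3)                         # 32 made-up inputs
y = x.sum(dim=1, keepdim=True)                 # the "right answers" a teacher supplies

for step in range(100):
    pred = model(x)                            # the model's guess
    loss = loss_fn(pred, y)                    # "no, that wasn't correct"
    optimizer.zero_grad()
    loss.backward()                            # how did each weight contribute to the mistake?
    optimizer.step()                           # the "bunch of math" that updates the parameters
```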
Jon Prial: Let's say I'm an entrepreneur and I've got first-party data. First-party data must really matter, because there are just gobs of third-party data out there, but anybody can get access to that. If I'm an entrepreneur starting a company and I've managed to do a really good job of collecting first-party data that's very relevant to my customer set, my assumption is I can take this large base, this LLM, and then uniquely augment it with my own first-party data and end up with something very special as a product. Is that right?
Mahan Salehi: Exactly. And we call that process customizing or tuning the model, so that you take something that's a generalist and teach it to be a domain-specific expert. And the more data you have, especially if that data is your differentiating moat, something you have access to that others might not, the more you'll be in a position to end up with a model that can do something other people can't recreate. Because at the end of the day, the data you have access to is the thing that makes or breaks a lot of these models and how good they are at performing certain tasks.
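As one concrete way that customization step can look, here is a minimal fine-tuning sketch using the open-source Hugging Face transformers library; that's my choice of tooling for illustration, not the NVIDIA product Mahan works on. The base model, learning rate, and first-party examples are all placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Start from a small public foundation model as the "generalist".
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Placeholder first-party data -- in practice, your proprietary text.
examples = [
    "Customer: my widget won't sync. Agent: let's re-pair it first...",
    "Customer: how do I export my data? Agent: open Settings, then...",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for text in examples:
    batch = tokenizer(text, return_tensors="pt")
    # Next-token loss on *your* data nudges the generalist toward your domain.
    out = model(**batch, labels=batch["input_ids"])
    optimizer.zero_grad()
    out.loss.backward()
    optimizer.step()
```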
Jon Prial: My market research hat's just burning up off my head 'cause I'm thinking about so many things. I could create a medical chatbot and I could scrape WebMD and a million sites, but so can anybody else. What can I do to be a successful medical chatbot company? And it does sound like everybody needs to be searching for their own secret sauce.
Mahan Salehi: Searching for your own secret sauce and coming up with clever ways to continuously collect more data, especially if you can find ways to involve your customers in that process. A good analogy I always like to give is Tesla. One of the reasons they're leaders in the autonomous vehicle space and their cars do a good job of recognizing what's in front of them is that there's a fleet of Teslas out there constantly collecting data from the roads. When a car sees something it doesn't recognize, or there's an issue and the AI model breaks down, that data is sent back to Tesla, and there's a human in the loop who teaches the model: "In this case you missed the fact that this was a stop sign. I'm going to make a note of this and retrain the model so it learns." So the next time it's in that same position, it's able to get the right answer. And if you have enough of these cars out there, and enough humans in the loop to make these corrections, you end up with a very good feedback loop where models are constantly iterating and getting better over time, learning in a kind of self-reinforcing, cyclical process. So if you're able to get data that's very unique and differentiated, that's great. If you can find a way to constantly get feedback on how the models are doing, that's even better.
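In code, that human-in-the-loop cycle might look something like the sketch below. Every name here is hypothetical; it just shows the shape of the process: flag the cases the model got wrong, have a person supply the correct answer, and fold the corrections back into training.

```python
from typing import Callable, Iterable, List, Tuple

def feedback_cycle(
    model: Callable,        # current model: example -> prediction
    examples: Iterable,     # fresh data streaming in from the field
    human_label: Callable,  # a person reviews a case -> correct answer
    retrain: Callable,      # (model, corrections) -> improved model
):
    corrections: List[Tuple] = []
    for example in examples:
        prediction = model(example)
        truth = human_label(example)              # "you missed the stop sign"
        if prediction != truth:
            corrections.append((example, truth))  # keep the hard cases
    if corrections:
        model = retrain(model, corrections)       # fold them back into training
    return model
```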
Jon Prial: Fantastic. And I guess that's the data moat you referenced earlier, so for sure, that's a data moat.
Mahan Salehi: Exactly.
Jon Prial: So how has your work at NVIDIA evolved? I mean, you talked about 2017, when some of this stuff started. How do you look back on your career, and how do you feel about what you see as you look ahead?
Mahan Salehi: It's definitely been an amazing journey. My break into AI was through the startups I worked on earlier in my career. And even just from when I created my first company to now, the advancements across the field, from computer vision to natural language processing, have been truly astounding. I think NVIDIA is in a very unique position, because we started off building, and we still build, a lot of the hardware that actually allows us to train these models and deploy them, right? What AI models are known for is parallel processing, and GPUs are designed for parallel processing; that's really their big advantage. So without the hardware, none of this innovation on the software side would've been possible. But what's been really interesting and really cool is that I work on a deep learning software team at NVIDIA, where we build products that allow customers to take foundational large language models like GPT and either train them from scratch or customize them on their own, usually proprietary, data sets, and end up with a model that's unique to them; we also help them deploy those models. So we're really focused on targeting enterprise customers that can't just take something like ChatGPT, which is very generic, and deploy it. They need to customize it on their proprietary knowledge bases so they end up with a model that works really well for them.
Jon Prial: So it's interesting that we're going to take this baseline and put the right thing on top of it, in the case of medical. Of course there's software engineering, there's marketing collateral. So what do you see, Mahan, as you look across the breadth of industries? Are there industries that you think are going to get there sooner rather than later, or does everyone need to race like hell to win? What's your view of how this is going to go, vertically oriented?
Mahan Salehi: I think what's really cool about these types of models is that the foundational models are horizontal, so you can take them and deploy them for any application. And as you said, you can adapt them to very specific verticals. The short answer is that every single industry can be impacted by this, as long as there's enough data to customize these models on. But I'll give you some very concrete examples from the work we do at NVIDIA with enterprise customers. We've seen lots of interesting use cases, and I like to focus on healthcare because I have a background in that space. I've seen these generative AI models used for things like generating 3D models of the human body and organs for surgical planning, and teaching medical residents how to perform surgeries in a way that's obviously noninvasive. Or building essentially a digital doctor, where a patient can come into a room and talk to an AI system that understands everything you're saying, helps diagnose what kind of conditions you're going through, and can look at you and even pick up on visual symptoms. So models are starting to become more multimodal, and that's what we're also seeing with GPT-4, where they're able to take in not only text as input but images as well. Another interesting application is drug discovery. This is something lots of people don't think about when you talk about large language models and generative models, but if you think about it, human language is very complicated to understand, and the language of protein sequences is also very difficult. It turns out that transformer models, these types of AI models, are really good at understanding that too. So now NVIDIA is working with companies like AstraZeneca and others to design unique types of large language models catered to the life sciences and drug discovery, to develop lifesaving vaccines and groundbreaking medications that can help cure diseases that in the past would've probably taken decades of R&D to develop.
Jon Prial: Those are great stories, and considering we started out talking about text and images, you've just brought them together. This has been such a great conversation. I can't thank you enough for giving us your time today. Thank you so much.
Mahan Salehi: It was a pleasure. Thank you for having me.
DESCRIPTION
On this episode of the Georgian Impact podcast, we’ll be breaking down the technologies that make up generative AI and how it works. From Large Language Models (LLMs) to deep learning, this episode will help you understand how AI has evolved to get us to this point with GenAI, and what our guest is excited about in the space.
Mahan Salehi, AI and LLM product manager at NVIDIA, will explain how the space has evolved, drawing on his experience working with AI at NVIDIA.
You’ll Hear About:
● Machine learning vs. artificial intelligence.
● The need for guidance for GenAI models.
● Rules-based models vs. deep learning models.
● The two pieces of generative AI.
● How foundational models are trained.
● The value of first-party data.
● Mahan looking back and looking forward.
● The impact on different industries.