Episode 94: Machine Learning 101 for CEOs with Adam Geitgey
Machine learning is really hard, right? It's just so complex that many people probably think it's not for them. But it doesn't have to be that way. Today I'm talking with Adam Geitgey, the writer behind the Machine Learning Is Fun blog, which he's written for people who are curious about machine learning but have no idea where to start. In addition, Adam has compiled his writing into some new e-books for executives and developers that you can download from the site. In this episode we're going after a topic that's more relevant for that first audience. Specifically, we're going to talk about data and how it makes applications and machine learning work. If you're in the C-suite, stick around; I think you'll find this very interesting indeed. I'm Jon Prial. Welcome to the Georgian Impact Podcast.
Adam, welcome to the show.
Thanks for having me.
A pleasure. Take me a little bit through your background and how you got to where you are today.
Sure. I've been a software developer my whole career. I first started programming when I was a kid, you know, when I was 7 years old, and kept at it throughout my childhood because I was interested in it. I had a pretty traditional background and went to Georgia Tech for computer science, but when I went to college there wasn't really such a thing as a machine learning major, and I wasn't even really interested in it at that point. It was much later in my career that I learned about statistics and machine learning and how to apply those to software, and it kind of opened my mind to a whole world of more things you could do with a computer.
And about 10 years ago I started to poke around with that, reading and learning, and from there I grew into building machine learning stuff and then eventually even consulting in the area.
Fantastic. I've followed the Machine Learning Is Fun blog for years. I think it does an amazing job of explaining things clearly, yet with the type of depth that people want. And congratulations on turning this into an e-book.
Thanks. Yeah, I've been writing this for four or five years, just because I wanted to write down the things I was learning. I feel like if you can explain something easily, it helps you understand it yourself. And then being able to eventually turn that into a book was super exciting for me.
So tell me a little bit more about writing the blog and the feedback that you received. As you were doing this, what kind of feedback, positive or negative, were you getting along the way?
Yeah, I just focused on making machine learning accessible to anyone, you know, typically someone with some kind of programming or computer background, but trying to take it out of the academic world and show people that these are things you could actually use yourself to build.
And what surprised me is that I get a lot of people reading it who aren't necessarily developers themselves. They're CEOs and product managers, people who just want to know what things you could do with machine learning. A lot of times they give me feedback where they say, "Well, I feel like I understand that now. I feel like I know what my team is doing." And to me that's a success. If someone who doesn't build it themselves feels like they get it and can then direct the work, I think that's really cool.
Perfect. So, you know, we've got to understand the data. I think that's going to be the most important part; it's the key to making all computing systems work, obviously. If I put things in perspective, I can argue that the concept of a database enabled the great leap forward in what computers can do. One part, of course, was being able to query the database and all the things that can happen from that, but to get it all started, someone had to create a data model. If it was a banking model, there would be a balance and an account number, and there might be transactions. If it was an inventory system, it would be SKUs and names. And you couldn't get the data right without getting the business right, without understanding the business. Fair?
Yeah, absolutely. I think the first step of doing something with data is understanding what that data is and how each piece of data relates to the others. So being able to pull out things like names and social security numbers is the first step. If you don't know what you have, it's not really data that you can use.
And for capturing requirements in those times, I think the term at the time was the business analyst, and now we talk about product management, but it was always that link between the technology and the end user, right? It was about capturing what the end user was doing, or the things they were creating, and giving the program ways to use that data to produce a result. And the other flip of the switch, which is going to begin to blur into this discussion of machine learning, was the algorithms, of course. What is it that you do with the data? How are you going to get there?
So we can come back to algorithms; let's leave algorithms be for now and talk about what these new machine learning tools generate.
Yeah, that sounds great.
I guess it's time to figure out whether we're about to take a big leap or a small leap. You're going to have to tell me where we are. Take us from data models, where I definitely understand the fields and names, into this more complex world of features. What does that really mean?
Yeah, that's a great question. I think in the original days of databases, like you were talking about, you're trying to capture the pieces of data that the program needs to work with, you know, like the customer's name and how they relate to the things they purchased: both the discrete things, like customers and items, and also the linkages between those things. With features, what we're trying to do is find pieces of data that correlate with some output. So the things that the program needs to know just to ship products and deliver the things a customer purchased are not necessarily the same things that correlate with some outcome we want to predict in the prediction model. With features we're trying to boil down those pieces of data that we do have, those columns of data in the database, to the ones that are most correlated with the outcome we want to predict. That's all. It's kind of boiling down and cleaning up that data into the cleanest mapping to the output we want.
And to know anything about that mapping to the output, you kind of need to know the application, the objectives of the algorithm you're looking to run against it. You need to know more than just the data, right?
Yeah. Let me give you just a really simple example. I think everybody can relate to hospitals, you know, or a doctor's office. You go in there for a visit and you get something done. Let's say we're trying to predict a patient's outcome, and the data that we collect when they come in is birth date. That's super helpful because it gives us an indication of how old the person was, and probably the patient's age correlates with when they will get a certain procedure done. But what the model really cares about is the patient's age at the time they got the procedure done, not their actual birth date. So in the database you might have the patient's birth date, but by the time you feed it into a model you might have cleaned that up and converted it to age at time of service, because that's more correlated with the output and will give the model a better signal.
And that level of bringing data to the model, going back to your earlier example, goes back to data cleansing and making sure you have the best data that can then be acted upon. That really is a very similar process to what we've done all along with these data systems.
Yeah. I think it's kind of the next step. You know, the first step is to collect the data and understand what you have. The second step is to make sure the data is accurate. And then from there you can start to do things with that data, creating new derivatives of that data that are appropriate for building a prediction model or feeding into other systems that you have uses for.
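To make the birth-date example concrete, here is a minimal Python sketch of deriving an "age at time of service" feature from a stored birth date. The record layout, field names, and the `age_at_service` helper are assumptions made up for illustration; they are not taken from the episode or from any particular system.

```python
from datetime import date


def age_at_service(birth_date: date, service_date: date) -> int:
    """Return the patient's age in whole years on the date of service."""
    years = service_date.year - birth_date.year
    # Subtract one year if the birthday had not yet occurred in the service year.
    if (service_date.month, service_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Hypothetical record, as it might be stored in a transactional database.
record = {
    "patient_id": 17,
    "birth_date": date(1950, 6, 15),
    "service_date": date(2019, 3, 1),
}

# The derived feature is what would be fed to a prediction model,
# in place of the raw birth date.
features = {
    "age_at_service": age_at_service(record["birth_date"], record["service_date"]),
}
```

The raw column stays in the database; only the derived value goes to the model, which matches the "new derivatives of that data" step described above.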
I read a little bit in some of my research about some of these features. So in computer vision, a feature could be something like edges. You would never have that in a transactional data system. For speech recognition, it's a phoneme. Different things that are relevant to how you're going to apply the tech, right?
Yeah. If you think about a computer vision system, its goal is to look at a picture and tell us what's in that picture: is it a picture of a cat or a dog? If our algorithm to make that prediction isn't very sophisticated, we don't want to just feed in a raw picture with a bunch of pixels. We want to feed in a reduced, simplified representation of that picture, like where the edges are, or what colors are in that picture and how much of each color appears. That makes the job of the prediction algorithm simpler, because it can look at those boiled-down features and say, okay, if the color orange is really prominent and there are lots of furry-looking edges, that means it's probably a tiger.
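As a rough illustration of hand-built image features like "how much of each color appears" and "where the edges are", here is a small Python sketch. The two functions, the dominant-channel rule, and the brightness threshold of 50 are all simplifying assumptions chosen for this example; real computer vision pipelines use far more careful feature extractors.

```python
def color_fractions(pixels):
    """Fraction of pixels whose strongest channel is red, green, or blue.

    `pixels` is a list of (r, g, b) tuples with values from 0 to 255.
    """
    counts = {"red": 0, "green": 0, "blue": 0}
    for r, g, b in pixels:
        # Pick the channel with the highest value for this pixel.
        dominant = max((r, "red"), (g, "green"), (b, "blue"))[1]
        counts[dominant] += 1
    total = len(pixels)
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}


def edge_density(gray_rows, threshold=50):
    """Share of horizontally adjacent pixel pairs with a large brightness jump.

    `gray_rows` is a list of rows, each a list of brightness values (0 to 255).
    """
    pairs = 0
    edges = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            pairs += 1
            if abs(left - right) > threshold:
                edges += 1
    return edges / pairs if pairs else 0.0


# Tiny made-up inputs: four colored pixels and a 2x4 grayscale image.
sample_pixels = [(255, 128, 0), (250, 120, 10), (0, 0, 255), (10, 200, 10)]
sample_gray = [[0, 0, 255, 255], [0, 0, 255, 255]]

# A simple classifier would see these few numbers instead of raw pixels.
feature_vector = {
    **color_fractions(sample_pixels),
    "edge_density": edge_density(sample_gray),
}
```

A simple prediction algorithm would then work from this handful of numbers instead of from every pixel, which is the simplification described above.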
Whereas the holy grail, the end result we want to get to, is that it can look directly at the image, where we haven't done any feature extraction or any kind of abstraction of the original data, and it can just pull out the answer right from the original image. So in the history of machine learning, the earlier days were humans figuring out ways to simplify the data, like these pictures, and come up with these simplified features, like pulling out edges or pulling out color, to give to the computer. And then in the modern world of deep learning, the idea is that the model can do more of that. It can look at the picture itself, build its own features from that raw data, and then make a prediction from those features, with less human input in the whole pipeline.
So we're able to do more automation and have more automatic feature extraction, but my guess is we still don't lose the art as well as the science. Where do we keep the art in this end piece of data science, feeding machine learning engines?
For example, with image recognition systems, the whole field is just full of things you learn from experience to make things work, and the difference between a really good system and a really bad system is kind of that tribal knowledge you pick up. Like, okay, what size image should I feed into the system? Well, it turns out 512 pixels by 512 pixels works pretty well if I have 10 million images or more. That kind of stuff you learn through experience. So even in the modern world, where the computer does more of the work in the algorithm, there's still a lot of art and trial and error to get things to actually work and produce good prediction results with these systems. So that never goes away. But even that being said, day to day, most of the machine learning that gets done in most companies is not deep learning. It's not image recognition. It's usually taking things out of a database, cleaning up that data, and using it to make some kind of prediction for the business. That's most of what people are doing, and that requires a lot of feature building and feature extraction.
So companies today, and likely a CEO thinking about the data that his or her team is working with, are going to begin to really dig into asking those types of questions. Let me give you an example. I'm thinking about a Netflix recommendation system. There are still features there; it has to be told whether something is a comedy or a drama. I mean, the system may know from all this data that if I like movie A, I'll like movie B, but that may not be sufficient. Maybe I do need to know that these are comedies or dramas or strong female leads or whatever. They tell us that. Are those still done by humans?
Sort of. There are kind of two approaches to this. I like to think of it as there being a scale of approaches. On one side of the scale, humans do a lot of the work classifying and creating the base data that we're going to use to classify and feed into the machine. On the other side of the scale, the computer does more and more of that work because it has more data to pull from. So, for example, in recommendations, if you have millions and millions of examples of people saying "I like this movie" or "I don't like this movie," and you have lots of people voting on the same movies, you can actually do some math and work backwards to pull out a profile of each movie based on the correlation between the people. What you end up with is clusters of movies. What Netflix does is hand those clusters of movies to people and ask, what is similar about these movies? It looks like the same kinds of people like them. And they say, oh, these have strong female leads and are set in Atlanta, for example. But that wasn't something someone set in the beginning. It's exactly something they labeled after the fact, and it's kind of a newer approach to doing this.
Is that different from, or the same as, supervised learning?
Supervised learning is just any case where you have some data and you have some outcome that you want to predict, and the machine learns to take that data and figure out that outcome. So in the image example, where this picture has a cat, yes or no, supervised learning is saying: look at a new image and tell me if this image has a cat, yes or no. When you start doing things like pulling things out of data where you're not telling it up front what the answer is, that falls more into the unsupervised category. So they're starting to blur the lines at Netflix and do more unsupervised learning. There are supervised elements, like the fact that we know which movies you liked, but the fact that they're using that information to then pull new data out, that's the unsupervised part. We're kind of learning things along the way, which is really cool and kind of the future.
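The "work backwards from votes to clusters of movies" idea can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the vote matrix is made up, cosine similarity and the 0.5 threshold are arbitrary choices for the example, and the greedy grouping below is not a description of how Netflix or any production recommender actually works.

```python
from math import sqrt

# Hypothetical votes: +1 = liked, -1 = disliked, 0 = no vote.
# Rows are viewers, columns are movies.
movies = ["A", "B", "C", "D"]
votes = [
    [1, 1, -1, -1],
    [1, 1, -1, 0],
    [-1, -1, 1, 1],
    [0, -1, 1, 1],
]


def cosine(u, v):
    """Cosine similarity between two vote vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_movies(movies, votes, threshold=0.5):
    """Greedily group movies whose vote patterns are similar.

    Each movie joins the first existing cluster whose first member it
    resembles closely enough; otherwise it starts a new cluster.
    """
    columns = [list(col) for col in zip(*votes)]  # one vote vector per movie
    clusters = []
    for i in range(len(movies)):
        for group in clusters:
            if cosine(columns[i], columns[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            clusters.append([i])
    return [[movies[i] for i in group] for group in clusters]


clusters = cluster_movies(movies, votes)
```

Note that the clusters come out unlabeled. Nothing in the code knows what movies A and B have in common; a person would look at each group afterward and give it a name, which is the "labeled after the fact" step described above.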
As we get more and more data, there's a lot more data that doesn't have the answer given to us. There's no label with the data. So with unsupervised learning we can do a lot more with a lot more data than we could before.
It really is an evolution. So as I think of the model of transactional systems and traditional data models, we've moved to this world where we begin to do features and labeling. We will go beyond that, but let's talk about some of the apps around it. And here's my thought. I'm going to make a bold leap here: even 30 years from now, there will still be banking transactional systems that are tracking your account balance, and when you take $50 out of the ATM, they subtract $50 from your balance. And I'm going to ask you: my sense is AI isn't replacing that. AI might be used around all that data for fraud detection or something. Do these just coexist?
Yeah, that's a great question, and you're exactly right. You know, you think about the old hierarchy of needs: you need shelter before you can worry about, you know, the next thing up, and at the top of the pyramid is love. It's the same thing with data. Hilary Mason from Cloudera has this really great explanation of the evolution of data science in a company. The foundational layer is that you've got everything in the database; everything is in one place. The second level is that you can query that data and do basic reports, like how many times I sold something or how many times a customer came to the website. You can answer the basic questions. The third step is that you can make basic predictions, not machine learning, just a person who can sit down with Excel, do a regression, and say, oh, I can see trend lines and I can do basic stuff. And then the fourth level is: all the data is in one place, it's clean, we can make basic predictions, and people use that as a source of truth in the company. Now we can build machine learning models on top of that to take on harder processes or unlock new value out of that data. So it's definitely a hierarchy, and that bottom layer is never going away. And the better and cleaner that bottom layer is, the better you can do everything else. I would say 70% of the time, a company comes to me and says they want to build something really cool with AI, and then we spend three months just getting the data from the random databases they have and cleaning it up. If that stuff was already in place and really good, you could jump right to that last level of building cool models.
So that's a great question for the executive team to be challenging the technical team with. It's fair for the executive team to be asking what data you have and how good and how clean it is. It's no longer sufficient just to operate the base transactional system. You'll never move up that hierarchy of needs, and be able to apply more interesting techniques and come up with more interesting uses and value from what you can do with that data, if it's not really in good shape.
And here's a good way, if you're a CEO and you want to test your team and know where you're at really easily: go ask your team to build a report on some random thing and then ask them where they got the data. Especially if it's a report that crosses two or three product areas or two or three different domains in your company, if the answer is "I had to go talk to this team to get this piece of data, then I had to go talk to this other team, then clean it up and join it," fix that problem first before you worry about AI and machine learning. That means the company doesn't have its data game together and you're not in a place where you can really leverage your data.
We're coming to a close, and this is great. I want people to remember this; this is kind of a great close. So it's all about quality data, managed correctly. You can only use machine learning when you are ready. What has happened to algorithms? Are they still around in this ML world, or are they a higher-level concept of exploiting what's in the data that you have?
The algorithms have gotten better, and by better what I mean is this: in the old days the algorithms could work with a little bit of data, but they could only learn relatively simple functions or only model relatively simple systems. The deep learning algorithms, the newest approaches, get more and more complicated every day. They can learn to do much more complicated things, like, for example, detect faces in a picture and tell you who that person is, or other things that seem much more magical to us, but they need a lot more data to do it. So if you draw a graph, over time the capability goes up, and the data required and the data they can take advantage of goes up too. So the algorithms get better and more capable, but that doesn't mean the old algorithms are any less useful today. For me, I still use a lot of these older, simpler algorithms day to day in my work because they're the best fit for the problem I'm working on at the time. It's more like you have this buffet table of algorithms in front of you, and you pick the one that applies best to the problem based on how much data you have and how hard the problem you're solving is. And I think a secret that CEOs don't hear a lot is that something like 80% of the stuff happening in companies is using older, relatively simpler algorithms, not deep learning, and they get the job done just fine. Whereas you focus deep learning on the specific problems where you need it, where the higher infrastructure and computational cost is worth it because the problem is harder and needs that to get it done.
Sounds like that's where the art is. Well, it was a pleasure chatting with you. Good luck on the release of the book, and we look forward to chatting with you again.
Yeah, thanks a lot. This was great.
Machine learning can be complex, but it doesn’t have to be. For many product leaders and C-suite executives, it’s about grasping the concepts and being able to direct the work. In this episode of the Georgian Impact Podcast, Jon Prial talks with Adam Geitgey, the author of Machine Learning Is Fun, a book for people who want to get started with machine learning. They talk about the evolution of machine learning in terms that everyone can understand.
You’ll hear about:
- Why it’s still all about the data in machine learning
- What key terms like features, labels, and unsupervised learning mean
- How you know you’re ready for machine learning
Who is Adam Geitgey?
Adam Geitgey is an author and software developer with over 15 years of professional experience building large systems and managing development teams for companies like Groupon. He believes that machine learning is going to be a large part of the future of software development and that it's important for every developer to have a basic literacy of machine learning techniques. This was the inspiration for his book and blog - Machine Learning Is Fun - https://www.machinelearningisfun.com/.