Episode 114: What Makes a Successful AI Project?

“If you want to be a good data scientist, you should spend ~49% of your time developing your statistical intuition (i.e. how to ask good questions of the data), and ~49% of your time on domain knowledge (improving overall understanding of your field). Only ~2% on methods per se.” Nate Silver, a statistician and writer who analyzes sports, elections and more.

In this week’s podcast Jon Prial is joined by Tara Khazaei, Chief Data Scientist, National AI Team, Customer Success Unit at Microsoft. Jon and Tara talk about how domain knowledge, as well as statistical intuition, make for more successful outcomes in machine learning projects. They discuss performance through the lens of projects Tara and her team have led at Microsoft.

In this episode you’ll hear:

Why you need enough, high-quality data
The importance of iterating and validating your approach to achieve the best performance
The challenges bias and explainability pose to ML projects
Why domain knowledge is crucial for successful outcomes
How to decide when your model is ready to go into production and why you need to go beyond accuracy

Who is Taraneh Khazaei?

Taraneh Khazaei is Chief Data Scientist on the National AI Team, Customer Success Unit at Microsoft. In this role, she advises Microsoft's clients on how to adopt machine learning. Working with clients, Tara has researched the state of the art of speech-to-text methods and technologies, developed deep sequential modeling methods (e.g., use of embeddings, RNNs, and transformer networks) on terabytes of clickstream data to model and predict user online behavior, and designed and developed an ML pipeline to predict the market price of a vehicle.

Key Takeaways

What it takes to be a good data scientist: 49% statistical intuition, 49% domain knowledge, 2% models

00:34 MIN

You can't just throw data at an ML project; you need to take an iterative approach.

01:08 MIN

Bias and explainability are two of the biggest challenges AI projects face.

02:42 MIN

Your data could reinforce societal biases.

01:59 MIN

How to make sure your model is ready for production.

02:01 MIN

Transcript

Warning: This transcript was created using AI and will contain several inaccuracies.

Let me quote from a recent Nate Silver tweet. Nate is an American statistician and a writer who analyzes sports, elections, and more. He's the founder and editor-in-chief of FiveThirtyEight. He's also a special correspondent for ABC News. But here's the quote: "If you want to be a good data scientist, you should spend about 49% of your time developing your statistical intuition, that is, how to ask good questions of the data, and about 49% of your time on domain knowledge, improving overall understanding of your field. Only 2% on methods per se." This podcast is about machine learning and AI, but not the 2%: nothing on machine learning techniques today. Instead, we're going to be talking about how domain knowledge, as well as statistical intuition, can help develop some really powerful solutions. With me today is Tara Khazaei, a data scientist and solution architect at Microsoft. Prior to that, she was a research scientist, and she has recently been working with some teams to create some really pretty interesting solutions, and I'm looking forward to a great discussion. I'm Jon Prial, and welcome to the Georgian Impact Podcast.

So Tara, here's my question, because I want to start this dialogue. Over the years, we've been learning and watching and hearing more about machine learning, and I've always heard different people say, look, all you need to do is throw all the data into the machine learning system and you'll just get all the answers. I don't want to be a skeptic, but that doesn't quite work for me. So I want to use an example, a project that I think you were a part of: optimizing bus routes. I could take all the data there is about bus routes and throw it into a system: how many people are on the bus, the time of day, when it arrived, when it departed, all that stuff. But what I learned from doing the research getting ready for this podcast is that if weather data is not included in that data set, you're probably not going to get the results that you're looking for. So talk to me a little bit about how you get the right data and how you think about building a machine learning solution.

Yeah, for sure. That's a great point, Jon. As you know, data is the main driver of AI, and if you don't have the right data, as you mentioned, the machine learning pipeline is not going to perform as we expect it to perform. For that particular project, we actually did include GPS data, weather data, traffic data, all the data points that our customer, TransLink, the Vancouver metro system, provided us. There is still so much more that we could always include and take advantage of to boost the performance even further. Making sure that we have enough data, the right data points, and data of high quality is one of the very first steps of approaching a data science and ML project.

But I guess, in support of the people that tell me to throw the data in, you don't necessarily start with a theory. You really start by looking at the broadest spectrum of what data might be relevant to the challenge you're trying to solve.

Yeah, that's true. This field is called data science for a reason. At the end of the day, it's a scientific approach that we want to take, starting from the right data. But at the very beginning, we don't necessarily know that it's the best data set possible. We can always make it better and better, and at some point we've got to start experimenting with that data and go through the entire cycle of building an ML project: starting from the data, going through the modeling, evaluating the model, tracking the results, and again making informed decisions based on those results, going back to the initial data set, and then the next step and the next step, modifying all of those steps afterwards. You're right: that experimental, scientific approach is an integral part of building an ML project, for sure.

So that's the first 49%: asking good questions of the data. And I like that you iterate. You don't just throw it in and say, here's what we think the answer might be; you really test and validate. We often spend a lot of time talking about biases. So you really have to look at the output to make sure the data you brought in wasn't biased and the results you got at the end are not biased as well.

Yes, that's very true. That's one of the main challenges in the AI and ML community, both in academia and industry: to make sure that our AI and ML decisions are unbiased and fair. The main reason is that AI and ML, if used blindly, can simply amplify the biases that we have in the data. To avoid these biases being reinforced and amplified by ML algorithms, we really need to spend time making sure that our data is debiased and is representative of different communities, that it's inclusive and diverse, and only later go through that entire cycle of building a machine learning pipeline. However, it is a very, very challenging process to make sure that the data is debiased. If it were that easy, there wouldn't be so much effort, both in academia and industry, to make sure that we have unbiased and ethical AI. One of the approaches that you see commonly these days is employing explainable AI and interpretable AI approaches to make sure the results are fair and ethical. It is a very controversial topic, because everyone has their own definition of explainability. Some people really believe that to truly achieve explainable AI, you need to be able to understand the inner workings of your black box, and to be able to do that you really have to sacrifice performance, because there is this unavoidable, fundamental trade-off between accuracy and explainability. Some people believe in post hoc analysis. They say, okay, we have the input of the AI system, that black box, and we have the output, so we try to make associations between the two without touching the black box, and based on those associations we start explaining the output of that black box. And there's a third view on this as well, where people say, we don't really need explainable AI; all we need to do is run extensive tests on the performance of the system, based on some metrics that we define related to bias and ethical decision-making. For example, if self-driving cars are killing fewer people in our simulations, then that's a good thing and we can trust that system. So there are a lot of controversies and a lot of different views around explainability, bias, and fairness, for sure.

Explainability is really interesting, with the trade-offs that are made. Different applications have different needs, and they'll decide which way to move that lever. But I'm thinking about explainability, and quite often I've read about data sets where the data itself is inherently biased, and maybe it's not fair to call it bias. This came up on an earlier podcast a year ago: say I'm looking for a track record of successful CEOs of startups, and the data I'm going to get will be white males. Or there is medical data out there looking at heart attack victims, and it's often leaving out minorities or women because of the data they currently have. So when you talk about explainability and explaining what you did, how do you explain that you recognize that maybe the data is not right, but I want to find successful women CEOs or predict healthcare results for a minority community, for example? How do you make sure you do that? Obviously you're being transparent, you're explaining it, but you've still got to be unbiased.

You can start with the data itself and run some statistical methods, and trust your statistical intuition, as you said when you started the podcast talking about statistical intuition, to make sure that the data is, as I mentioned earlier, inclusive and diverse. But a lot of times you may miss some biases that are present in the data by simply using those basic statistical tools. So at the end of the day, you'll have to run a lot of tests and experiments after the fact, and define metrics and procedures to make sure that the result is unbiased, which tells you to some degree that the data was unbiased, because these two are always highly associated with one another. Biased data often results in a biased outcome from your machine learning pipeline.

And I can give you another very interesting example. With the introduction of deep learning, and with deep learning kind of taking over the world of AI, came a very interesting application of neural networks in the context of natural language processing, called word2vec, which allows you to look at how humans formulate sentences in language. It looks at co-occurrences of different words and builds mathematical representations of the different words in our language. Once you pass your corpus, your collection of documents, to this algorithm and get those mathematical representations back, researchers showed that these representations are capable of capturing syntactic and semantic relationships between different words. For example, the relationship between the words man and woman was the same as the relationship between the words king and queen.
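The co-occurrence idea described here can be made concrete with a small sketch. Embedding methods in the word2vec family are trained on pairs of words that appear near each other; the Python below only generates those (center, context) pairs from a toy sentence and counts them. The function name `context_pairs`, the window size, and the sentence are illustrative assumptions, and no actual model is trained.

```python
from collections import Counter


def context_pairs(tokens, window=2):
    """Return (center, context) pairs for words within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Toy sentence, chosen only for illustration.
sentence = "the king spoke to the queen".split()
pairs = context_pairs(sentence, window=1)

# How often each ordered pair co-occurs inside the window.
pair_counts = Counter(pairs)

for (center, context), n in sorted(pair_counts.items()):
    print(f"{center:>6} -> {context:<6} {n}")
```

A real training run would feed millions of such pairs, drawn from a large corpus, to a model that learns one vector per word; words that share contexts end up with similar vectors.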

And these word representations became the underlying features for a lot of other natural language processing tasks, such as translation. Later on, after running a lot of tests, people realized that the algorithm was simply amplifying the biases in the documents that were passed to it. The relationship between the word woman and homemaker was the same as the relationship between man and computer programmer. Those documents are often news articles, because they're publicly available, clean text that can easily be used as the training data. So it's just about running a lot of different tests.
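The analogy and the bias test mentioned above can be sketched with vector arithmetic. The vectors below are hand-made toy values, deliberately constructed so that the effect is visible; they are not real embeddings and not results from any model or corpus. The helper names `analogy` and `gender_lean` are likewise assumptions for illustration.

```python
import math

# Hand-made 3-D toy vectors (NOT learned embeddings).
# Axis 0 loosely stands for "gender", axis 1 for "royalty", axis 2 for "domestic".
vectors = {
    "man":        (1.0, 0.0, 0.0),
    "woman":      (-1.0, 0.0, 0.0),
    "king":       (1.0, 1.0, 0.0),
    "queen":      (-1.0, 1.0, 0.0),
    "programmer": (0.6, 0.0, -0.8),   # deliberately skewed toward "man"
    "homemaker":  (-0.6, 0.0, 0.8),   # deliberately skewed toward "woman"
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via the vector offset b - a + c."""
    target = tuple(vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def gender_lean(word):
    """Positive means closer to 'man', negative means closer to 'woman'."""
    return cosine(vectors[word], vectors["man"]) - cosine(vectors[word], vectors["woman"])


print(analogy("man", "woman", "king"))
for word in ("programmer", "homemaker"):
    print(word, round(gender_lean(word), 2))
```

On real embeddings, a pre-specified check of this kind would be run over a whole list of occupation words, and a consistent lean in one direction would be the signal worth investigating.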

Run systematically against pre-specified metrics, those tests would, at the end of the day, shed light on whether the data is biased or not, and whether or how we need to debias it.

I'll go back to the initial quote about domain knowledge, then. The data science team shouldn't just be programming jockeys, which, I don't know, is that gendered? No, it's not gendered. Programmers. You need people who really understand the end user, or even sociologists. So you need different types of skill sets on that team too, as you run these test cases again and again to see what's coming out of it.

That's very true.

Let's take another example: a large company, a B2C company, and someone at the top, say the CEO, says, I want to improve my customer satisfaction.

Well, now you've got gobs of data, just staying tactical, gobs of data. Do you throw it all in? Do you think about what's relevant before you throw it in? Do you think about the output? So if you've got a general problem like that, but it's a serious problem, I really want to improve customer satisfaction, how do you go about that and think about the creation of a machine learning and artificial intelligence system focused on that problem?

Being in this role at Microsoft, I get to work with a lot of different customers from a wide range of industry verticals, and you can see that almost all of them are capturing clickstream data when it comes to user interfaces and the interactions between the user and their website or online platform. They capture it, and they have the right infrastructure to capture that clickstream data, but it is such an underutilized resource when it comes to mining insights from that data and understanding the reasons why people might be happy with certain features of that UI or unhappy with their experience of that UI. So we got involved with one of our customers, where we were given terabytes of clickstream data and the right tools and resources to understand how we can improve the user experience, both in the browser and on a mobile phone.

So there would be multiple sources of input; this clickstream would come across all devices, correct? Another piece of data to analyze.

Right. We were given clickstream data for both the online platform and the mobile platform, and we started analyzing them independently and building machine learning models to really model that user behavior and understand what makes a user experience interesting and appealing, and what it is that users don't like about that UI. What we ended up doing was to adapt natural language processing methods to clickstream data, because if you really think about it, natural language processing methods are developed based on the idea that language is a sequence of tokens, tokens being the words or the characters depending on the task at hand. We used the same idea: a clickstream is simply a sequence of actions, which are the atomic behaviors, or the atomic interactions, of the user with that interface, in the very same way that words are the atomic units of our language.

Interesting. So it's not the individual click; it's the sequence of clicks, right?

Right, it's the sequence of clicks that makes up that entire experience, and we started adopting a lot of different natural language processing techniques, starting from the traditional ones all the way to the state of the art in deep learning, to experiment and see which of these methods are really capable of giving us what we want from that clickstream data.
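A minimal sketch of the clicks-as-tokens idea, assuming made-up action names and sessions: each distinct action gets an integer id, exactly as words do in a text pipeline, and a session becomes a fixed-length id sequence that a sequence model could consume. The helper names, the `<pad>` and `<unk>` tokens, and the sessions are hypothetical; the project's actual pipeline is not described in the episode.

```python
from collections import Counter

# Hypothetical sessions; each one is an ordered list of user actions.
sessions = [
    ["home", "search", "product", "add_to_cart", "checkout"],
    ["home", "login", "login", "login", "help"],
]


def build_vocab(all_sessions):
    """Map every distinct action to an integer id, reserving 0 and 1 for special tokens."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for session in all_sessions:
        for action in session:
            vocab.setdefault(action, len(vocab))
    return vocab


def encode(session, vocab, max_len):
    """Turn a session into a fixed-length list of ids, truncating or padding as needed."""
    ids = [vocab.get(action, vocab["<unk>"]) for action in session][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def bigrams(session):
    """Adjacent action pairs, the clickstream analogue of word bigrams."""
    return list(zip(session, session[1:]))


vocab = build_vocab(sessions)
encoded = [encode(s, vocab, max_len=6) for s in sessions]
repeat_counts = Counter(bigrams(sessions[1]))

print(encoded)
print(repeat_counts.most_common(1))
```

Repeated bigrams such as a login followed by another login are the kind of pattern a traditional n-gram model would surface; the integer sequences are the form that embedding layers, RNNs, or transformers would take as input.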

It showed really, really good results, because we also had some survey scores, from surveys that people filled out after they were done with the online platform, with the platform experience in that session, and given that labeled data we were able to see that the idea actually was really working and gave us amazing results.

I'm glad to hear that you used the survey, because one of my frustrations happens to be a bank of mine in the US, and I really hate the UI. I feel like I click three times when I should be clicking one time, and I'm often just saying, I'd like to whine to the programmer. But every once in a while it asks, would you like to take a survey, and then I say, why am I clicking three times? So that's awesome, because if enough people say that in the surveys, then you can go back and find that problem in the clickstream. But you wouldn't necessarily know it was a problem that I click three times unless you're comparing it to another system where happier people are going through the same action and maybe clicking one time, right? Something like that.

Yes, and besides the comments you provide, there is also the experience itself, the clickstream data that we can capture. And when you do provide comments: we did have the comments that users provided in the survey as well, alongside that clickstream data and the score they gave to that experience. So all of these data pieces together gave us this comprehensive view of the user experience and allowed us to really understand the reasons behind low and high experience scores.

So that kind of brings us to the end of this thought about the data that we take into the system. Like you said, for the bus system you recognized GPS and traffic and weather, and we talked about thinking of the clickstream data as another source of input and how to analyze that. So I want to get to the end point and have a little stronger discussion of validation. There was another Microsoft project that had to do with the prediction of opioid issues, and the data that was being fed into the system was quite extensive. They were looking at recovery homes. They were looking at overdoses and deaths within a certain distance from these homes. They began to look at payments and crime rates. It is an amazing amount of data that gets processed. At the end, how do you get to the level of comfort that you've got a predictive model that you can pass on to your customers and say, we think we've got something that we are relatively comfortable with? How do you get to that end state?

To make sure that our model is ready to go into production and is fully validated, we do have conventional performance metrics, defined in the field, that allow you to measure your confidence in a model in terms of its accuracy, for example. But we need performance metrics that go beyond just the accuracy of the model. We need to ensure that our model is fair and unbiased, as discussed earlier, and we need to have explainable AI processes in place. We need those metrics and measures of trust in the model so that we can minimize, or at least lower, the risk associated with that model. In this particular example we were predicting someone's user experience, and this is a less sensitive use case. It is even more important for more sensitive use cases. For example, in the legal services domain or the financial services domain, when we want to approve or decline someone's mortgage or credit card based on an AI and ML model, we really, really need to spend time on that explainability and fairness part of the process, for which we will definitely need domain knowledge, as you touched on earlier. And the final piece, depending on where and when that model is being built and is going to go into production: we really need to ensure that it is compliant with all of the regulations that are in place at the time, which is the final piece before it can go into production. And later on, again, we don't just leave the model in production and never touch it again. We do have that cycle, that loop that gives us back some feedback from the model, which we can always use to improve it.
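One way to picture metrics that go beyond accuracy is to report accuracy next to a simple fairness measure. The sketch below uses invented approve/decline records for two hypothetical groups, A and B, and computes the gap in approval rates between them (often called the demographic parity difference). The records, group labels, and function names are all assumptions for illustration, and this is only one of many possible fairness metrics.

```python
# Invented records: (group, true_label, predicted_label), where 1 means "approve".
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 1),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]


def accuracy(rows):
    """Share of rows where the prediction matches the true label."""
    return sum(truth == pred for _, truth, pred in rows) / len(rows)


def approval_rate(rows, group):
    """Share of a group's rows that the model predicts as approved."""
    preds = [pred for g, _, pred in rows if g == group]
    return sum(preds) / len(preds)


def parity_gap(rows, group_a, group_b):
    """Absolute difference in approval rates between two groups."""
    return abs(approval_rate(rows, group_a) - approval_rate(rows, group_b))


print("accuracy:", accuracy(records))
print("approval rate A:", approval_rate(records, "A"))
print("approval rate B:", approval_rate(records, "B"))
print("parity gap:", parity_gap(records, "A", "B"))
```

In this toy data the model is equally accurate on both groups yet approves one group three times as often, which is the kind of gap an accuracy figure alone would hide. Whether such a gap is acceptable is a domain and regulatory question, not something the metric decides.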

That's a very powerful answer. If I think about what you just said, in terms of not just getting the performance metrics, but understanding the transparency and the explainability, and that you can communicate the elements of bias and keep working and improving, that's the kind of thing that your customers must love to hear. So Tara, that was a great discussion and a great answer. Thank you so much for being with us today.

Thank you very much, and thank you for the opportunity.
