Episode 87: Trust and the Data Scientist's Dilemma
John Pryor: Is it unethical if your company doesn't do its due diligence on avoiding bias? That's an interesting question, one that my guest today is very qualified to answer. Today, I'm going to be talking with Catherine Hume, one of our most popular guests. Catherine is the VP of product and strategy at integrate.ai, which is one of Georgian Partners' portfolio companies. She excels at bringing a depth of insight along with clear and simple explanations of topics that are hard to understand. She's passionate about building companies and products that unlock the commercial value of emerging technologies. So it's not surprising that she's also a venture partner at ffVC, a seed and early-stage technology venture capital firm. She's a regular public speaker on how AI and machine learning technologies work, and she's given lectures and taught courses on the intersections of technology, ethics, law, and society at Harvard Business School, Stanford, the MIT Media Lab, and the University of Calgary Faculty of Law. It's going to be a great discussion, so stay tuned. I'm John Pryor, and welcome to the Georgian Impact Podcast. So thanks for being with us, Catherine.
Catherine Hume: Thanks for having me.
John Pryor: I actually want to talk about integrate and how your company has been establishing itself as a leader around what you've been calling responsible AI. It addresses privacy, security, ethics, and more, and I'm really looking forward to talking to you about this.
Catherine Hume: Well, I'm delighted to be here to dig into it with you.
John Pryor: So, integrate is a B-to-B-to-C company. You start with a company's data, you augment it with third-party data, and you do propensity modeling. Now, I don't want to be facetious here, but with the magic of AI, you really help your customers do a better job of understanding what their customers do, to provide better service. So how'd I do? Did I get it?
Catherine Hume: Yeah, I think that's more or less right. I mean, it's funny, because one of the trends that we're noticing with our large B2C enterprise customers (we work with the retail banks of the world, the insurance companies, media companies, et cetera) is that the C-suite wants to become customer-centric. That's a leitmotif we hear time and again. And often they imagine that as one-to-one personalization, which is sort of the dream, right? Right now you've got a big bucket and everybody's in the bucket and you sort of apply the same treatment to everybody. Then segmentation is slightly smaller buckets, and the ideal, of course, is a bucket of one, where every individual is treated as if they were receiving this bespoke, tailored service. What we like to say is, on the way there, artificial intelligence is about optimization. And there are a lot of efficiencies that these consumer enterprises can gain from their marketing efforts if they allocate resources and spend on the people who could most benefit from an intervention of some sort. So the first step, we say, is: before we go all the way to one-to-one personalization, let's be a little bit smarter about who we're reaching out to, so that you basically have higher profitability and margins on your marketing and customer engagement efforts.
John Pryor: Absolutely. So targeting customers is critically important, obviously, where we are today with the collection of data. We've presented a number of podcast episodes where we've talked about some potential pitfalls: bias in data, bias in the models, perhaps a lack of transparency in what a business tells the customer. That actually brings me to this paper that you guys have recently published. So why don't you help me understand integrate's vision and how this paper came about.
Catherine Hume: The paper came about because we're seeing right now that the C-suite at our B2C enterprise customers has a challenge. On the one hand, they know that they need to be applying new technologies like artificial intelligence in order to stay relevant in today's game. On the other hand, they know that there's a lot of risk associated with using personal customer data to improve their business, right? So they've got the ghost of Cambridge Analytica at the back of their mind. Our goal was to help them make better choices and have the right conversations with the control functions, the data scientists, the technology teams, and the business teams, so that they can be innovating, but innovating in a way that upholds the values of their business.
John Pryor: So, full disclosure: Catherine and integrate did not do this alone. You've got a lot of people you acknowledge in the paper. And I also want to mention, and thank you, that Georgian Partners was one of the organizations you acknowledged in the paper. Now, in terms of getting the focus in there and helping everybody across the board, do you see doing this right merely as a differentiator for a company, or do you think we can get to the point where it could make or break a company?
Catherine Hume: What I've seen, just in working with our customers and being out in the market, is that they believe that maintaining and furthering the trust of their consumers is critical to their business today. So I think it's to the point where it's not just a competitive advantage, it's making or breaking. There's a shift in what trust means. If you think about being a retail bank, historically consumers trusted the bank with their cash, right? I mean, you go to the bank, you put your money in your savings account, and what you're trusting is that it's not going to just disappear the next day. And of course, 2009 showed us that when all hell breaks loose, that can be difficult. But there are new types of trust these days, because now that data has become the new oil, to talk about it the way that a lot of newspapers talk about it, consumers are saying, "Oh my God, I also have to trust you with this other thing, which is this virtual representation of myself. And there are potential harms that can be done that are more than just my cash going away, or more than just, are you actually giving me a good deal on my toilet paper this afternoon?" Right. So that new dimension of trust is making it so that businesses really do need to think about this to stay competitive and to stay vital.
John Pryor: So, I want to dig into the paper a bit. What I found most interesting, and I think perhaps the most helpful piece of this, was that you set up a framework that allows companies to do this self-inspection and see how they stack up in terms of using AI responsibly. And you're actually breaking down the machine learning process into these discrete steps, shining a light on the different ways a company can go wrong, or on the big business decisions it can make to do it right. So, what was your starting point as you thought through that framework?
Catherine Hume: Yeah. So for me, the first thing I thought was, everybody's thinking about ethical AI. Everybody's keen to learn about responsible AI. Obviously the businesses are facing these pressures. As it stood, my impression was that there's a lot of talk and there's a lot of critical discussion. There's a lot of philosophical analysis. There's thinking about responsible AI from the perspective of the market and job loss. But there wasn't a risk-based toolkit that was granular enough to localize decisions so that companies could make headway. So the first thing was, "Okay, how do we break this down into small enough chunks and pieces so that decisions can be localized and analyzed, and not so large that companies can't get traction?"
John Pryor: Got it. Business decisions, that's step one.
Catherine Hume: The second motivation in breaking down the framework the way that we did was to also educate non-technical stakeholders in the business, and sometimes even technical ones, on how the sausage gets made in a machine learning product. So, we wanted it to be at once sort of... here's what machine learning projects actually are. There's so much talk about algorithms. There's so much sexiness in the research world related to the capabilities of a deep neural network. But what people don't realize is that the algorithm is one very small step in a much larger process of getting to value. That process includes figuring out what problem you're going to solve, figuring out if it's a problem worth solving, collecting data, analyzing it, and doing the preprocessing work, which is the foundation of all machine learning projects and doesn't get enough credit because it doesn't feel sexy. Right? Then there's building out that model, and then you have to integrate. You've got to do something with it so that you can actually get a feedback loop going, because a machine learning model is a hypothesis. It's a guess at a moment in time. It's what you do with the guess that matters for the business. And there's so little focus on the art of production machine learning that we wanted to level-set people's expectations, which was another goal for the paper.
John Pryor: For me, what was most interesting was the first question in the first step. It was: How could your system negatively impact individuals? Who is most vulnerable and why? So, tell me more about your thinking around this particular question, your first one.
Catherine Hume: Yeah, for sure. So, I think when companies entertain a machine learning project, often in the discussions that we have internally, we're thinking about things at the statistical level. We're thinking about large business goals we want to achieve, KPIs. And what we don't do is put ourselves in that empathetic, imagined position where we're like, all right, who's actually going to be impacted by this system? What's it going to look like? What's it going to feel like? Who is this person? Is this person like me? Is this person different from me? And so that first question, we find... And there's a section in the paper where it's like, who should be on the ethics committee, who should have a voice. If the voice is only coming from the perspective of, say, the leadership, which as we know tends to come from a certain subclass of society, they're probably not going to be thinking about the impact that the product might have on somebody who's radically different from them. So the first thing we wanted to say is that behind all of the abstractions of transactional data and image data and speech data are people, real-life, in-the-flesh people. They have worries and anxieties and pains and hopes and all of the wonderful, messy things that make us human. These are the people that are getting impacted. So, we think the first step is to... We actually call it a pre-mortem. We invert the standard practice in agile software development, where after you do something, you go back and you say, all right, post-mortem: what worked, what didn't work, what do we tweak? We say, let's do this imaginary exercise where we think about all the things that could go wrong before we start, and then include that in the things that we're thinking about when we design.
John Pryor: And you're also building some very interesting controls into this. As I work through all these different, fascinating questions, I get towards the middle, and I think one of the most provocative, and maybe one of the most important, questions that you ask is: How will you enable end users to control the use of their data? I mean, with what's happening in the press right now, everybody thought that Google was allowing you to turn off your location data, and perhaps the answer is no. And here you are being very overt about having users control their data. How consciously are you focusing on that?
Catherine Hume: It's a loaded question. I believe there is some incredible work right now to shift from the techniques of informed consent that we are all accustomed to. We sign into... We activate some online app. As we do so, we get this little legal notice that pops up and says, "Here are the terms of service, blah, blah, blah." Nobody reads them. We're supposed to click agree. We have to click agree if we want to participate in the economy and participate in the services that this product can offer us. But fundamentally, are we controlling our data through that mechanism? No. Right? We're not. And so there are some interesting things that I've been seeing Facebook do recently, where they're making privacy more embedded into the user experience. So I was about to post something recently and I had a little thing that popped up and it was like, "Do you know that your settings are public for this? Do you want that? Is that cool with you?" And so it's those kinds of ingrained things where, as a condition of use, on a regular basis, you pop up these things. And sure, it might cause a little bit more friction for the user if they're about to go in and do something, but if it's elegant and simple enough, to the point where it does just take one minute, it makes it much more tangible and real. I think those are the kinds of controls that we're thinking about. On the flip side, there's this big debate going on right now, and it's something we really need to think critically about, related to the viability of informed consent in a world where we're using machine learning products. So, can we meaningfully consent to the use of our data as it appears in aggregate statistical models that will go on to make predictions about all sorts of things? In my opinion, to meaningfully consent we have to also meaningfully understand. And I don't think there are a lot of people out there that will do that. And I don't think it's reasonable for us to expect that they will.
So we either need to find a way to cleanly articulate what a "profile," to use the language from GDPR, is, that is, what a statistically derived feature about somebody is, or we need to shift from governing the use of personal data at the individual level to having much more robust controls at the processor-company level. And you guys have done a lot of work on differential privacy. The cool thing is some of these tools are getting to the point where they're ready to be used at scale. And I think we're going to see a conceptual mind shift over the next 10 years, where privacy becomes about the obligation to the community, as opposed to the right of the individual.
John Pryor: This is great, because we're talking about individuals, and now you've just elevated to the broader view of all this data coming together. Let's do one more from within your paper here. You specifically ask: Do you have a plan to monitor for poor performance on individuals or subgroups? And I see this at the heart of so many issues. Do you have the right database to do correct medical testing? How are you looking at, for example, recidivism, which comes up all the time, or identifying individuals? It seems a no-brainer to me, yet companies are not quite doing it. So what's your view of how this gets aggregated and you begin to look for bias?
Catherine Hume: Well, I think ultimately it comes back to having the proper framework and setup for the life cycle of a machine learning product. So again, it goes back to, "Sure, we build out this model, and it's going to perform well at the middle of the distribution." Machine learning models love regularities and they hate edge cases. So, they love the stuff where there's a lot of similarity. And unfortunately, if you're an individual who falls into a class that has been underrepresented in the training data set, then the performance of the model probably won't be as sound on that class, because it's akin to having never learned something. Say you're fluent in English and suddenly there's some text that's presented to you in French. You're like, "God, I don't know what to do with that one. I guess I'll just make a guess on what the answer might be." So, it's that kind of experience. If the model had a consciousness, that's kind of what it's thinking. It's like, "I guess for this one, I'm 22% confident, but I'm just going to say yes." So it becomes more or less random. So then you say, well, how do we mitigate the risk of making a bad prediction? We want to get going, we want to test and learn, we want to move fast and break things. But how do we continue to mitigate those risks? So I think the first is a protocol that the QA team and the data science team can use to test the model in a simulated environment prior to putting it in production, to try to uncover some of these blips that they may not have imagined prior to building out the model. And then the second is, once it is in production, monitoring the distribution of those results, quickly identifying areas where the model might not be performing as well, and then having a team whose work doesn't stop with just making the model. That's where the work begins, right? Once it goes into production, they have to watch, they have to monitor.
They have to look at the results, they have to measure, they have to report, they have to update the model. And some of this can be done automatically, using some of the advanced automated machine learning toolkits to shift around how the parameters are working, get new data, et cetera. But some of it will also require the values-oriented governance eye and mind of a production person on the team.
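The production monitoring described above can be sketched in a few lines. This is only an illustration under stated assumptions, not integrate.ai's method: the record layout, the function names `subgroup_accuracy` and `flag_underperforming`, the 10-point gap threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict


def subgroup_accuracy(records):
    """Return accuracy per group for (group, y_true, y_pred) records."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / totals[group] for group in totals}


def flag_underperforming(records, max_gap=0.10):
    """Flag groups whose accuracy trails overall accuracy by more than max_gap.

    max_gap is an assumed threshold; a real team would choose it deliberately.
    """
    records = list(records)
    overall = sum(int(t == p) for _, t, p in records) / len(records)
    per_group = subgroup_accuracy(records)
    flagged = sorted(g for g, acc in per_group.items() if overall - acc > max_gap)
    return overall, per_group, flagged


# Hypothetical logged predictions: a well-represented group and a small one.
records = (
    [("majority", 1, 1)] * 45
    + [("majority", 0, 0)] * 45
    + [("majority", 1, 0)] * 10
    + [("minority", 1, 1)] * 3
    + [("minority", 0, 1)] * 3
    + [("minority", 1, 0)] * 4
)

overall, per_group, flagged = flag_underperforming(records)
print(f"overall accuracy: {overall:.3f}")
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.3f}")
print("needs review:", flagged)
```

In this toy data the overall number looks healthy because the large group dominates it, while the small group is close to a coin flip or worse. A check like this only surfaces the gap; deciding what to do about it is the governance work the team still has to own.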
John Pryor: So let me step back and drill into something very specific, and maybe this is, in my mind, one of the more challenging questions in terms of how you end up working with your end users, whether it's you, integrate.ai, or any other company. Obviously, you're beginning to relate better to customers. You know more about them and therefore you can do a better job. You learn more with every interaction. But getting it right is a challenge. We often say, don't be creepy, but let me just ask: is it fair to say you try to avoid coming across as invasive, and is that the right word? And does this tie to your discussion of transparency?
Catherine Hume: That's an interesting thing with personalization. So, I was recently with the CEO of a book retailer. And if I think about personalization when it's done well, it actually also aligns with the interests and values and needs of the customer. So, I'm a nerd. I like to read books. I like to read a lot. I beat myself up because I work too hard and I don't have as much time to read as I used to. But precisely because of that, my book-reading time is so precious. And I would love to have the comfort that I really am optimizing, that I'm reading the stuff that will further my imaginative potential and my interests and my education at any given time, and I'm not wasting my time reading junk. So if I'm in that realm, as a customer who really also values the business that I'm working with, I love their product and I want them to help me have the best experience possible. That's one where the stakes... It's like negotiating a contract, right? The stakes are on the table, and I would think about it as an individual: what would I be comfortable letting you know about who I am and what I like, so that you can really help me optimize my reading? That's a great service.
John Pryor: Because then you're all in. I'll do whatever you want if you can help me get the right book and save my... I don't have to read all these reviews and makes me feel good.
Catherine Hume: Yeah. And I think then, so Helen Nissenbaum, who's an advisor for a company and a good friend of mine, has a notion of privacy she calls contextual integrity, which is basically... When we think about privacy, she says it's less about control and more about appropriate flow. And those appropriate flows actually stem from the way in which we relate in our tangible social context in the physical world, where there are all of these very fuzzy norms that govern what you say to whom and when, and what's right and what's wrong, and where you'd be shocked. You'd be like, "Whoa, whoa, whoa. That's weird. Why does that person know that thing? That's outside of the context of what I consider to be appropriate sharing of knowledge, information, et cetera." So, I think where creepiness comes up is when there's a break in the expectations of appropriate flow in context. And sometimes that can be, I hate to say it, bad marketing. Right? When you're on your Facebook page, when you're on your Instagram feed... Instagram does a great job with ads. I haven't disclosed my proclivities, but man, maybe it's because I follow a lot of fancy hotels and design things, they show me decent clothing. I'm almost tempted by the ads sometimes. Where it's a poor user experience is where you get something that just feels obnoxious. You've got stuff popping up that you don't want. So there's a mismatch between what the customer values and what is being shown to them, and then it becomes obtrusive and obnoxious. And of course this is less related to machine learning; this is marketing as a whole. There's also social manipulation, where we want to craft tastes, to prey upon our poor lizard, dopamine, reptilian brains, when we're worn out at night and we're totally susceptible to the images of those that are more beautiful than we are. And that's part of it.
John Pryor: I do like to bring the marketing piece in as well, in terms of what you can do in working with individuals. So, ethics could be a new slippery slope. I mean, prior to any computing systems, people were selling snake oil. Clearly, everyone, I think, would agree that's unethical. Now with AI, do we have a different slippery ethical slope if companies don't focus on it? Basically I'm asking, if a company considers itself ethical, and it's not doing some of the things that you've been talking about here with me today, is it potentially still an ethical risk?
Catherine Hume: For me, we start off the framework with a set of guiding principles to provide what we consider to be the types of foundational intuitions that everybody on the team needs to have as they're embarking on the machine learning journey. The principles are different in kind from the standard maxims that you'll see in a lot of these frameworks, which are like, do no evil, be fair. I find those to be useful at a certain level, but they don't necessarily develop concrete intuitions on what's unique about these tools. In our framework, our first maxim is: when you are embarking on a machine learning project, be careful to know that the number one axiom of using these tools is the assumption that the future will and should look like the past. And if that's not the case, which is often so in the normative ethical sphere, you need to be aware of that and be designing for that. So, let me give an example. Say a business has historically done a great job serving 45-year-old Caucasian males. If it goes and asks, okay, what's the propensity of my customers to act on my new product, what it's going to have in its data set is information from the past about the subsection of the market that it has historically served well. But should it, in the future, want to grow and tap into new markets, relying upon a machine learning model might actually be a bad idea, because it's going to exploit the trends from the past and then perpetuate those into the future.
John Pryor: A machine learning model based on just that data?
Catherine Hume: Exactly. Great qualification: based on just that data. But these are the kinds of things that companies need to be aware of. So then you say, okay, so is this a new type of ethical domain that companies need to think about? Yeah, that's... I think it also challenges our assumptions about the futuristic nature of all technologies. So we think about, right now, AI and machine learning and blockchain and all the buzzwords as being at the vanguard of the things that are supposed to be considered the most disruptive, the most futuristic, the coolest things, the things that are going to run the world. I just saw an article about machine learning models running the world. The ethical stance there is one of critical judgment and thinking, where you step back and say, "Yes, but there's a risk that this becomes a force of conservatism in my business, as opposed to disruption, if I'm not mindful of the types of data that I'm using, the types of data I'm collecting, potential pitfalls for bias, et cetera."
John Pryor: Just to close, what kinds of practical financial or business advice can you offer companies that are looking to take a more ethical approach to their AI?
Catherine Hume: So I think businesses, as they think about responsible AI, likely will face obstacles where the business will say, "But do we really need to be thinking about ethics? Is this really critical for us right now? Come on. We're under a ton of pressure. We've got to make our near-term numbers." A team that is passionate about this can be armed with two weapons to help further the cause of responsible AI in their business. The first is that compliance exercises are the worst means of motivating people. No one wants to be told what to do. Nobody wants to have these things that slow them down and put the brakes on things. The techniques of control and governance can be a dampener and a source of frustration for businesses. One way to galvanize the right activity in people is to actually have this be about people and ethics. People tend to care about that. And so it's a way of injecting energy and fuel into what would otherwise be a dry compliance exercise. The second weapon is, as I just mentioned in the last comment, but I want to articulate it cleanly: sometimes being fair is the best source of new market development. So, it's about reframing this, not as, "Oh my God, what are the risks? What if we harm this population this way?" You do want to be thinking about people's wellbeing, but there's a different emotional stance that can be helpful for the business teams if you say, "Look, growth is slow. We're not going anywhere. We need to target this new segment that we haven't been able to target historically." Evaluating a model for bias can sometimes be the best means of identifying an untapped market. And then you've got to do the creative work of figuring out how to address that market. But I think when it's reframed as a growth opportunity, again, that can be the sword that somebody who's passionate about this can use to push this into the business.
John Pryor: Catherine, this has been great. Thank you so much. It's always such a pleasure, and so informative, to talk with you. Look, we'll put a link to your paper in our show notes, and I encourage everybody to get it and read it. It's fantastic. Thanks for listening. If you like what we're doing, we'd appreciate your telling other people about the show. Better yet, give us a five-star rating on iTunes or write a review. Doing so will really help ensure that more people can find us. And if you haven't already, please subscribe to the Impact Podcast on iTunes, SoundCloud, or wherever you go to find your podcasts.
Is it unethical if your company doesn’t do its due diligence on avoiding bias? In this episode, Jon Prial talks with Kathryn Hume, the Vice President of Product and Strategy at integrate.ai. Kathryn and her team recently published a paper entitled “Responsible AI in Consumer Enterprise” which offers a framework to help organizations operationalize ethics, privacy and security as they apply machine learning and artificial intelligence. Jon and Kathryn discuss the paper and highlight some of its key takeaways for today’s businesses.
You’ll Learn About
- How ethics can make or break a company
- How machine learning systems can negatively impact people
- Using a framework to examine ethical issues as you build ML systems
- Practical advice for being more ethical with AI
Learn more by accessing the show notes: http://bit.ly/2ycJALW