Episode 93: Facial Recognition, Demographic Analysis & More with Timnit Gebru
Jon Prial: Whether overt or unintentional, whether human- or technology-oriented, bias is something that every company must be vigilant about. While it used to be something you might have to worry about with your employees, today, it could be equally pervasive and problematic in the algorithms those employees create. For a real-life example, you don't have to look any further than online lending. Researchers at the University of California recently discovered that online lenders' algorithmic credit scoring, using big data, regularly discriminates against Black and Latino borrowers by charging them higher interest rates. This is why trust has become such an important focal point here at Georgian Partners. The examples of bias in AI are numerous. One of the more prominent areas where we've seen it happen in recent years is in image processing. Today, we're lucky to have Timnit Gebru with us to talk about that. Timnit is a research scientist in the Ethical AI team at Google AI, and a co-founder of the group Black in AI, where she works to both increase diversity in the field and reduce the impact of racial bias in data. She was previously a postdoctoral researcher in the FATE group, that's Fairness, Accountability, Transparency, and Ethics, at Microsoft Research in New York. She has some fantastic insights to share, so stick around. I'm Jon Prial and welcome to the Georgian Impact Podcast. Timnit, what a pleasure to have you. Welcome to the show.
Timnit Gebru: Thank you for having me.
Jon Prial: So, I've read about this. I think everybody read about this. This goes back to mid-2015, and the crisis of the day was that Google image recognition was identifying African-Americans as gorillas, and it just went wild. Now, we understand sort of what's coming, but why don't you tell us really what was happening and what went on with that?
Timnit Gebru: So, a couple of people, well, I think it was one person, tweeted about the fact that Google Photos was identifying pictures of... I forget whether it was just his girlfriend or him. His girlfriend as gorillas. So, later on, an engineer who used to be at Google, Jonathan, he wrote a blog post explaining what happened, and basically that a lot of black people were being, many times, misidentified as gorillas, but also a lot of white people were also being misidentified as whales.
Jon Prial: I didn't know that.
Timnit Gebru: Yeah. But what's really important to note here is that we have to pay attention to the types of errors our models are making, and not just the accuracy. So, you can have 99% accuracy, but even on Black people, if you mistake them for a different person or if you mistake them for, I don't know, some other thing, it might be okay, but gorillas, anybody who has a societal understanding of what Black people have been through would know that it is associated with many racist remarks. So, there's remarks comparing Obama to monkeys, or I mean, Michelle Obama and monkeys, and things like this. And this is a longstanding thing that has happened in the United States. So, I think for me, this highlights that we have to pay attention to the types of errors we're making.
Jon Prial: Right. So, there's really two pieces of the type of error. One is just an error that turns out to be egregious. The other error is if I can classify and get a 98% accuracy, and this goes back to your gendershades.org work, if I get a very high accuracy on lighter-skinned males, and a much lower accuracy on darker-skinned females, that's a different type of error rate, but that would also cause some serious challenges in terms of what I'm trying to do with this system, right?
Timnit Gebru: Yes, exactly. So, there's two pieces of what I'm trying to say. One is: if you have a particular dataset, and you test whatever model you train on that dataset, and you say you got 99% accuracy, you want to say where that 99% accuracy comes from and where that error comes from. Is it that it is completely accurate in certain segments of the population and completely inaccurate in other segments of the population? So, this shows the importance of disaggregating these accuracy numbers by subgroups and also having more representative benchmarks that are not completely homogeneous. The second point I wanted to make was that, even if you are 99% accurate on darker-skinned females, the types of errors you make on that 1% are very important. So, the fact that the error was mistaking people for gorillas, specifically, Black people for gorillas, versus something else, that's very significant because of the historical and societal ways in which Black people have been talked about. And that's why it's really important to have these kinds of understandings. It's not just about having high accuracy, it's also about being sensitive to the types of errors you might be making and how that could affect people.
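To make the point about disaggregating accuracy concrete, here is a minimal sketch in Python. It is not the Gender Shades methodology or code. The group names, labels, and counts are made-up toy data, and the helper functions `accuracy_by_group` and `errors_by_group` are hypothetical names chosen for illustration. The sketch shows how a single headline accuracy can hide a large gap between subgroups, and how tallying the kinds of mistakes per group keeps the type of error visible.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy per subgroup.

    records: iterable of (group, true_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, truth, pred in records:
        totals[group] += 1
        if truth == pred:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


def errors_by_group(records):
    """Count each (true_label, predicted_label) mistake per subgroup."""
    errors = defaultdict(lambda: defaultdict(int))
    for group, truth, pred in records:
        if truth != pred:
            errors[group][(truth, pred)] += 1
    return {group: dict(counts) for group, counts in errors.items()}


# Made-up toy data (assumption): 100 images from one group, 20 from another.
records = (
    [("lighter_male", "male", "male")] * 99
    + [("lighter_male", "male", "female")] * 1
    + [("darker_female", "female", "female")] * 13
    + [("darker_female", "female", "male")] * 7
)

# The aggregate number looks respectable on its own.
overall = sum(truth == pred for _, truth, pred in records) / len(records)
per_group = accuracy_by_group(records)
mistakes = errors_by_group(records)

print(f"overall accuracy: {overall:.3f}")
for group, acc in per_group.items():
    print(f"{group}: accuracy {acc:.2f}, mistakes {mistakes.get(group, {})}")
```

Because the toy benchmark is dominated by one group, the overall figure mostly reflects that group, which is the argument for reporting per-group numbers on a more balanced benchmark.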
Jon Prial: And having that error, going back to the training... So, this could be my naivety. I'm just sort of guessing here in terms of a solution, but there was a lot of work with Google image recognition, and the very first, in my mind, good example of supervised learning was they took animals and they organized them by cats and dogs, and then we just need a human to label; those 10 million were cats, those 10 million were dogs, and they figured it out. Was it that they didn't have a good training dataset to get started, or is that not quite the right way to look at this?
Timnit Gebru: So, that's interesting. Because with the Gender Shades work that you referenced, Joy created a more representative parliamentary benchmark, and that's where we tested the different commercial gender classification systems, to see how well they did for people of different skin tones. And we saw that many of the publicly available training datasets were overwhelmingly lighter-skinned and overwhelmingly male, but in this Google case, it's interesting because, if I remember correctly, Jonathan's blog post was talking about the fact that they made equally high error rates for white people too. It's just that the errors that were made for white people were not as egregious and as offensive. So, that's where I think this is a different, but equally important, issue.
Jon Prial: Interesting. We use the example: if I recommend a different product that you might want to buy, and you don't want it, no big deal. If I'm doing a cancer diagnostic, that's quite high-stakes. This is another lens into that, which is a different type of repercussion. But once again, if you don't get it right, the ramifications to a business that gets it wrong are significant.
Timnit Gebru: Yes, exactly. That's exactly what I was trying to say.
Jon Prial: Now, you mentioned Joy, and she was a research partner. She's out of MIT. Tell us a little bit about Joy and the work you guys were doing.
Timnit Gebru: Joy Buolamwini is a graduate student at MIT Media Lab. And I emailed her in 2015, I believe, because I was so interested in the work that she was doing. And so, by then, she had already started to talk about the types of biases that she was seeing in open source face recognition systems. So, she was just working on some art project and she was trying to have this OpenCV face detection library detect her face. And she found that it wouldn't detect her face. It would detect her friend's face, and then she eventually realized that every time she put on a white mask, it would detect her face. I think it was a very effective way to show people what kinds of issues we were dealing with. I think this is what really makes Joy unique: she's not only a great researcher, but she's also an artist and she's also a very good communicator. So, she has these really good ways of making a point and making sure that her work is very impactful. So, she was already working on this and I advised her on her master's thesis, which was taking a deep dive into this issue and systematically sort of quantifying the problems. So, it's one thing to say that this doesn't work on you or on a couple of other people, but how do you do a systematic study, gather a dataset, and present the results to show that this is a systemic issue?
Jon Prial: And you referenced the parliamentarian. Was this where she was getting the faces from government officials?
Timnit Gebru: Yes. So, we wanted to test how well these systems did on people of different skin tones and different genders. And we just chose these two because we wanted to constrain the problem, but this is just to show that there is an issue. And so, the question was: since the datasets that were already available were not representative of the world, basically, how do we come up with a dataset that has a variation of skin tones and genders? And so, what she did was she went to first see which parliaments in the world have a high representation of women. And then out of those, pick the countries at the two extremes on what we call the Fitzpatrick skin tone classification system. So, the darkest and the lightest to do a comparison. And then also, these photos of parliamentarians, they're taken under a very highly constrained scenario, so they're not very difficult, they don't have different poses, they're all looking at the camera. So you can extract out some of the other problems, just specifically to study the effect of skin tone and gender. Yes. And so, this dataset was collected, and after that, we evaluated the results of three different commercial gender classification systems.
Jon Prial: So, I actually see this then as both a technical problem, the data, the breadth of the data, which then would hopefully yield an unbiased algorithm, but also a staffing problem and the need for a diverse staff who brings that mindset to this technical problem. So, what do you see as really the solution going forward? How should companies attack this issue of bias?
Timnit Gebru: I see the root of the problem being the lack of diversity in this entire field. And not just by race and gender because you can have people of a particular race or gender who still wouldn't work on this problem, or still wouldn't identify this problem. You need to have the people who are working on this kind of problem. But the thing is, imagine if face recognition was developed in the continent of Africa, then we wouldn't have face recognition systems that have really high error rates for darker-skinned people. So, at the root of the problem is, I think, the concentration of wealth, power, datasets, and companies, and who's working on artificial intelligence in general, and what problems are seen as important.
Jon Prial: Another question: how far do you think this technology should go? The things I'm reading now, I can't even tell if it's real or not, or it just becomes a meme, but somebody argued they could look at faces and determine sexual preference or understand emotions. Clearly, we're at the cusp of perhaps more challenges that are going to drive you nuts.
Timnit Gebru: So, the problem is, if your conclusion was that given the profile image of someone, I can predict with X percent accuracy whether they are gay or not, maybe, I might take that result, because the way you present yourself if you're openly gay would be different from the way you present yourself if you're not openly gay. But what they're claiming is that it's basically... The overall conclusion that they're making is, say, I'm just walking around the street and I haven't told anybody of my sexual orientation, and some surveillance camera takes a picture of me. They're saying that you can, with high probability, determine my sexual orientation. This is not backed by the work that they did. And also, another thing that they say is that the work that they did gives evidence that the reason that their sexual orientation is... Like there's some hormonal theory or something like that, that determines someone's sexual orientation. Now, the danger here is they've done, in my opinion, pseudoscience, which is not rigorous, but sensational. So, it's making all the headlines and it's gotten all these papers, and it's also something that's dangerous. Not only have they done pseudoscience that's inaccurate, they've now told some repressive governments that if you have some random classification system and it classifies me as gay, then that's probably accurate. It's such a combination of things that's awful.
Jon Prial: Something I've been reading about recently: is this the curse of dimensionality? Is that term [inaudible] for you? And maybe having you define it is better than having me define it. You could always find correlations in a large enough dataset, and it's not likely to be true. You need to do more work, you need to do more data science, you need to prove it.
Timnit Gebru: It's not causality, basically. It's that if you told me there is an association between someone's profile picture and whether or not they were openly gay, I would probably believe that. There might be a difference in the way people present themselves in their profile pictures, depending on whether they're openly gay or not.
Jon Prial: It may be no different than someone saying, "Showing their profile picture walking with a baby in their hand to show I'm a family person." It's a similar thing. And whatever you put in a profile picture has lots of driving factors towards what you're choosing, right?
Timnit Gebru: Exactly. And so, when you come up with a conclusion that can harm so many people, you have to do very rigorous science. You can't just find those patterns, associations in the data, and just come up with conclusions that are stretches. And I think the author, Michal Kosinski, says that... I mean, one of the authors, the most senior author, says that the reason he wanted to do this is he wanted to show the danger that this is possible. That's what he said. But maybe that's true, but the thing is that... I think what they did is much more dangerous, because what you're telling them now is that you can accurately do this, which you can't even.
Jon Prial: Which you can't.
Timnit Gebru: So, say I walk around somewhere, and somebody's going to try to tell, "You are homosexual and I can tell, and I'm going to arrest you for it." You know what I mean?
Jon Prial: Right. So, this issue of do no harm. You're exactly right. People hadn't really thought about the implications. Obviously, there are implications of getting a cancer diagnosis wrong versus a shopping reference. And this is even an equally powerful do no harm that's slightly orthogonal than a cancer diagnosis. So, we could go a whole podcast on what was done here, but I'd love to get your sense then. There was a piece of work that you were part of, in terms of Google Street View images of cars that got correlated with voting districts. And I'm not sure I really want to know what car a Republican drives or what a Democrat drives, but I'm nervous, as I read that, and hearing you now, it makes me more nervous, these seem like unhealthy data points in an already polarized society. So, what's your sense of that analysis?
Timnit Gebru: That's exactly sort of why I wanted to do a deeper dive into ethical considerations of artificial intelligence and bias and things like that, because my work suffers from some of these things. It is very dangerous to conclude. So, one of the things, for example, we do is we show the correlation between crime rates and certain types of cars in a particular zip code. And if I went back to do this work again, I would be very careful about that. Because, once again, we use historical crime rate data, and now, I don't want some police department to go and say, "Hey, this zip code has a lot of vans, and so let's send more police there." What I'm very excited about in terms of that work is some of its potential implications for gathering surveys in places where it is very difficult to get data about constituents. So, for example, even in the US, the census is a very cumbersome thing. It takes a lot of money, and it's even a very political and contentious matter because people use it for redistricting. And Danah Boyd, who is an expert in this, was telling me that, for example, the 2020 census is going to be very contentious because one of the things that is being done is under-counting Latino populations, with various questions that are being added and things like this to make sure that certain people don't answer them. But what if we could have an alternative way of guessing the population of a particular place using publicly available data, and we could compare that to the census that is gathered by the dataset.
Jon Prial: As a way to validate.
Timnit Gebru: Those kinds of applications, I would be excited about. So, the Canadian research institution, they are looking at the relationship between greenery and population health. So, not just that, but the problems that you might have because of pollution. And it's a long-term study, and so they're using Google Street View to do this. I'm involved in a project to quantify what's called spatial apartheid, in South Africa, using images. And so, those kinds of things, I'm excited about. And also, imagine doing census in places where it's very difficult and very time and labor-intensive and money. It takes a lot of resources to do surveys. So, what if I want to know which states have a lot of schools, a high density of schools very easily? What if I want to know... People do a lot of stuff about soil health. They use satellite imagery to analyze soil health. So, there are a lot of applications that I think would be very interesting and very impactful using this work, but then at the same time, like you mentioned, there are also applications that are kind of dangerous. So, do we want more political targeting? The people who were most interested in my work seemed to be banks and insurance companies. Then do I want them to use this information more?
Jon Prial: So, let me just close on this. I think this is really interesting, on the broader thinking you want us all to have, as we begin to leverage AI more and more in our solutions. So, I guess, given everything we've talked about, what advice would you have for business leaders today? What do you want them to do as they go about their next project?
Timnit Gebru: I would like them to... So, I'm very happy that I have my hardware background. When I first started working in AI, sometimes I wished that I was one of these people who's been doing it since I was 15 or something like that, and I didn't know how my electrical engineering or analog circuit design, whatever, background would impact it. I worked at Apple for a number of years in product development, not as a PhD researcher. And really the number one thing that it impacts is my way of thinking of process, and what kind of process needs to be put in place when people are working on AI, when they're applying it to their domain. Right now, it seems like there's a rush, and everybody just wants to use machine learning for everything. And so, one is, is it appropriate for you, or do you need to use machine learning for this [inaudible]? And secondly, if you're going to use it, what process do you have in place to do extensive testing, to check for biases, and then even once you deploy it, to continuously monitor it and make changes as necessary? And just don't be in a rush to deploy something, and also provide a lot of documentation, especially if you're going to have APIs. Because the main thing we noticed in the Gender Shades work was that none of these commercial APIs had any guidelines for use cases or how to interpret the results that you had, and what kinds of scenarios you should use these things for and should not use these things for. And so, I encourage all commercial entities to provide lots of documentation and guidelines for their users.
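One way to turn the "process" advice into something checkable is a small gate that runs before each release and again on monitored production data. The following is only a sketch under stated assumptions: the function name `check_subgroup_accuracy`, the 0.90 accuracy floor, and the 0.05 gap limit are hypothetical values picked for illustration, and nothing in the conversation prescribes them. Real thresholds would depend on the application and its stakes.

```python
def check_subgroup_accuracy(per_group, min_accuracy=0.90, max_gap=0.05):
    """Return a list of human-readable problems; an empty list means the check passed.

    per_group: dict mapping subgroup name to measured accuracy (0.0 to 1.0).
    min_accuracy and max_gap are illustrative thresholds (assumptions).
    """
    problems = []
    # Flag every subgroup whose accuracy falls below the floor.
    for group, acc in sorted(per_group.items()):
        if acc < min_accuracy:
            problems.append(
                f"{group}: accuracy {acc:.2f} is below the floor of {min_accuracy:.2f}"
            )
    # Flag a large spread between the best and worst subgroup.
    gap = max(per_group.values()) - min(per_group.values())
    if gap > max_gap:
        problems.append(
            f"gap between best and worst subgroup is {gap:.2f} (limit {max_gap:.2f})"
        )
    return problems


# Hypothetical measurements from a test run or a monitoring window.
latest = {"lighter_male": 0.99, "darker_female": 0.65}
report = check_subgroup_accuracy(latest)
for line in report:
    print("BLOCK RELEASE:", line)
```

Running the same check on a schedule against fresh labelled samples is one simple form of the continuous monitoring described above, and the thresholds themselves are the kind of detail that belongs in the documentation given to API users.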
Jon Prial: Perfect. Timnit Gebru, what fantastic points to leave our audience with. This was such an incredibly fascinating discussion. You've actually helped me and all of us, I think, think a little differently about the impact of the work that we do, and taking our blinders off in terms of really getting this right. So, I just want to thank you so much for taking the time to be with us. It's been a pleasure.
Timnit Gebru: Thank you.
Whether overt or unintentional, whether human- or technology-oriented, bias is something that every company must be vigilant about. And while it used to be something you might have to worry about with your employees, today it can be equally pervasive — and problematic — in the algorithms those employees create and the data they use.
Although the examples of bias in AI are numerous, one of the more prominent areas where we’ve seen it happen in recent years is in image processing. In this episode of the Impact Podcast, Jon Prial talks with Timnit Gebru, a research scientist in the Ethical AI team at Google AI and a co-founder of the group Black in AI, about some of the challenges with using facial recognition technology.
Who is Timnit Gebru?
Timnit Gebru is a research scientist in the Ethical AI team at Google AI. Prior to that, she was a postdoctoral researcher in the Fairness, Accountability, Transparency, and Ethics (FATE) group at Microsoft Research, New York. She earned her PhD from the Stanford Artificial Intelligence Laboratory, studying computer vision under Fei-Fei Li. Her main research interest is in data mining large-scale, publicly available images to gain sociological insight, and working on computer vision problems that arise as a result, including fine-grained image recognition, scalable annotation of images, and domain adaptation. She is currently studying the ethical considerations underlying any data mining project, and methods of auditing and mitigating bias in sociotechnical systems. The New York Times, MIT Tech Review, and others have recently covered her work. As a co-founder of the group Black in AI, she works to both increase diversity in the field and reduce the negative impacts of racial bias in training data used for human-centric machine learning models.