Episode 78: Getting the Bias Out with Cathy O'Neil

This is a podcast episode titled, Episode 78: Getting the Bias Out with Cathy O'Neil. The summary for this episode is: We all have our own personal biases. The question is how do you keep them out of your data so that you can create better products for your customers? In this episode of the Impact Podcast, we welcome renowned mathematician and author Cathy O'Neil to the show. Cathy has long called for an end of unconditional trust in big data, and shares her views on why data is so vulnerable to bias, the massive implications this can have and what your business needs to do to avoid it. You'll learn about: - How personal bias affects data - Developing a data strategy to avoid bias - How to communicate bias to end users - Creating a data Bill of Rights - The risks to companies of having biased algorithms Access the show notes here: http://bit.ly/2lzzD5m
Warning: This transcript was created using AI and will contain several inaccuracies.

Jocassee, are you a coffee person?

I am, like, 100% a coffee person. What's your favorite coffee place?

Probably Ploughshares, at like 105th and Broadway, so it's probably good that it isn't a Starbucks. But have you ever had 911 called on you when you were in your coffee shop? No, I have not. Well, and you've been living in New York City for a while. Have you ever been stopped and frisked? Have I ever been? No, and not even close.

These are pretty unusual, even a bit invasive, questions, you might suggest, but there is a method to my madness, I promise, and I'll explain it in just a second. Today I'm thrilled to welcome Cathy O'Neil to the Impact Podcast. For those of you who don't know her, Cathy is a renowned mathematician and the author of the highly successful blog mathbabe.org, as well as a number of books on data science, including the highly acclaimed Weapons of Math Destruction. In addition, she's a former host of the Slate Money podcast and a regular columnist for Bloomberg View. We all love data, especially here on this podcast, but Cathy brings to the table powerful arguments calling for the end of unconditional trust in big data. In fact, if you haven't seen it already, go check out her recent TED Talk, "The era of blind faith in big data must end." It's great.

Today on the show we're going to be talking about personal bias, the bias that's trapped in our data, and how the two are related. Then we'll get into how companies need to be focused to avoid it and provide a better product to their customers. Getting back to my earlier questions: you'll remember that not that long ago Starbucks made headlines when a manager at one of its Philadelphia locations had two black men arrested after they refused to buy coffee or leave the store. Without getting into the details, I think this boils down to a clear and highly public example of personal bias. And I asked about the now-defunct stop-and-frisk program, a highly controversial policing practice that used to be conducted in New York City, and that Cathy calls out in Weapons of Math Destruction. It's a great example of the tremendous bias that can be hidden within data. So, with those two simple examples of bias, let's get on with the show. Cathy O'Neil, it is a pleasure to have you here.

I'd love to talk about the sources of bias. Although the bias in the Starbucks example could be attributed to the experience of the manager, is it fair to say that much of the potential bias we should be concerned about is coming from data?

Well, I mean, it comes from people, right? And people are the ones that decide what kind of data to collect and what kind of data it is. So yes, it eventually gets into the data, and then you could say that it's coming from the data itself, but of course it starts with people.

And do you think we have a history of collecting data that's inherently biased?

I kind of think about the black boxes within credit scoring. We collect data about powerless people and not so much about powerful people. So we have, you know, data about Facebook users, and we have data about anybody who wants to apply for a job or get a loan. We have much less data about how powerful people

behave, or misbehave. So it's that kind of power relationship that I think we should talk about, and that's why we do have, like, disclosure laws about things like political contributions and stuff. I mean, that has to be a law, right? Nobody is going to be accidentally collecting information about how rich people's money works. But we do have sort of lots and lots of surveillance on, for example, people living in housing complexes, like in projects. So if you think about how surveillance works, you should realize that it's very class-oriented. And as we get more and more of this data, and it's already here, the inherent bias in the data we're collecting, combined with more sophisticated machine learning and AI systems, just makes things worse. And I think the crucial ingredient in what makes it worse is that we trust the systems so much. So if I had one thing to accomplish, it would be this:

I want to shake our blind faith in the algorithms that we build, because, as we've just described, the data is coming from a biased place. So how could the result possibly become fair or objective simply because it's been processed by machines? Machines don't have any prior assumptions, but they can only do so much given the ingredients, in terms of the raw data. So what they do is they end up propagating past practices. Depending on what kind of system we're talking about, if it's a system that decides who deserves a good job, then "deserves" will be defined by who got the job in the past, who got raises, who got promotions, who got fired, who left early. Those people will look like failures even if it wasn't them that failed but the culture of the company that failed. A computer's not going to know the difference between those two things; the computer will essentially propagate the past unless we ask it to do more.

So that was a long way of saying that the scary thing to me isn't the algorithm itself; it's the power that we confer to the algorithm, because we trust it too much.

Let's come to two quick topics then, both tied to the data and to how we communicate it. Starting with the data: where should companies go, or how do they get started, with a data strategy to overtly avoid creating bias in gathering data?

Well, I mean, I actually want to be really practical, because I'm not, like, an abstract philosopher. I really am actually worried that people are using algorithms to do things illegally. In a regulated space like credit, or hiring, or anything about HR, or insurance: is it legal? Are we doing it legally, or are we discriminating illegally? And I think that's the very first question that you should be asking and answering.

Are there sufficient laws now on the books that you're comfortable with?

Absolutely. I mean, that's the funny thing about it. We definitely need more laws for the new industries; political online advertising, for instance, doesn't even have the same kind of disclosure laws that TV advertising has, so we haven't caught up there. But otherwise we have perfectly good laws that the federal regulators could enforce, except that honestly the federal regulators do not know how to decipher the algorithms; that's the problem, an obstacle for them. We have plenty of good laws; they just aren't being enforced.

So I understand the political and the regulated environments, but we also want to make sure that even e-commerce companies or healthcare companies (we obviously have laws in healthcare, but plenty of other companies are not in as regulated a world) are ensuring that the data they're collecting is free from bias, whether they're making a medical recommendation or

placing an advertisement. How do we make sure, again, that biases are more universally avoided?

Yeah, that's a really good point. I mean, not everything is regulated, and some things could be unfair or even very destructive and not break any laws. And here I'm going to come back to: don't be intimidated by the algorithm. There's nothing inherently perfect about an algorithm; it's a process that has been automated. So I think the easy answer is to ask the great questions. For example, there's a long history of medical testing being done much more on white men than on white women, or black men, or black women, and so if you're relying on testing that's historical, you might have that kind of blind spot. And if you're worried about that kind of thing, then what you should actually do is build a scientific testing framework, so that you can see whether it's working as well for different groups of people. The short answer is: put the science into data science.
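The "scientific testing framework" described here can be made concrete with a small sketch. Everything below is hypothetical and purely illustrative, not anything from the guest's actual practice: a function that compares a model's behavior across groups, plus a crude disparity flag loosely inspired by the four-fifths screening rule (expressed here as a simple absolute gap).

```python
# Hypothetical sketch of a group-level testing framework: given a
# model's predictions and the true outcomes, compare how the model
# behaves for each group. The metric choices and the 0.2 threshold
# are illustrative assumptions, not established standards.

from collections import defaultdict

def per_group_metrics(records):
    """records: iterable of (group, predicted, actual) with 0/1 labels.
    Returns per-group accuracy, positive rate, and sample size."""
    counts = defaultdict(lambda: {"correct": 0, "positive": 0, "n": 0})
    for group, predicted, actual in records:
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(predicted == actual)
        c["positive"] += int(predicted)
    return {
        g: {
            "accuracy": c["correct"] / c["n"],
            "positive_rate": c["positive"] / c["n"],
            "n": c["n"],
        }
        for g, c in counts.items()
    }

def flag_disparities(metrics, metric="positive_rate", tolerance=0.2):
    """Return the groups whose metric falls well below the best-off group."""
    best = max(m[metric] for m in metrics.values())
    return [g for g, m in metrics.items() if best - m[metric] > tolerance]
```

Running something like `flag_disparities` over, say, a hiring model's output split by demographic group is exactly the kind of basic question the interview calls for; the point is less the particular threshold than that the check gets run at all.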

Just ask the questions, set up scientific tests, and run them.

Speaking of asking questions, your firm ORCAA, O'Neil Risk Consulting and Algorithmic Auditing, is interesting. How do you help companies then?

I offer a service to audit for the kind of thing we talked about, discrimination or what have you, or fairness, but also just for accuracy, for transparency, for whatever they actually are worried about. So: is this algorithm doing what you wanted it to do? I also offer a service which is basically corporate training, so I can teach you how to think about algorithms well. I could teach you to audit your own algorithms, or I could teach you to think about whether the algorithms you already have are accomplishing what you want them to be doing.

I feel like there's been such a wave of hype around AI. We have a lot of companies that have invested in their data and their data science, but they're not really sure that they're getting what they were expecting to get out of that whole process. So that's part of what's offered, a sort of larger audit: are you doing the data science that you thought you were going to get done here? I mean, you'd be surprised how often the data science team is answering the wrong question, and the business team actually is interested in a totally different definition of success. So I help them optimize to the right definition of success by offering a communication bridge between the data people and the business people.

I was thinking about your audiences, because Starbucks just recently closed its stores and trained all its people; that is one element of corporate training. And I just heard you talk about data scientists and business people. So when you do your training, are they in the room at the same time? Is it a pleasant get-together?

Sometimes it's a little bit tough, but it's the most basic questions that matter. I was trained as a mathematician, and what you get from that training is an absolute inability to be ashamed to ask a dumb question. That's what we do in math: we question our assumptions at all times. We always make sure we're all on the same page about what the assumptions are, because if you aren't on the same page, it is absolutely meaningless to go to the next step. And so that's the first thing you learn as a mathematician. The second thing, which I also bring to these corporate trainings, is that it's good to find out you're wrong. That is actually a big step forward, if you find out you're wrong.

Then you have figured out how not to waste time. So I go straight for that; I go straight for the dumb questions, and finding out whether we're doing it wrong, because that's the fastest way to make progress.

We've evolved in terms of taking some of the consulting work we're doing, in this case around differential privacy, and building products. You mentioned ORCAA, which is clearly a consultancy, and it's all about what's in your brain and your team's brains, but you're also working on products. Tell me a little more about that.

It really is different, but yeah. Part of it is because we haven't really spread the message that algorithms could be doing things wrong, but the eventual idea is a kind of subscription service that will audit your algorithms on a daily or weekly basis to make sure they're behaving well, to monitor them. I would ideally have different modules.
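The modular, subscription-style monitoring service described here might be sketched like this. Everything in the sketch is hypothetical scaffolding: the class name, the module names, and the checks are stand-ins for real compliance logic, not real SEC or GDPR tests.

```python
# Hypothetical sketch of a modular algorithm-monitoring service: each
# "module" is a named check run against a periodic snapshot of an
# algorithm's behavior, and the service reports which checks failed.

from typing import Callable, Dict, List

class AuditService:
    def __init__(self) -> None:
        self.modules: Dict[str, Callable[[dict], bool]] = {}

    def register(self, name: str, check: Callable[[dict], bool]) -> None:
        """Plug in one compliance module, e.g. one per regulatory regime."""
        self.modules[name] = check

    def run(self, snapshot: dict) -> List[str]:
        """Run every module against a daily/weekly snapshot and
        return the names of the checks that failed."""
        return [name for name, check in self.modules.items()
                if not check(snapshot)]

service = AuditService()
# Illustrative stand-ins for real compliance logic:
service.register("trading_limits", lambda s: s["max_order_size"] <= 10_000)
service.register("gdpr_consent", lambda s: s["records_without_consent"] == 0)

failures = service.run({"max_order_size": 50_000,
                        "records_without_consent": 0})
```

The design choice worth noting is the pluggability: different regulatory regimes become independent modules, so a customer subscribes only to the checks that apply to their jurisdiction and industry.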

Say you're doing algorithmic trading. I would have a module that makes sure your algorithmic trading system is complying with all the SEC rules, and I might have a separate module that would check that, if you're working within the context of Europe, your algorithm is complying with the GDPR rules and laws. So for different kinds of concerns you would have different kinds of monitoring systems. That's a product that I hope to have up and running in the next year.

So as you've evolved, companies are going to be doing a better job, we all hope. As a communication strategy, what do you think companies should be telling their users about not having biases? How much should they declare? What do you want them to say to their end users?

"We've checked that the algorithms are legal and that they are fair," and it's their definition of fair, "and these are the laws we are complying with, and here's how we know we are complying, here's how we know it's working." Right now we kind of have an evidence-free era.

I call it the era of plausible deniability with algorithms. We just say, oh, it's an algorithm, therefore it's perfect. And I think in the future the end users are going to demand more than that.

That makes sense. In mid-April you had a really cool Bloomberg View article; tell me about your view. I love the term "data Bill of Rights."

Right. So I'm pushing for basically two ideas, I should say. One of them is based loosely on the rights we already have around our FICO credit scores: the right to our credit report, which is essentially all the data that goes into a credit score. We are allowed to look at that regularly; we're allowed to complain if there are errors, to appeal and complain. I think we should be able to do that for any kind of scoring system, not just a credit score, but any scoring system that is either allowing us or denying us a really important opportunity. And when I say important opportunity, I don't want to include everything in the world; I have to narrow it down to really truly important things, around liberty, so like actually going to prison, for

example, the incarceration scoring systems around that; or your livelihood, so whether you get a job or can keep your job. There are scoring systems that allow people to get fired, or to be denied tenure (for teachers), or to not get a job interview; those are all things that you should have a view into. Or if they run your finances, so if you get a credit card or a loan or insurance, things like that. So that's the first part: I want more of a view into the scoring systems that matter so much to our lives. The second thing is modeled more around the FDA. The FDA's job is to make sure that when a pharmaceutical company wants to bring a new medicine to market, it is, number one, helping people and not killing people (actually, number one, not hurting people, and number two, that it is medically useful, so it's actually improving people's outcomes), and then they have to provide evidence that that's the case before it is brought to market on a massive scale.

My point is that we should do the same kind of thing for algorithms that could possibly destroy people's lives if they're unfair, or if they have flaws. We have all these algorithms out there that are potentially, maybe not ruining people's lives, not killing them, but at least destroying opportunities that are meaningful for people, like getting a job, or going to prison. They should have to pass some tests by a regulator, sort of at the FDA level, before they're allowed to be brought out at scale to the public.

That's so strong; I love that. You coined the phrase "the authority of the inscrutable," and you've actually been all over that in this podcast: it's an algorithm, therefore it must work. You called it an undeserved authority that's not really being questioned at this point, and it's at the heart of my favorite non-fiction book of the past couple of years, which is your Weapons of Math Destruction. What matters more: what's going on with people and social media, or what's going on with governments?

They're obviously both impacting privacy. I'd love to just get your take on the two different challenges: surveillance by governments, government meaning the NSA, and Facebook surveillance.

I don't distinguish them as much as other people do. I feel like once we have distributed our data to Facebook, then it gets hacked, by Russians, by different kinds of third parties, Cambridge Analytica out of probably hundreds. And then once it's out there, it gets disseminated to big-data warehousing companies like Acxiom, and then, for that matter, the CIA, the NSA; all those agencies can just buy that information, those profiles, for that matter. There's a little bit of collusion going on there, in that the government spying agencies probably just don't want to really bring down the social media

networks, and not really just social media but the entire data warehousing network, because then they won't have as much access, as far as I know. I do think it's kind of a lost cause, and that's why I am fixated much more on the way that our data is used, not the way that it is collected. The data is already being massively collected. So the question for me, the one I think we should focus on, is: how are we allowing people to use this against me?

So the cat's out of the bag. What are the risks, legal first, then reputational, that all companies are facing in this usage of data without really communicating well to their customers?

It's easy to see the reputational risk, but there's plenty of companies that build algorithms that are probably flawed but don't have direct consumers to worry about. In that case,

what I would hope for is that these unenforced but important regulations that already exist will eventually represent enough leverage for the companies to second-guess their effects. There's an entire industry of third-party big-data companies that are trying to sell these scoring systems to insurers through licensing agreements, and I think the insurance companies right now are buying them up because they seem to increase efficiency. And so the question is: are they legal? And I think the insurance companies will have to start asking that question much more in the coming years.

Cathy O'Neil, a pleasure. Thank you so much for taking the time. It's been great.

Thanks for having me.

Thanks for listening. If you like what we're doing, we'd appreciate you telling other people about the show. Better yet, give us a five-star rating on iTunes or write a review; doing so will really help ensure that more people can find us. And if you haven't already, please subscribe to the Impact Podcast on iTunes, SoundCloud, or wherever you go to find your podcasts.

