Big Data, Big Problems


Mathematician Cathy O’Neil holds a Ph.D. in number theory from Harvard, and she taught at Barnard for years before leaving to work in finance. She took a job at the hedge fund D.E. Shaw in June 2007 but left the field soon after, embarrassed by the role bankers had played in the housing collapse. In 2011, she rebranded herself as a data scientist and went looking for a place where she could do math and feel good about it.

Before long, O’Neil landed a spot at an e-commerce startup, developing models to anticipate the behavior of visitors to travel websites. There, she was struck by the parallels she found between data science and the financial system that had just decimated the U.S. economy. “A false sense of security was leading to widespread use of imperfect models, self-serving definitions of success, and growing feedback loops,” she writes in her new book, _Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy_.

_Weapons of Math Destruction_ was nominated for a 2016 National Book Award. Lately, I’ve been recommending it to everyone I know, as both a key to understanding the present and a tool to ward off an increasingly dystopian-seeming future.

**Sarah LaBrie** : What is big data?

**Cathy O’Neil** : Big data means different things to different people. For the sake of the book, I meant it as the marketing campaign around the newfangled uses for data. It’s a promise, and it’s a hope. The promise is that once you have big-data algorithms, they are inherently fair and objective, and they are better than human beings at being fair. That’s something that isn’t always explicitly stated.

Then there’s the hope, which is also often not stated, that we don’t need the actual data around the thing we’re trying to predict, that it’s good enough to use proxies for that thing.

**SL** : What is a weapon of math destruction?

**CO** : A weapon of math destruction is an algorithm that is important but secret at the same time, and also destroys people’s lives. It’s being used for important decisions, and it’s being used unfairly so that people’s lives get ruined, or at least they get injured.

An example that really got my attention early on was the teacher value-added model. You had teachers whose jobs were on the line through secret teacher assessments that no one could explain to them and that were statistically flawed, so the scores were inconsistent. I found a teacher who scored a six out of 100 one year, and a 96 out of 100 the next year, without changing his methodology of teaching. When teachers tried to appeal their scores, they were told the scores were too mathematically complicated for them to understand. They were told, “You’re not expert enough to question this.”

**SL** : You write about how poisonous assumptions can be camouflaged by big data algorithms. Can you talk more about that?

**CO** : Data is a digital echo of culture. If we consistently arrest blacks for smoking pot at four times the rate we arrest whites for smoking pot, even though whites and blacks smoke pot at the same rate, the data reflects that bias. We have many more arrest records for blacks under the crime of smoking pot than for whites. What we don’t acknowledge is that we haven’t successfully measured crime.

When we use these arrest records as if they’re a good proxy for the actual thing we’re trying to measure, then we end up developing tools like predictive policing, which looks for locations of arrests and sends police back to those same places looking for more crime. Considering our history of overpolicing poor black neighborhoods, it’s really a pseudoscientific justification for continuing uneven policing practices.
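The loop O’Neil describes can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model of any real policing product: the function `simulate_patrols`, the neighborhood names, and every number are hypothetical. Both areas are given the same underlying offense rate, and the only difference is a 4-to-1 skew in the historical arrest records.

```python
def simulate_patrols(arrests, offense_rate, patrols_per_round, rounds):
    """Toy loop: send patrols where past arrests are, then log new arrests."""
    arrests = dict(arrests)  # copy so the caller's records are untouched
    for _ in range(rounds):
        total = sum(arrests.values())
        # Patrols are split in proportion to each area's share of past arrests.
        patrols = {area: patrols_per_round * n / total for area, n in arrests.items()}
        # New arrests depend on how many patrols show up, not only on offending.
        for area in arrests:
            arrests[area] += patrols[area] * offense_rate[area]
    return arrests


# Hypothetical numbers: identical offense rates, but a 4-to-1 skew in old records.
start = {"neighborhood_a": 400.0, "neighborhood_b": 100.0}
rates = {"neighborhood_a": 0.1, "neighborhood_b": 0.1}
end = simulate_patrols(start, rates, patrols_per_round=100, rounds=10)
ratio = end["neighborhood_a"] / end["neighborhood_b"]
print(round(ratio, 2))
```

In this sketch the arrest ratio between the two areas stays at roughly its starting skew however many rounds are run, even though the assumed offense rates are identical. The records keep describing where police were sent, not where crime happened.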

Another example I like to give is Roger Ailes. Ailes was kicked out of Fox News after two decades of systemic discrimination against women anchors. If Fox News wanted to improve its reputation for fairness by implementing a machine-learning algorithm to hire anchorwomen and anchormen, then it would undoubtedly train on that data.

Because Roger Ailes was there for 20 years, it would be trained to think women are not successful. When it was given an application by a woman, it would say no, let’s filter her out, because she’s not going to succeed at Fox News. That’s an example of how you can codify past practices. Even if something changes, if you’ve automated past mistakes, they’re going to continue.
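A minimal sketch of that mechanism, assuming a deliberately crude "model" that only learns how often each group was labeled successful in the past. The records, the function names `fit_success_rates` and `screen`, and the 0.5 threshold are all hypothetical; real hiring systems are more elaborate, but they can inherit the same skew from their training labels.

```python
from collections import defaultdict


def fit_success_rates(history):
    """Learn, per group, the fraction of past hires labeled successful."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for group, succeeded in history:
        totals[group] += 1
        successes[group] += int(succeeded)
    return {group: successes[group] / totals[group] for group in totals}


def screen(applicant_group, rates, threshold=0.5):
    """Pass an applicant only if their group's past success rate clears the bar."""
    return rates.get(applicant_group, 0.0) >= threshold


# Hypothetical history: the labels reflect a biased workplace, not ability.
history = (
    [("man", True)] * 80 + [("man", False)] * 20
    + [("woman", True)] * 20 + [("woman", False)] * 80
)
rates = fit_success_rates(history)
print(screen("man", rates), screen("woman", rates))
```

Nothing in the code mentions discrimination. The filter rejects women only because the historical labels it was fitted to already did.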

**SL** : People tend to romanticize machine learning and the term _algorithm._ There’s been a lot of boosterism around technology that doesn’t take these problems into account or even consider them in any serious way.

**CO** : People have an idealistic vision of data as somehow cleaned of bias. There’s a sense in which that’s true. Algorithms do not play favorites. They’re not going to favor their friend because they don’t have friends. They’re just going to look at the numbers.

Of course, society has already classified people into categories and treats people in different categories differently. A machine-learning algorithm will be expert at picking up those differences, even if they’re subtle. If there is subtle discrimination, machine-learning algorithms can be counted on to pick that up and automate it.

Another way of saying that is, if we had a perfect hiring process, or if we had a perfect and fair and clean culture, then we would want machine-learning algorithms to automate those, because that would save us time and money. Nothing is perfect right now. Nothing is perfect yet.

**SL** : It almost sounds like a different approach would be to use algorithms to pick out what kinds of discrimination are being routinely practiced and then work on ways to combat that. Is that something that’s happening at all?

**CO** : That’s a really good point, and one I try to make. Instead of just blindly following algorithms as if they’re set in stone and perfect, we should be scrutinizing them for bias and using them as an opportunity to explore that discrepancy.

Why are we hiring one class of people instead of another? In the case of recidivism risk algorithms, why are certain people so much more likely to go back to prison? How can we intervene to make that discrepancy smaller? What kind of support can we put into place so that they’re less likely to end up back in prison? What exactly is it that is making their situation worse?

**SL** : What does all this mean for the future, and for the relationship between technology and democracy? Where do you see it leading?

**CO** : I’m optimistic, believe it or not. I think people are much more sensitized to these concepts now than they were, and I think we’re going to start to see people and algorithms being held accountable.

It’s been amazing to watch the response because people have really woken up in the last couple of months, especially since the election. I feel like half the time people say “Wow, I learned a lot,” and the other half of the time, people say “I was waiting for this book to be published.” That’s great. I want this conversation to be vibrant, ongoing, and I want it to have impact. I want shit to get moving. We’ve got no time to lose.

_Sarah LaBrie is the editor of_ The California Prose Directory 2016: New Writing From the Golden State.