Mathematician Cathy O'Neil holds a Ph.D. in number theory from Harvard, and she taught at Barnard for years before leaving to work in finance. She took a job at the hedge fund D.E. Shaw in June 2007 but left the field soon after, embarrassed by the role bankers had played in the housing collapse. In 2011, she rebranded herself as a data scientist and went looking for a place where she could do math and feel good about it.
Before long, O'Neil landed a spot at an e-commerce startup, developing models to anticipate the behavior of visitors to travel websites. There, she was struck by the parallels she found between data science and the financial system that had just decimated the U.S. economy. "A false sense of security was leading to widespread use of imperfect models, self-serving definitions of success, and growing feedback loops," she writes in her new book, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.
Weapons of Math Destruction was nominated for a 2016 National Book Award. Lately, I've been recommending it to everyone I know, as both a key to understanding the present and a tool to ward off an increasingly dystopian-seeming future.
Sarah LaBrie : What is big data?
Cathy O'Neil : Big data means different things to different people. For the sake of the book, I meant it as the marketing campaign around the newfangled uses for data. It's a promise, and it's a hope. The promise is that once you have big-data algorithms, they are inherently fair and objective, and they are better than human beings at being fair. That's something that isn't always explicitly stated.
Then there's the hope, which is also often not stated, that we don't need the actual data around the thing we're trying to predict, that it's good enough to use proxies for that thing.
SL : What is a weapon of math destruction?
CO : A weapon of math destruction is an algorithm that is important but secret at the same time, and also destroys people's lives. It's being used for important decisions, and it's being used unfairly so that people's lives get ruined, or at least they get injured.
An example that really got my attention early on was the teacher value-added model. You had teachers whose jobs were on the line through secret teacher assessments that no one could explain to them and that were statistically flawed, so the scores were inconsistent. I found a teacher who scored a 6 out of 100 one year and a 96 out of 100 the next, without changing his teaching methods. When teachers tried to appeal their scores, they were told the scores were too mathematically complicated for them to understand. They were told, "You're not expert enough to question this."
SL : You write about how poisonous assumptions can be camouflaged by big data algorithms. Can you talk more about that?
CO : Data is a digital echo of culture. If we consistently arrest blacks for smoking pot at four times the rate we arrest whites for smoking pot, even though whites and blacks smoke pot at the same rate, the data reflects that bias. We have many more arrest records for blacks under the crime of smoking pot than for whites. What we don't acknowledge is that we haven't successfully measured crime.
When we use these arrest records as if they're a good proxy for the actual thing we're trying to measure, then we end up developing tools like predictive policing, which looks for locations of arrests and sends police back to those same places looking for more crime. Considering our history of overpolicing poor black neighborhoods, it's really a pseudoscientific justification for continuing uneven policing practices.
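[Editor's illustration, not part of the interview.] The feedback loop O'Neil describes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real predictive-policing product: the two areas, the patrol count, the offense rate, and the function name `simulate_patrols` are all hypothetical. The 4:1 starting gap borrows the arrest ratio from her marijuana example. Both areas are given the same true offense rate, and patrols are assigned in proportion to past arrest records.

```python
def simulate_patrols(initial_arrests, offense_rate, total_patrols, rounds):
    """Toy feedback loop: patrols follow arrest records, arrests follow patrols."""
    arrests = dict(initial_arrests)
    for _ in range(rounds):
        total = sum(arrests.values())
        # Patrols are split in proportion to recorded arrests,
        # not the (unobserved) true offense rate.
        patrols = {area: total_patrols * count / total
                   for area, count in arrests.items()}
        for area in arrests:
            # New arrests scale with how many patrols are present.
            arrests[area] += patrols[area] * offense_rate[area]
    return arrests


# Hypothetical numbers: identical offense rates, 4:1 gap in historical records.
initial = {"A": 400.0, "B": 100.0}
rate = {"A": 0.1, "B": 0.1}

result = simulate_patrols(initial, rate, total_patrols=100, rounds=10)
ratio = result["A"] / result["B"]
print(result, ratio)
```

In this sketch the recorded arrest ratio between the two areas stays at about 4:1 no matter how many rounds are run, even though the underlying offense rates are equal. The records measure where police were sent, so the data never gets a chance to correct the initial disparity.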