How to poison the data that Big Tech uses to surveil you

Algorithms are meaningless without good data. The public can exploit that to push for change.

Every day, your life leaves a trail of digital breadcrumbs that tech giants use to track you. You send an email, order food, or stream a show. They get back valuable packets of data that build up their understanding of your preferences. That data is fed into machine-learning algorithms that target you with ads and recommendations. Google turns your data into more than $120 billion a year in ad revenue.

Increasingly, we can no longer opt out of this arrangement. In 2019, Kashmir Hill, then a reporter for Gizmodo, tried to cut five major tech giants out of her life. She spent six miserable weeks struggling to perform basic digital tasks. The tech giants, meanwhile, didn’t even feel an itch.

Now, researchers at Northwestern University are suggesting new ways to redress this power imbalance by treating our collective data as a bargaining chip. Tech giants may have fancy algorithms at their disposal, but those algorithms are useless without enough of the right data to train on.

In a new paper that will be presented next week at the Association for Computing Machinery’s Fairness, Accountability, and Transparency conference, researchers including PhD students Nicholas Vincent and Hanlin Li propose three ways the public can exploit this to their advantage:

  • Data strikes, inspired by the idea of labor strikes, involve withholding or deleting your data so a tech firm cannot use it, for instance by leaving a platform or installing privacy tools.
  • Data poisoning involves contributing meaningless or harmful data. AdNauseam, for example, is a browser extension that clicks on every single ad served to you, thus confusing Google’s ad-targeting algorithms.
  • Conscious data contribution involves giving meaningful data to the competitor of a platform you want to protest, such as by uploading your Facebook photos to Tumblr instead.
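
To make the data-poisoning idea concrete, here is a minimal sketch of why clicking on every ad can blur a targeting profile. It is a toy model, not a description of how Google's ad systems or AdNauseam actually work: the ad categories, the click-share "profile," and the function names `interest_profile` and `entropy_bits` are all assumptions made for illustration.

```python
import math
from collections import Counter

def interest_profile(clicks):
    """Share of clicks per ad category: a toy stand-in for a targeting profile."""
    counts = Counter(clicks)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def entropy_bits(profile):
    """Shannon entropy of the profile; higher means the clicks reveal less."""
    return sum(p * math.log2(1 / p) for p in profile.values() if p > 0)

# Hypothetical ads served to one user, with an equal number per category.
served_ads = ["travel", "shoes", "cars", "games"] * 25

# Genuine behavior: the user only clicks on travel ads.
genuine_clicks = [ad for ad in served_ads if ad == "travel"]

# Poisoned behavior: every served ad is clicked, as a click-everything tool would do.
poisoned_clicks = list(served_ads)

genuine = interest_profile(genuine_clicks)
poisoned = interest_profile(poisoned_clicks)

print("genuine profile: ", genuine, "entropy:", entropy_bits(genuine))
print("poisoned profile:", poisoned, "entropy:", entropy_bits(poisoned))
```

In this toy model the genuine clicks point at a single interest, while the poisoned clicks simply mirror whatever ads were served, so they say nothing about the person behind them. Real systems draw on many more signals and may filter out such clicks.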

People already use many of these tactics to protect their own privacy. If you’ve ever used an ad blocker, or another browser extension that modifies your search results to exclude certain websites, you’ve engaged in data striking and reclaimed some control over how your data is used. But as Hill found, sporadic individual actions like these don’t do much to get tech giants to change their behavior.

What if, though, millions of people coordinated to poison a tech giant’s data well? That might just give the public some leverage to assert its demands.

There may already have been a few examples of this. In January, millions of users deleted their WhatsApp accounts and moved to competitors like Signal and Telegram after Facebook announced that it would begin sharing WhatsApp data with the rest of the company. The exodus led Facebook to delay its policy changes.

Just this week, Google also announced that it would stop tracking individuals across the web and targeting ads at them. Vincent says it’s unclear whether this is a real change or just a rebranding, but it’s possible that the growing use of tools like AdNauseam contributed to the decision by degrading the effectiveness of the company’s algorithms. Of course, it’s ultimately hard to tell. “Only the technology company truly understands how well a data leverage change affected a system,” he says.

Vincent and Li think that these campaigns can complement other strategies for resisting Big Tech, such as policy advocacy and worker organizing.

“It’s exciting to see this kind of work,” says Ali Alkhatib, a research fellow at the University of San Francisco’s Center for Applied Data Ethics, who was not involved in the research. “It was really interesting to see them think about the collective or holistic view: we can mess with the well and make demands with that threat because it is our data and it all goes into this well together.”

There is still work to be done to make these campaigns more widespread. Computer scientists could play an important role by building more tools like AdNauseam, which would lower the barrier to taking part in such tactics. Policymakers could help too. Data strikes are most effective when they are backed by strong data privacy laws, such as the European Union’s General Data Protection Regulation (GDPR), which gives consumers the right to request the deletion of their data. Without such regulation, it is harder to guarantee that a tech company will let you scrub your digital records, even if you remove your account.

Some questions remain to be answered. How many people does a data strike need in order to damage a company’s algorithm? And what kind of data would be most effective at poisoning a particular system? In a simulation involving a movie recommendation algorithm, for example, the researchers found that if 30% of users went on strike, it could cut the system’s accuracy by 50%. But every machine-learning system is different, and companies constantly update them. The researchers hope that more people in the machine-learning community can run similar simulations of different companies’ systems and identify their vulnerabilities.
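
A simulation of this kind can be sketched in a few lines. The example below is only an illustration of the general approach, not the researchers' method, data, or results: the synthetic ratings, the item-average "recommender," the RMSE error measure, and every function name are assumptions, and the printed numbers should not be expected to match the 30% and 50% figures above.

```python
import math
import random

def make_ratings(n_users, n_items, rng):
    """Synthetic (user, item, rating) triples: hidden item quality plus user noise."""
    quality = [rng.uniform(1.0, 5.0) for _ in range(n_items)]
    ratings = []
    for user in range(n_users):
        for item in range(n_items):
            if rng.random() < 0.2:  # each user rates roughly 20% of items
                noisy = quality[item] + rng.gauss(0.0, 1.0)
                ratings.append((user, item, min(5.0, max(1.0, noisy))))
    return ratings

def remaining_users(users, strike_fraction, rng):
    """Users left after a random share of them withhold their data."""
    n_strikers = round(strike_fraction * len(users))
    strikers = set(rng.sample(users, n_strikers))
    return [u for u in users if u not in strikers]

def fit_item_means(train):
    """Toy recommender: predict each item's average training rating."""
    sums, counts = {}, {}
    for _, item, rating in train:
        sums[item] = sums.get(item, 0.0) + rating
        counts[item] = counts.get(item, 0) + 1
    item_means = {item: sums[item] / counts[item] for item in sums}
    # Fall back to the scale midpoint if no training data is left at all.
    global_mean = sum(sums.values()) / sum(counts.values()) if counts else 3.0
    return item_means, global_mean

def rmse(model, test):
    """Root-mean-square error of the model's predictions on held-out ratings."""
    item_means, global_mean = model
    errors = [(item_means.get(item, global_mean) - r) ** 2 for _, item, r in test]
    return math.sqrt(sum(errors) / len(errors))

def simulate_strike(strike_fraction, seed=0):
    """Train without the strikers' ratings, then test on ratings from everyone."""
    rng = random.Random(seed)
    ratings = make_ratings(n_users=60, n_items=40, rng=rng)
    rng.shuffle(ratings)
    split = int(0.8 * len(ratings))
    train, test = ratings[:split], ratings[split:]
    stayed = set(remaining_users(list(range(60)), strike_fraction, rng))
    train = [row for row in train if row[0] in stayed]
    return rmse(fit_item_means(train), test)

for fraction in (0.0, 0.3, 0.6, 0.9):
    print(f"strike {fraction:.0%}: RMSE {simulate_strike(fraction):.3f}")
```

Because the seed is fixed, every strike level is evaluated on the same synthetic dataset and the same held-out ratings, so only the set of strikers changes. How much a strike hurts depends heavily on the model and the data, which is why results from one system do not carry over to another.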

Alkhatib suggests that scholars should also do more research on how to inspire collective data action. “It’s really hard to work together,” he says. “One of the hardest things to do is get people to finish what they started.” And then there’s the question of how to get a highly transient group of people (for example, people who use a search engine for only five seconds) to see themselves as part of a community that will last.

He adds that these tactics could have downstream consequences that need careful examination. Could data poisoning end up just creating more work for content moderators and other people tasked with cleaning and labeling companies’ training data?

But Vincent, Li, and Alkhatib are hopeful that data leverage could be a powerful way to change how tech giants handle our data and privacy. AI systems are dependent on data. “That’s just how they work,” says Vincent. “In the end, that’s how the public can get more power.”


