OCTOBER 20, 2014
CHRISTIAN RUDDER is a good writer — so good, in fact, it’s easy for readers of Dataclysm: Who We Are (When We Think No One’s Looking) to be smitten by his engaging and funny stories. These stories allegedly humanize big data. But if we follow the script and only fixate on the tidbits he’d like us to giggle at, ponder, and share, we’ll miss something important: questionable assumptions, which lurk beneath the surface layer of titillating description.
Let’s start with the basics. Harvard-educated Rudder is co-founder and president of the popular online dating website OkCupid and leads their analytics team. He’s responsible for making sense of the vast data users create — data that this year alone will come from 10 million people. Rudder thus sits on a massive stockpile of proprietary information that reveals intimate details about what people of different races and genders say about themselves, as well as how they interact — or fail to interact — with each other. In Rudder’s eyes, the latter is a sociological gold mine. It reveals what people do when they’re thinking about connecting and hooking up, unaware that, thanks to the help of computers that never sleep, Rudder is scrutinizing their every move, itching to translate their decisions and proclivities into social facts.
Rudder portrays the volume of information — the “unprecedented deluge,” or, “dataclysm” — as a gift that can reveal the truth of who we really are. When people describe what they do, they’re prone to idealization. When they describe what they’re inclined to do, they look at the future with rose-colored glasses. But when people don’t realize that they’re lab rats in Rudder’s social experiments, they reveal habits — “universals,” he even alleges — that can contradict noble self-conceptions. “Practically as an accident,” Rudder declares, “digital data can now show us how we fight, how we love, how we age, who we are, and how we’re changing.”
Rudder makes two questionable assumptions by framing his inquiry in this way. First, he takes it for granted that, in general, people are genuine when they think nobody is watching, which means, in this particular case, that OkCupid’s data is a transparent window into an authentic reality. Such a methodological supposition, however, requires careful consideration. We need solid evidence, or some sort of compelling theory, to rule out the possibility that confounding factors are in play.
Renowned sociologist Erving Goffman convincingly demonstrated that social interactions can be performative. In other words, we often modify what we do to account for other people’s expectations and judgments. But Rudder just assumes that all of the reasons we have for impression management fade away on OkCupid. An equally plausible alternative possibility is that when OkCupid users interact with the platform and express themselves through the constraints it imposes, they don masks appropriate for the occasion.
Second, what Rudder calls an accident — say, OkCupid data revealing that race can play a big role in people’s choices of whom to date, despite their proclaiming otherwise — I see as an analyst’s deliberate decision to abstract data away from its intended context, and put it to what privacy scholars call “secondary use.” I’ll revisit this point momentarily and argue it has noteworthy implications for how we talk about, understand, and protect privacy.
When Rudder shines a spotlight on human behavior, he focuses on what people do in the aggregate, not on tales of interesting individuals. By treating anonymized groups as the primary subject matter, Rudder can do two things: focus on numbers and identify trends occurring at significant scale. Rudder maintains that this approach sets him apart from famous writers like Malcolm Gladwell.
I see this book as the opposite of outliers. Instead of the strays from the far reaches of the data — the one-offs, the exceptions, the singletons, the Einsteins for whom you need the whole story to get it right, I’m pulling from the undifferentiated whole. […] [It’s] science as pointillism. Those dots may be one fractional part of you, but the whole is us.
As far back as 2010, Rudder made it clear that he’s interested in cataloging our collective, hidden life. That’s when he started his blog OkTrends. Indeed, Dataclysm reads like an extended, visually intense remix of those posts, bolstered by insights gleaned from a host of external sources, including Twitter, Google, Reddit, and Craigslist, and an infusion of new OkCupid data that replaces “previous findings.” The one saving grace, here, is that, given what Rudder wants to say, his vignettes are logically arranged and don’t come across as collated posts that were compiled in chronological order. The three sections “What Brings Us Together,” “What Pulls Us Apart,” and “What Makes Us Who We Are” do designate discrete, distinguishable issues.
Okay. So, what do we learn from Rudder’s data-says-the-darndest-things story? Well, we’re informed that as men age they find their contemporaries less appealing than younger women. Summarizing his findings, Rudder declares, “A woman’s at her best when she’s in her very early twenties. Period.” In comparative terms, “Women want men to age with them. And men always head toward youth.”
We’re also told that “faith, politics, and […] looks […] don’t matter nearly as much as everyone thinks” when trying to find a compatible partner. But “humble questions like Do you like scary movies? and Have you ever traveled alone to another country? have amazing predictive power.” And, we’re informed that while black men aren’t prone to speaking about “Borges,” Latino men aren’t inclined to talk about “Freakonomics.” It also turns out that the “least black band on Earth is Belle & Sebastian.” As to Twitter users, it turns out “if you have a lot of followers, you are […] more likely to speak like a corporation.”
I could go on. Rudder certainly would want me to highlight memorable bits like the fact that, conscious of a cultural stereotype, both Asian men and women are inclined to put “tall for an Asian” in their profiles. By now you get the point. The preponderance of Rudder’s revelations fit the bill of what talk show host Arsenio Hall used to call “things that make you go hmmm.” It’s hard to see them as epiphanies.
In some cases, Rudder makes off-putting remarks. In one especially egregious section, he acts as if he’s merely speaking as a realistic life hacker who can muster data to support the platitudes of efficiency gurus everywhere. There’s much more at stake, however, in his praise of OkCupid users sending romantic prospects boilerplate messages.
Sitewide, the copy-and-paste strategy underperforms from-scratch-messaging by about 25 percent, but in terms of effort-in to results-out it always wins: measuring by replies received per unit effort, it’s many times more efficient to just send everyone roughly the same thing than to compose a new message each time. I’ve told people about guys copying and pasting, and the response is usually some version of “That’s so lame.” When I tell them that boilerplate is 75 percent as effective as something original, they’re skeptical — surely almost everyone sees through the formula. […] [L]et me tell you something. Nearly every single thing on my desk, on my person, probably in my entire home, was made in a factory alongside who knows how many copies. I just fought a crowd to pick up my lunch, which was a sandwich chosen from a wall of sandwiches. Templates work. […] Innovation is using a few keyboard shortcuts to save […] some time.
This passage is disturbing in several respects. First, Rudder treats the process of communication in purely instrumental terms: it’s a numbers game and to win you’ve got to maximize your response-to-effort ratio. Now, it could be argued that during the early stages of dating, minimal effort is appropriate. After all, people are busy, and they can take a more conscientious and personalized approach to socialization after things go to the next level and it becomes clear what a particular individual is worth. But Rudder doesn’t convey a sense that as relationships deepen so do our responsibilities. Instead, he posits an unnerving equivalence between people and commodities.
That’s the second problem: Rudder’s comparison of people to factory goods. Sure, most of us take advantage of mass production and treat artisanal wares as … well … treats. But viewing people, or even delicious sandwiches, as widgets is dehumanizing to anyone, not just Marxists! The situation is all the more fraught because the comments come from a man who is talking about dating data; in a world where sexism remains potent, women disproportionately experience the burden of being objectified. While Rudder makes it a point throughout Dataclysm to discuss gender when doing so reveals amusing and provocative tidbits, in this section he merely studies disembodied processes, such as the relations between message length and response rate, the time it takes to compose a message and response rate, and the number of characters typed relative to the number appearing in the final message. Such a reductive framing allows Rudder to analyze “effort” in the abstract while ignoring questions concerning who is willing to expend the effort and why some people might receive less attention than others.
The third problem is that Rudder conflates normative and psychological responses. When people tell him they expect boilerplate communication to be perceived as a self-sabotaging shortcut, they’re basically saying we can see through a poorly executed charade and recognize obvious acts of laziness or deception. To refute this, Rudder can try to prove that we’re not as good at spotting generic or typecast formulations as we think we are. But when folks tell him the practice is “lame,” they’re judging the activity itself and the people who engage in it. If there’s good reason to find the gesture offensive, those reasons remain valid whether or not the recipients realize what they see is not what they expect to get.
The fourth problem is that Rudder associates innovation with efficiency. This is Silicon Valley dogma: friction is bad because it slows people down and generates opportunity costs that prevent us from doing the things we really care about; minimizing friction is good because it closes the gap between intending to do something and actually doing it. Such a cavalier attitude toward efficiency-enhancing technology creates the impression that at any moment we can slow down and behave more thoughtfully and deliberately. But why assume this is the case when technology companies are providing us with ever-increasing opportunities to do things hyper-efficiently and creating an infrastructure that’s conducive to cut-and-paste culture?
To give but a few examples: newspapers are using formats like sharelines so that readers don’t have to bother formulating their own opinions of articles but can pass along bland summaries; apps allow us to compose text messages that pass off drop-down notes as our own formulations, and even enable us to send them with a set time-delay; apps predict how we’re inclined to speak and can present us with suggested content to share that appears as our own considered prose; and, Google applied for a patent to automate our social media voice, in case we’re too busy to actually congratulate our friends on their accomplishments.
My biggest issue with Dataclysm, however, lies with Rudder’s treatment of surveillance. Early on in the book he writes: “If Big Data’s two running stories have been surveillance and money, for the last three years I’ve been working on a third: the human story.” This claim about pursuing a third path isn’t true. Dataclysm itself is a work of social surveillance.
It’s tempting to think that different types of surveillance can be distinguished from one another in neat and clear ways. If this were the case, we could say that government surveillance only occurs when organizations like the National Security Agency do their job; corporate surveillance is only conducted by companies like Facebook that want to know what we’re doing so that they can effectively monetize our data and devise strategies to make us more deeply engaged with their platforms; and social surveillance only takes place in peer-to-peer situations, like parents monitoring their children’s phones, romantic partners scrutinizing each other’s social media feeds, and, to use Rudder’s main example from Chapter 9, folks everywhere scouring the internet to find out when Justine Sacco’s plane landed in Johannesburg after she made her infamous tweet about not getting AIDS because she’s white.
But in reality, surveillance is defined by fluid categories. Consider, for example, what recently happened in the aftermath of a “brutal attack on a gay couple in downtown Philadelphia.” Police posted a surveillance video online of people who were involved. Not too long after they shared their footage, someone put a picture online of what appeared to be the same group hanging out at a restaurant. Twitter users figured out which restaurant this was, and @FanSince09 used Facebook’s Graph Search to get a list of the names of people who had checked in there — some of whom he determined were in the photo. The police immediately acknowledged the civilian’s gesture was helpful. As a definitional matter, this is important. A civilian used social media to obtain information about other civilians in order to assist law enforcement; given this intention, the activity shouldn’t be classified merely as social surveillance.
Now, let’s think critically about what Rudder is doing when he discusses corporate data in Dataclysm. Does it make sense to say he’s merely engaging in corporate surveillance? I don’t think so. As with the previous example, intention matters.
When it comes to the reasons for writing and publishing the book, Rudder doesn’t discuss OkCupid data trails in order to better optimize the platform and extract more value for users. Instead, he presents the general public with proprietary information in order to enhance our collective understanding of how members of society really look at and treat one another. Like any 21st-century alchemist who tries to turn big data about peer-to-peer interactions into social facts, Rudder performs social surveillance.
Rudder doesn’t think there’s anything wrong with what he’s doing, since he restricts his stories to coverage of anonymized, aggregate data. He’s not spilling the beans on what any particular person does. Indeed, he doesn’t publicize any new personal information at all. In Rudder’s own words:
But despite some, even many, people’s cavalier attitude toward privacy, I didn’t want to put anyone’s identity at risk in making this book. As I’ve said, all the analysis was done anonymously and in aggregate, and I handled the raw source material with care. There was no personally identifiable information (PII) in any of my data. In the discussion of users’ words — their profile text, tweets, status updates, and the like — those words were public. Where I had user-by-user records, the userids were encrypted. And in any analysis the scope of the data was limited to only the essential variables, so nothing could be tied back to any individual.
This statement contains two fallacies. Rudder presumes that if information is shared publicly in one context, there’s no privacy interest in it being shared publicly in any other. But as law professor Woodrow Hartzog and I argue, information exists on a continuum: on one end lies information we want to disseminate to the world; on the other end is information we want to keep absolutely secret. In between is a vast middle ground of disclosures that we only want to share with select people, our “private publics.” As a matter of both etiquette and ethics, it’s important to consider whether someone would have good reason to be upset with us moving information from one place to another, where it can be more easily seen by new audiences. Privacy scholar Helen Nissenbaum defines this as an issue of “contextual integrity.”
It’s also a mistake to assume people only have privacy interests in matters concerning direct revelations of their deeds. Contrary to what Rudder presumes, OkCupid users do have a legitimate privacy interest in Rudder not publicly talking about what they collectively do, especially since the context of his analyses falls far outside the scope of what many envision when signing up for the service. When most people log on to OkCupid, they expect their information to be analyzed for purposes related to finding good matches. They certainly aren’t thinking about the possibility that a data scientist will convert their behavior into sociological facts about basic human dispositions and relations. Rudder’s violation of the initial contextual integrity thus puts personal data to questionable secondary, social use.
The use is questionable because privacy isn’t only about protecting personal information. People also have privacy interests in being able to communicate with others without feeling anxious about being excessively monitored. Such anxiety can arise in anonymous situations. It can feel overwhelming to speak on behalf of general groups — women, men, blacks, whites, Latinos, Asians, etc. — and risk contributing to the formation of new cultural stereotypes. That risk can be seen as such an immense responsibility and burden that the resulting apprehension inhibits speech, stunts personal growth, and possibly even disinclines people from experimenting with politically relevant ideas.
Of course, Rudder could respond that if users don’t like what he’s doing, there’s an easy solution: Don’t use OkCupid! That’s basically how he replies to critics of Facebook and Google, suggesting that if people believe these services offer insufficient compensation and poor tradeoffs, they simply shouldn’t use them. But this is a myopic way to look at the situation. Sure, nobody is forced to use any particular online dating service or even meet people through online interactions. Likewise, nobody is forced to find information on Google or socialize on Facebook. But as the social and professional costs of opting out of many of these services increase, and as more and more proprietary platforms watch, catalog, interpret, and share our every move, it becomes untenable to construe our options as a matter of individuals needing to make smart choices that weigh costs and benefits. The deck is stacked, and many of the available choices exhibit anti-privacy bias. Sadly, the situation becomes even more fraught once we also factor in related issues, like manipulation. University of Maryland professor James Grimmelmann’s legal and ethical analysis of some of OkCupid’s experiments is especially enlightening.
My review of Dataclysm might seem overly harsh. But what can we really expect when Rudder primarily writes about people who use his platform? Although Rudder wants to shock us with reports of people behaving in unflattering ways, there’s a limit to how far he can go. Unlike an academic sociologist, who can avoid conflicts of interest, Rudder has a financial stake in his subject. OkCupid might lose money if readers of Dataclysm get so offended they quit the service. And if Rudder were to be more reflective and question whether playing data scientist really does give him the right to tell the story of OkCupid users, the very rationale for his project might be undermined. These constraints practically ensure that when Rudder says “tech evangelism is one of my least favorite things,” he’s confessing to his own cognitive dissonance, which prevents him from coming to terms with his role as a big data booster writing a book that’s ultimately an extended love letter to information trails people don’t realize they’re blazing.