OCTOBER 30, 2013
IN AUGUST OF 1961, the first volunteer installed himself in front of a “shock machine” that would go on to become the most famous prop in the history of social psychology. The volunteer had responded to an ad in The New Haven Register that offered $4.50 to participate in a study on memory and learning. Now a man in a gray lab coat was explaining to him that he was to administer electrical shocks to a second volunteer — the “learner,” who was strapped into a chair in another room — whenever the learner incorrectly identified a memorized word pair. The first volunteer (the actual subject of the study), who was likely a blue-collar worker from New Haven, may have felt nervous before the experiment started, possibly intimidated by both the prospect of shocking another person and the surroundings at Yale University. But surely the scientists here knew what they were doing; they wouldn’t put anyone in danger.
Before long, the subject, in addition to hundreds of others who eventually participated in the study, would find himself caught between the authoritative commands of the man in the lab coat and the cries of the learner howling out in pain on the other side of the wall.
“My heart’s starting to bother me now,” the learner would yell as the level of shocks increased. “Let me out of here, please!”
The subject would feel empathy. “He says his heart’s bothering him,” the subject would say. “He wants to stop.”
“The experiment requires that you continue,” the man in the lab coat, an actor named Williams, would respond. Williams would show no emotion.
The subject would hesitate.
“Though the shocks may be painful, they are not harmful,” Williams would say. “Please continue.”
And the subject would. In most cases, despite his better judgment, he would trust that this Yale scientist knew what he was doing and would continue to administer increasingly powerful shocks, even after the learner had stopped responding, presumably passed out, or worse, from the pain.
These subjects — 780 New Haven residents who volunteered — helped make an untenured assistant professor named Stanley Milgram a national celebrity. Over the next five decades, his obedience experiments provided the inspiration for films, fiction, plays, documentaries, pop music, prime-time dramas, and reality television. Today, the Milgram experiments are considered among the most famous and most controversial experiments of all time. They are also often used in expert testimony in cases where situational obedience leads to crime — as in 2004 when psychologist Philip Zimbardo referenced Milgram’s work in the trial of an Abu Ghraib prison guard.
Milgram’s studies — which suggest that nearly two-thirds of subjects will, under certain conditions, administer dangerously powerful electrical shocks to a stranger when commanded to do so by an authority figure — have become a staple of psychology departments around the world. They have even helped shape the rules that govern experiments on human subjects. Along with Zimbardo’s 1971 Stanford prison experiment, which showed that college students assigned the role of “prison guard” quickly started abusing college students assigned the role of “prisoner,” Milgram’s experiments are the starting point for any meaningful discussion of the “I was only following orders” defense, and for determining how the relationship between situational factors and obedience can lead seemingly good people to do horrible things.
There is still fierce debate as to what Milgram’s raw data actually tells us about human nature: did Milgram measure his participants’ obedience to authority, or merely their trust in scientists at a world-renowned university? Did he prove the evil that we’re capable of, or, as Alex Haslam suggested on Radiolab last year, the psychological strain we’ll endure in the name of science? Can any meaningful comparison be made between a laboratory at Yale and a death camp in Krakow? While these are just three examples of the countless lines of discourse spawned by Milgram’s experiments, the debate has never addressed this question: to what extent can we trust his raw data in the first place? In her riveting new book, Behind the Shock Machine: The Untold Story of the Notorious Milgram Psychology Experiments, Australian psychologist Gina Perry tackles this very topic, taking nothing for granted. Her chilling investigation of the experiments and their aftereffects suggests that Milgram manipulated results, misled the public, and flat out lied in order to deflect criticism and further the thesis for which he would become famous: that the Holocaust could have happened in New Haven.
As oft recounted in introductory psychology textbooks, the 1961 trial of Holocaust architect Adolf Eichmann inspired Milgram to measure the extent to which Americans would blindly obey authority. Hence: his famous studies with the phony shock machine. According to Milgram, the study found that, regardless of age, background, or gender, 65% of subjects administered what they believed to be the highest-level electrical shock to the learner, even as the learner first screamed in agony, then begged to be released, and finally went eerily silent.
But this narrative, Perry argues, is incomplete at best and deceitful at worst. She contends that serious factual inaccuracies cloud our understanding of Milgram’s work, inaccuracies which she believes arose “partly because of Milgram’s presentation of his findings — his downplaying of contradictions and inconsistencies — and partly because it was the heart-attack variation that was embraced by the popular media.”
Of course, the public tends to gravitate towards sensationalist material; Milgram can only be faulted so much for playing to our desire for spectacle. But Perry reveals that Milgram massaged the facts in order to deliver the outcome he sought. When Milgram presented his finding — namely, high levels of obedience — both in early papers and in his 1974 book, Obedience to Authority, he stated that if the subject refused the lab coat’s commands more than four times, the subject would be classified as disobedient. But Perry finds that this isn’t what really happened. The further Milgram got in his research, the more he pushed participants to obey. In early variations of the study, those “who resisted four times [were] classified as disobedient,” but in later iterations, especially the 20th one — notably the only variation to use female participants and thus crucial to Milgram’s claims to gender universality — “the same behavior was ignored.” In fact, Williams, the actor who played the lab coat, was only instructed to stick to the script in the first two variations, after which Milgram “tacitly allowed Williams license to improvise.” Williams forced the female participants to endure far more commands than the early male subjects, prodding one female subject 26 times before she finally gave in and was classified as obedient.
This new evidence suggests that Milgram’s female subjects may have been more likely to disobey than his male subjects. Perry also finds that in later variations, Milgram allowed Williams to ad-lib new commands. For example, at one point Williams learned from early trials that some participants had felt obligated to follow his directions in the interest of aiding Yale in its pursuit of knowledge. He then intimated to later subjects that, if they refused to follow his orders, the entire study would be invalidated. Milgram never mentioned these facts in any of his published writing.
Aside from the specific situational implications of these facts, Perry’s evidence raises larger questions regarding a study that is still firmly entrenched in American scientific and popular culture: if Milgram lied once about his compromised neutrality, to what extent can we trust anything he said? And how could a blatant breach in objectivity in one of the most analyzed experiments in history go undetected for so long?
When I first read Behind the Shock Machine, I was blindsided by the idea that the countless writers who have chronicled Milgram’s research over the years had never mentioned this mid-experiment gerrymandering (including his otherwise exhaustive biographer, Thomas Blass). Why, I wondered, had I trusted Milgram to tell the truth, even after noticing the self-serving tone of Obedience to Authority? (In addition to demeaning his subjects and glorifying his research in the pages of Obedience to Authority, Milgram apparently sent his publisher over 50 proposed taglines to help sell the book, including: “Buy this book. Obey this one command and you may be free of authority for evermore.”) If the Milgram of Obedience to Authority were the narrator in a novel, I wouldn’t have found him terribly reliable. So why had I believed such a narrator in a work of nonfiction?
The answer, I found, was disturbingly simple: I trust scientists. While I don’t necessarily trust them to explain what their data means, since all interpretation must be weighed against ego and agenda, I do trust them not to lie about the rules or results of their experiments. And if a scientist does lie, especially in such a famous experiment, I trust that another scientist will quickly uncover the deception.
Or at least I used to.
Shortly after Milgram first published his results, his colleagues did voice concerns about his ethics. He may have permanently traumatized subjects who had no idea what they were getting into, his detractors argued. He hadn’t screened his volunteers to weed out those especially at risk for psychological damage (one woman, who had previously undergone electroshock treatments, volunteered for this “memory study” to learn about the effect of electric shocks on memory). At the very least, he had tricked hundreds of innocent people into a situation that was intensely stressful and often humiliating.
Milgram contended that he was as surprised as anyone that his participants went as far as they did, subjecting themselves to such dramatic stressors when they could’ve halted the proceedings at any time. Perry, however, finds in Milgram’s notes from the planning stages “the unfolding of a slow process of trial and error as he refined, tightened, and scripted a scenario that would deliver the results he wanted.” At the time, Milgram was 27, fresh out of grad school and needing to make a name for himself in a hyper-competitive department, and Perry suggests that his “career depended on [the subjects’] obedience; all his preparations were aimed at making them obey.”
But it was not only his preparations or his mid-experiment procedural changes that were aimed at demonstrating obedience — it was also his choice of which results to present. Perry discovers, buried in the Milgram archives, an unpublished 24th variation in which Milgram recruited 20 pairs of “friends, relatives, neighbors, fathers and sons” to perform the experiment on each other. Milgram may have concealed this version once criticism began to mount following the publication of his earlier trials, since the ethics of the 24th iteration would be very difficult to defend: if you think learning that you would torture a stranger is traumatizing, imagine learning that you would torture your best friend.
In this final and, until now, unknown version of the experiment, after each pair of friends, neighbors, and relatives arrived at the lab, Williams would instruct one of the subjects on how to administer the shocks, while the other subject — who in this variation was to play the part of the learner — was taken into a separate room and asked to scream out in pain on cue so that his partner would think the shocks were real. The ease with which all the learners consented to dupe their friends, neighbors, and relatives suggests that there’s something highly enticing about standing on the right side of the one-way mirror — a hypnotic voyeurism that can make us forget our better selves. Yet out of the 20 pairs tested, only three men gave in and administered the highest-level shock.
The results were problematic for Milgram. It was fine to show, as he had in several of the 23 published trials, that situational variables like proximity to the learner would affect obedience, but he didn’t want innate factors like gender or relationship to the victim to limit the scope of his theory. He believed that his findings needed to be universal in order to be truly groundbreaking. That’s why he allowed more prodding in the variation involving female subjects than in the all-male variations, and that’s why he never published the 24th condition: The 15% obedience rate fell far short of his 65% standard. As Milgram wrote in his private journal after conducting the 24th variation, “Within the context of this experiment, this is as powerful a demonstration of disobedience that can be found.”
History often pairs Milgram’s obedience studies with Philip Zimbardo’s Stanford prison experiment, not only because they are the two most famous American social psychology experiments of the 20th century, but because Milgram and Zimbardo each faced one of the most challenging ethical dilemmas in the field’s history.
After Zimbardo first presented the results from the Stanford prison experiment at the APA Convention in 1971, he recalls that Milgram greeted him joyfully, “saying that now I [Zimbardo] would take some of the ethics heat off his shoulders by doing an even more unethical study!” This was because the volunteers playing the guards in Zimbardo’s experiment had begun psychologically torturing the “prisoners” as early as the second day of the planned two-week study. By day six, the prisoners were becoming intensely depressed and refusing to eat. If Zimbardo had continued as scheduled, he may have been able to dig even deeper into the psyche of the subjects playing the guards, who were on the verge of descending from simple cruelty to the prisoners into pure evil. But at the behest of his girlfriend, who had enough distance to see the disastrous effects of the experiment on the prisoners, Zimbardo chose to discontinue the study after only six days.
Conversely, when Milgram faced his own ethical dilemma — the fact that signs of trauma were apparent in his subjects very early on in the 11-month testing period — instead of discontinuing the experiment or instituting precautions to screen out especially at-risk volunteers, he upped the stakes by testing his theory on pairs of family members. And while Milgram defended the ordeal he subjected participants to by stating that they were all debriefed immediately following the conclusion of the trial, one of his former lab assistants tells Perry: “For most people who took part, the immediate debrief did not tell them there were no shocks.”
According to Perry, only after criticism of his ethics surfaced, and long after the completion of the studies, did Milgram claim that “a careful post-experimental treatment was administered to all subjects,” in which “at the very least all subjects were told that the victim had not received dangerous electric shocks.” This was, quite simply, a lie. Milgram didn’t want word to spread through New Haven that he was duping his subjects, which could taint the results of his future trials. A majority of participants were therefore told that the shocks they had administered had not been as strong as originally indicated. They were introduced to the learner, who assured them he was fine. But to accommodate a tight schedule, many subjects were rushed out the door without a debriefing of any kind. One participant wrote to Milgram, “I actually checked the death notices in the New Haven Register for at least two weeks after the experiment to see if I had been involved and a contributing factor in the death of the so-called learner.” While most volunteers received a letter months later explaining the true nature of the study, one subject tells Perry that he only learned the truth about the experiment over 30 years after participating, when he read an article about the purchase of the Milgram archives in his local paper.
Behind the Shock Machine does not negate the cultural relevance of the obedience experiments, which will remain stunning no matter how much Milgram manipulated our understanding of the situational factors that led to compliance. But Perry’s research should change the way we view Milgram’s work. While Milgram’s defenders point to subsequent recreations of his experiments that have replicated his findings, the unethical nature, not to mention the scope and cost, of the original version have not allowed for full duplications. A heavily toned-down version of the obedience experiments, conducted by Jerry Burger at Santa Clara in 2006, mostly confirmed Milgram’s results, finding what Burger described as only a slightly lower obedience rate than the original. But due to the APA’s Code of Ethics, instituted in 1973, Burger’s variation had to stop the phony shocks at 150 volts instead of going to 450, which had been Milgram’s benchmark for obedience. This means that Burger’s subjects had 20 fewer chances to resist the lab coat’s commands. Since half of the participants who disobeyed in Milgram’s experiment did so after 150 volts, the 2006 version could’ve found up to double its reported level of disobedience had it gone to 450.
This is, of course, conjecture. As modern guidelines appropriately prevent us from learning to what extent a neutral psychologist could reproduce Milgram’s results, we have to take Milgram’s word for it. But, after Behind the Shock Machine, doing so would require a misplaced trust in authority.