MAY 12, 2021
STATISTICAL MEASURES and models are ubiquitous these days. In principle, you can decide how to behave — whether, say, to drink two glasses of wine with dinner, or in pandemic times, whether to go out to dinner at all — based on numbers that reflect your health risks. A year into the pandemic, you could look up your local exposure risk, the likelihood of infection from gathering in a particular size group, the chance of hospitalization if infected, the relative benefit of getting vaccinated.
None of those numbers is exactly accurate, however. Far from constituting your personal health risks, they are abstractions: aggregated probabilities for groups of people somewhat like you in terms of age, gender, race, or neighborhood.
Almost invariably, data are aggregated in ways that unintentionally or intentionally reflect the factors that data collectors assume are important, making them less relevant for, say, pregnant women, people of color, or those in locales with fewer health-care providers. Depending on how fine-grained your inquiry is (I want to meet up with six people in Chicago, all of them healthy women in their 40s, with recent negative tests, and…), it can become challenging to understand why these aggregate measures might be useful at all. Not surprisingly, people have used the very fact of aggregation to argue public health advice may apply in general, but not to them (most people in my ZIP code live in apartments but I don’t; most people my age are frail but I’m not). We may label such arguments ignorant or dangerous. Yet, on some level, all our most pressing questions are fine-grained. Unless we work in public health, we don’t usually want to know about disease in general. I don’t want to know the health risk for middle-class city-living professional white men in their late 30s. I want to know about my risk.
This tension between knowing about my risk versus risk in general is also present in fields like economics, but for modern medicine in particular it presents a fundamental paradox. That paradox can be captured by the fact that we know quite a lot about the treatment of disease in general and relatively little about how to treat individuals. Going for a check-up in the 21st century is, more often than not, a numerical affair where the numbers represent biomarkers that help predict your future risk of disease or recovery. They may be your markers, but what we know about them is entirely based on aggregating data from people similar to you in some respects but not others.
Governments, of course, do have reasons to care about the aggregate. Many of the figures that have been used to track the pandemic (case fatality rates, levels of available hospital resources, estimated numbers of susceptibles, etc.) are actually based on quite old measurement tools, used in statistical practices of public health for well over a century. For most of human history, the only diseases that consistently mattered were infectious. A great deal of medical attention was paid to plague, scarlet fever, smallpox, and similar maladies even though effective treatments were rare. Public health representatives could still measure and learn from cases in the aggregate: no one knows which patients will die of smallpox, but it is possible to estimate the trajectory of an outbreak within a community or determine the effects of inoculation. That’s useful knowledge.
The fields of probability and statistics had roots in 17th-century card and dice games, but until the turn of the 20th century, statistically minded people were more typically focused on settling debates about the success of inoculation, life insurance premiums, life tables and mortality estimates, an epidemic’s rise and fall, trait distributions, measures of the inheritability of a particular ailment, and the spread of mental illness within a population. There were of course other uses of statistics, but the fact is that many of the more important and visible statistical questions concerned public health.
More broadly, the notion of aggregation is central to almost any definition of statistics, and one of the field’s most distinguished practitioner-historians, Stephen Stigler, includes it as the first of his pillars of statistical wisdom. It is certainly the oldest pillar, and Stigler suggests it is the most radical in that aggregation promises knowledge through the act of systematically throwing information away. In other words, by finding the arithmetic mean of a group of figures you erase information about each individual measurement while gaining knowledge that no single observation could provide. In the case of medicine, however, what’s often being thrown away are the details about individuals — what makes me different from you — precisely, that is, what we want our physician to focus on!
Though there are a handful of examples in which statistical tools were applied to clinical judgments before 1900, the idiosyncrasies of the individual patient were only deemed important enough to warrant systematic statistical study in the last century. Perhaps the most important reason for clinical application was the belief that diseases were multifactorial in both their etiology and treatment. The late-19th-century dream of determinism in medicine (that laboratory research would be able to identify single causes of disease, the elimination of which would ensure health) did not square with the rise of non-infectious and chronic conditions in the 20th century. Taking their cue from epidemiologists, but even more so from insurance modelers and demographers, health-focused statisticians (or biostatisticians) in the mid-20th century started to downplay the distinction between treating an individual with a disease and treating a group of diverse individuals with the same disease. Instead, they realized the power of statistics could be leveraged by replacing a singular individual with a group of people who shared that individual’s measurable characteristics. You might see yourself as a kind and loving mid-career salesman from the Midwest who occasionally smokes and is a bit overweight. But in this modern denuded version of yourself, you’re a collection of data points (gender, job, age, weight, environment, behavior), and for each of these, and ideally for them collectively, you could be matched with a population whose aggregate risks have been modeled.
As infectious diseases continued to wane in importance across much of Europe and North America, the primary role of statistics in medicine shifted from public health to the clinic. Perhaps the most important example of this shift was the development of the Framingham Heart Study. In the late 1940s, recognizing that heart disease, strokes, and angina pectoris seemed to have no single cause, researchers at the National Institutes of Health wanted to study a single population over time to track the relationship between lifestyle, health choices, demographics, and biomarkers, and the risk of heart attacks and disease. Recruiting a group of a few thousand residents from Framingham, Massachusetts, physicians planned to give them free exams every two years to track their ailments and correlate their behavior and biomarkers with outcomes, thereby uncovering what we now call “risk factors.”
Within just a few years, researchers knew that both systolic blood pressure and serum cholesterol levels were strongly correlated with the risk of developing coronary heart disease. For individual patients that knowledge turned out not to be all that useful, the National Institutes of Health’s statisticians noted. As originally designed, the study didn’t tell them how to combine risks from blood pressure and serum cholesterol, nor which one to focus on lowering if they had high levels of both. Nor did the study indicate whether small differences in cholesterol level would meaningfully change the risk of an adverse cardiac event.
What was needed was a way to combine data so that judgments in general about public health risks could be turned into probabilistic statements about someone in particular. Statisticians wanted to move from tracking relative risk (e.g., high levels of cholesterol double the risk of coronary heart disease compared to low levels) to uncovering the probability of adverse heart events given a range of variables (e.g., an overweight smoker with high blood pressure has a 20 percent chance of a heart attack in the next five years, but half that if he lowers his blood pressure substantially). They wanted to make the aggregated epidemiological data useful for individual clinical judgments.
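The difference between the two kinds of statement can be made concrete with a small sketch. The Python below first computes a relative risk from two group rates, then turns a profile into an absolute probability with a logistic model, which is one common way to combine several risk factors and is assumed here purely for illustration. The function names, the intercept, and every coefficient are invented; they are not Framingham estimates, and the resulting numbers should not be read as real risks.

```python
import math

def relative_risk(risk_exposed, risk_unexposed):
    """Ratio of two group-level risks: how many times more likely, not how likely."""
    return risk_exposed / risk_unexposed

def absolute_risk(intercept, coefficients, profile):
    """Probability of an event for one profile under a logistic model."""
    score = intercept + sum(coefficients[name] * profile[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients (log-odds per risk factor), NOT estimated from any study.
INTERCEPT = -3.0
COEFFICIENTS = {"smoker": 0.7, "overweight": 0.4, "high_blood_pressure": 0.9}

# The same hypothetical person before and after lowering blood pressure.
before = {"smoker": 1, "overweight": 1, "high_blood_pressure": 1}
after = dict(before, high_blood_pressure=0)

rr = relative_risk(0.10, 0.05)  # made-up group rates: 10% versus 5%
risk_before = absolute_risk(INTERCEPT, COEFFICIENTS, before)
risk_after = absolute_risk(INTERCEPT, COEFFICIENTS, after)

print(f"relative risk: {rr:.1f}x")
print(f"absolute risk before: {risk_before:.1%}, after: {risk_after:.1%}")
```

The relative risk says only that one group fares twice as badly as another. The logistic function combines several variables into a single probability, which is the kind of statement a clinician can attach to a particular profile, though it remains an average over everyone who shares that profile.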
There was little new here mathematically, but this reasoning revolutionized the concept of risk in clinical medicine. Shifting to the aggregate enabled the substantial machinery of statistical inference to be deployed. If medicine and health are inescapably uncertain, better to aggregate and have reliable estimates of that uncertainty than to try to predict what will happen in any given case.
In turn, an individual could link herself with a population that shared her characteristics. The trick was to pretend that the average risk for the group was a personal risk, even though it was only an estimate of risk, inevitably fraught with fuzziness and uncertainty. Put differently, there are a bunch of people who share any set of variables, and while the average risk may be the same for all of them, there will almost certainly be variability, which may indicate there were relevant differences between people that weren’t measured (or just that the risk is inescapably variable). Perhaps family history or diet wasn’t recorded, but turned out to be important, for example.
The vast majority of people, and many clinicians, ignored the distinctions between aggregated and individual risk, however. An individual would simply take some measurements, enter them into a Framingham Risk Score calculator, and compute what looked like a personal risk of heart disease when it was really the average risk of the population of people who shared those measurements. But without knowing the shape of that distribution (that is, how likely you were to be close to the average), you often knew very little about your individual risk. Some young triathletes may have a great score and still suffer a heart attack, for example, perhaps because, as in the case of 41-year-old Kristie Elfering, the score doesn’t account for heart disease that runs in the family. There’s always something not measured that might make your individual case more or less like the average.
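A toy calculation shows why the average alone says so little. The Python sketch below builds two hypothetical groups of 100 people who would all enter identical measurements into a calculator. In one group everyone truly has the same risk; in the other, ten people carry an unmeasured factor such as family history. All of the risk values and the `calculator_score` helper are invented for illustration and do not reproduce the actual Framingham Risk Score.

```python
from statistics import mean, pstdev

# Hypothetical individual risks for 100 people who all enter the same
# measurements into a risk calculator. The calculator cannot see family history.
uniform_group = [0.10] * 100              # everyone truly sits at the average
mixed_group = [0.55] * 10 + [0.05] * 90   # 10 people carry an unmeasured factor

def calculator_score(individual_risks):
    """What a score based on shared measurements reports: the group average."""
    return mean(individual_risks)

for label, group in [("uniform", uniform_group), ("mixed", mixed_group)]:
    print(
        f"{label}: reported score = {calculator_score(group):.2f}, "
        f"spread (std dev) = {pstdev(group):.2f}, "
        f"highest individual risk = {max(group):.2f}"
    )
```

Both groups receive the same reported score, yet in the mixed group no single person actually has that risk: most are well below it and a few are far above it. The score cannot tell the two situations apart.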
Similar examples of the increasing importance of aggregated data can be traced in other areas. Clinical trials, for example, typically compare the experience of two groups of people randomly assigned to receive one treatment or another. The average treatment effect can be measured by comparing outcomes, but this may or may not be a good estimate of the specific effect the therapy would have on any given individual. Careful observational studies, both prospective and retrospective, for instance, enabled the link between smoking and lung cancer to be made far more convincing and precise than would otherwise have been possible, but those studies still won’t indicate if a particular smoker will get cancer. New research methods draw on huge sets of anonymized electronic medical records to see what sorts of correlations exist. Even though they are incredibly rich data sets that have lots of individual patient information, researchers can still only make general statements about the aggregate.
The role of aggregated, statistically validated evidence in medicine soon came to be a part of what reformers in the 1980s and 1990s were promoting as “Evidence Based Medicine.” The implication was that reliable knowledge could not emerge from singular cases or idiosyncratic treatment plans but from collecting cases in a systematic fashion, often through meta-analyses — studies of studies — or similar methodologies. More recently, some critics have noted that Evidence Based Medicine emphasizes the so-called best treatment for a specific condition at the cost of relevant differences between individuals. With the sequencing of the human genome, these critics have instead proposed methods that rely on genetic information or individual biomarkers to “personalize” medicine again. Under the guise of Precision Medicine, there have been some important developments in gene-targeted therapies, especially for cancers.
But while these pharmacogenetic breakthroughs are life-changing for some, they still are not really about individual patients. They are about a group of people who share a specific gene or genetic marker. While it is possible that additional advances will turn a portion of clinical medicine into a CRISPR-mediated practice of fixing genetic mutations or defective proteins, most diseases don’t seem amenable to genetic fixes. Rather, genetic information has just become one more set of data to be collected and aggregated. You’re still the same Midwestern salesman, but now your medical record also features specific gene markers, each of which may point to an increased chance for certain complications or potential treatments. In short, even with genetic sequencing, the modern practice of clinical medicine remains firmly entrenched in making inferences from aggregated data.
In many cases, that is a good thing: we wouldn’t have been able to learn as much from individual case reports about the risk factors for complicated conditions like heart disease or the relatively rare carcinogenic effects of particular substances like bisphenol A (BPA), tobacco, or aflatoxins. Most therapies don’t work for everyone with a disease, but it is entirely reasonable to first try one that performed better on average in a clinical trial (say, taking a particular statin for high cholesterol), even if no one knows if that particular drug will actually reduce your particular chance of disease. Physicians may try to help patients navigate all the existing evidence, but that evidence can’t substitute for clinical judgment about your singular case. Verified medical evidence in the 21st century is always aggregated and statistically validated, so anything that applies to a singular person would by definition not be evidence-based.
What does this transformation mean for us in the age of COVID? After all, COVID isn’t like a heart attack. It is an infectious disease, and we are in an epidemic. Surely the tools of statistical aggregation will be particularly valuable?
They have been extraordinarily useful: the data collection procedures of public health officials have been essential for tracking cases and the calculations of epidemiologists have been widely used to predict the effects of mitigation measures. But their time in the limelight has also highlighted the limits of public health statistics. Claims of “personal responsibility,” for example, cut against the use of aggregated data. Nationwide data is often easy to dismiss at the local level: the disease may be a problem somewhere, but not in my neighborhood. Even successes, like the attempt to “flatten the curve” of case counts in spring 2020, don’t always beget more successes, as subsequent “surges” would show. Nevertheless, if risk in the aggregate is not a perfect estimator of individual risk, it is certainly better than having no sense of risk at all.
The rapid development of vaccines and treatments suggests the trade-off was likely worth it: in large measure we’re no longer as subject to the whims of infectious disease as our predecessors were in the 1918 influenza pandemic. The use of novel mRNA technology and genetic sequencing of the virus helped make the record-setting vaccine development possible. Tellingly, we still only knew those vaccines were effective after aggregating data from the experience of thousands of people. Any question about the vaccines those large trials weren’t designed to address, moreover, has no good answer.
All this said, an irony remains inescapable. We live in a period in which models are better, statistical measures and data more accessible and reliable, treatments more powerful, and yet all of this has failed to stave off one of the oldest of human threats: an epidemic. As historians have long known, epidemics foreground existing fissures and exploit them. One such fissure is precisely this difference between aggregated and individual risk. In previous epidemics, it was possible to make claims about public health needs on the basis of large-scale statistical data while still insisting that the individual was in some sense responsible for getting sick. As the historian Charles Rosenberg influentially argued in the 1980s — writing in the midst of the AIDS crisis — epidemics usually had set dramaturgical forms. Part of the drama of any epidemic centered, he argued, on people’s reliance on preexisting moral convictions and spiritual assumptions to explain who might get sick or recover. These value judgments were always mediated by an understanding of physiological processes and environmental conditions, but the susceptibility of a particular individual was never seen as simply random or idiosyncratic. Rather, physiological processes and environmental conditions “constituted a framework within which moral and social assumptions could be at once expressed and legitimated.” Quarantines and sanitary brigades could do their work for the collective, in other words, but the individual could still be said to bear responsibility for her own actions and health since any disease had a range of susceptibility.
This distinction between individual and collective risk has largely been lost. Educated Americans are no longer supposed to say people get a disease because they deserve to, or that they survive it because they’re responsible in their habits, or that it was definitely this particular behavior or that particular pathogen which caused their infection. By the laws of modern medicine, we can only speak in generalizations. We’ve largely given up the goal of being able to explain why one person gets sick and another doesn’t, or why a drug works in one case and not in another. Those were always difficult and contested judgments but at least there was once the hope that eventually we’d know enough to explain things on an individual level.
Statistically validated medicine effectively prohibits knowledge of specific cases and consequently experts don’t try. They emphasize instead that the evidence suggests this vaccine is more effective on average; or that, say, a pandemic gathering of seven is safe but 11 is unacceptable. It is not simply a failure in risk communication, though that has certainly been a part of the problem. It is rather an essential tension between the way experts know about matters of health and the way people’s own thinking about disease remains stubbornly individualized: I want to know what will keep my family safe or heal my friend. This tension has been easily exploited during the pandemic. If statistically validated medicine can’t tell me who will get sick or recover, the argument goes, why should I trust it enough to make individual sacrifices based on it?
As we dig through the rubble of COVID-19 in the coming months and years, there will be many ugly things to pore over, from raw inequalities in access to health care to the power of a handful of individuals to undermine public health advice. But we will also need to grapple with how changes in medicine in the 20th century (the rationalization of how we know about medical causes and effects on the basis of aggregated, statistically analyzed data) have perhaps rendered us more vulnerable, in a sense. We know more about how health and disease work in general but our inability to speak in specific terms makes it all too easy for experts to be ignored. The ultimate irony of our COVID era may be that medical researchers used measures of aggregation to rapidly identify successful treatments and vaccines even as those statistics failed to do the very thing they were originally designed to do: help the public mitigate the disastrous consequences of an epidemic.
That’s not to say that statisticians haven’t tried to make their field (essentially defined by thinking in distributions, in aggregate) more relevant for individual situations, and indeed there are tools (e.g., n-of-1 trials and other modern causal inference methods) built around making these claims more precise. But it remains unavoidable that even if the effect of a specific treatment on an individual can be observed, we can never observe the effects of two different treatments on an individual at the same time, which is of course what you’d ideally want to do if you want to know which specific treatment works for that particular individual given her particular state.
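This limit is often framed in terms of "potential outcomes," and a short simulation can make it tangible. The Python sketch below invents a population in which every person has two outcomes, one under treatment A and one under treatment B. Only a simulation can write both down. A randomized trial then reveals just one outcome per person. Every number, distribution, and variable name here is an assumption chosen for illustration, not data from any study.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simulated population: every person has TWO potential outcomes, one under each
# treatment. Only a simulation can write both down; all numbers are invented.
people = []
for _ in range(10_000):
    outcome_a = rng.gauss(0.0, 1.0)   # outcome if given treatment A
    effect = rng.gauss(1.0, 2.0)      # this person's own benefit from B
    people.append((outcome_a, outcome_a + effect))

true_average_effect = mean(b - a for a, b in people)
share_harmed = mean(1 if b < a else 0 for a, b in people)

# A randomized trial reveals exactly one outcome per person.
observed_a, observed_b = [], []
for a, b in people:
    if rng.random() < 0.5:
        observed_b.append(b)   # assigned to B: outcome under A is never seen
    else:
        observed_a.append(a)   # assigned to A: outcome under B is never seen

estimated_average_effect = mean(observed_b) - mean(observed_a)

print(f"true average effect:      {true_average_effect:.2f}")
print(f"estimated from the trial: {estimated_average_effect:.2f}")
print(f"share who do worse on B:  {share_harmed:.0%}")
```

The comparison of group means recovers the average effect reasonably well, which is what trials are designed to do. The share of people who would do worse on B, however, can only be computed because the simulation stored both outcomes for each person; nothing in the trial's observed data identifies who those people are.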