The timing of the next flu outbreak // unrecognized drug complications // unidentified genetic predispositions to deadly diseases // and other trends hiding in millions of patient records.

Digital Gold

By Linda Keslar // Photo Illustrations by Bartholomew Cooke // Summer 2009

The triage station in a hospital emergency room is the last place any ill or injured patient wants to linger. It’s frustrating to be waylaid on the way to urgent treatment so that a nurse can ask about symptoms and biographical information and type the answers into a computer. But in many Massachusetts hospitals, at least, the value of that annoying process goes beyond determining which patients will be seen first. It also establishes an electronic health file that ER physicians will supplement with notes and test results. Sometime that day, each patient’s anonymized complaint, diagnosis and selected demographic information—gender, age, zip code—are transmitted into a statewide database, the Automated Epidemiological Geotemporal Integrated Surveillance system, or AEGIS.

What happens next is something known as data mining—searching caches of information for hidden patterns. For example, using a technique called natural language processing, AEGIS scans nurses’ triage notes for such key words as cough and fever, and the computer automatically categorizes chief complaints according to several broad medical conditions. Vomiting or abdominal pain is flagged as gastrointestinal illness, while other symptoms may be placed with respiratory or neurological problems. The AEGIS software then conducts a trend analysis of the data, comparing it with information about hundreds of thousands of past ER visits across the state. Against that backdrop, if there’s an unusual pattern, it should quickly become obvious.
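The categorization step can be illustrated with a short sketch. This is not the AEGIS software: it stands in plain keyword matching for natural language processing, and the keyword lists are hypothetical. The article does not say which category fever falls under, so it is left out; every term beyond cough, vomiting and abdominal pain is an illustrative assumption.

```python
import re

# Hypothetical keyword lists (assumptions for illustration only).
# The article names cough, vomiting and abdominal pain; the rest are invented examples.
SYNDROME_KEYWORDS = {
    "gastrointestinal": ["vomiting", "abdominal pain", "diarrhea"],
    "respiratory": ["cough", "shortness of breath", "wheezing"],
    "neurological": ["headache", "seizure", "dizziness"],
}


def classify_complaint(note):
    """Return the sorted syndrome categories whose keywords appear in a triage note.

    Falls back to ["other"] when no keyword matches.
    """
    text = note.lower()
    matches = []
    for syndrome, keywords in SYNDROME_KEYWORDS.items():
        for keyword in keywords:
            # Word boundaries keep "cough" from matching inside a longer word.
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                matches.append(syndrome)
                break  # one hit is enough to flag this category
    return sorted(matches) or ["other"]
```

A note such as "vomiting and abdominal pain for two days" would be flagged as gastrointestinal, while a note with no listed term falls into a catch-all category. A production system would also need to handle misspellings, abbreviations and negation ("denies cough"), which simple matching does not.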

“We’ve made computer models of all the historical data so we can predict how many people should be coming into every emergency department on a particular day, give or take 7%,” says Kenneth Mandl, an attending emergency medicine physician at Children’s Hospital Boston and one of the researchers who developed AEGIS. “If there’s a sudden surge of traffic, we see it, and we know something unusual is happening.” If a problem is detected, public health officials and hospital administrators are alerted by e-mail, voice mail or text message. The warning gives hospitals a chance to increase staffing, stock up on appropriate supplies and free up beds.
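The surge check Mandl describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AEGIS model: the expected count here is just the average of comparable past days, whereas the real system uses models built from all the historical data. The function names and the sample figures are hypothetical; only the 7% margin comes from the article.

```python
from statistics import mean


def expected_visits(history):
    """Assumed stand-in for the forecasting model: the mean of comparable past days."""
    return mean(history)


def check_surge(observed, history, tolerance=0.07):
    """Flag a day whose visit count exceeds the expected count by more than the tolerance.

    tolerance=0.07 mirrors the "give or take 7%" margin quoted in the article.
    """
    expected = expected_visits(history)
    upper = expected * (1 + tolerance)
    return {"expected": expected, "upper": upper, "surge": observed > upper}


# Hypothetical example: four comparable past days averaging 100 visits.
past_days = [100, 98, 102, 100]
print(check_surge(120, past_days)["surge"])  # well above the 7% band
print(check_surge(105, past_days)["surge"])  # inside the band
```

In this sketch only a count above the upper bound raises an alert, since the article describes the system as watching for sudden surges in traffic.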

The state deployed the system six years ago to help officials watch for disease outbreaks and bioterror attacks. “We’re able to mine data that the health system produces and create a whole new use for it,” Mandl says. “Instead of the information just sitting there in a computer, we’ve found ways to make it informative in a much larger way about the population’s health.”

One dividend of this approach came from a study by Mandl and John Brownstein, an epidemiologist at Children’s, that used AEGIS-generated data to analyze emergency department patterns in six hospitals from 2000 through 2004. They discovered that a spike in respiratory illness among preschoolers typically preceded by four to five weeks a rise in influenza-related deaths among the elderly. This suggested that young children helped spread the flu, which kills some 36,000 Americans annually and leads to about 200,000 hospitalizations. Spurred in part by this finding, the Centers for Disease Control and Prevention now recommends that preschoolers receive flu shots.
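One common way to look for this kind of lead time between two weekly series is lagged correlation: shift one series against the other and see which shift lines them up best. The sketch below illustrates that general idea only. It is not the method Mandl and Brownstein used, and the function names and the toy series in the usage note are invented.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_lead_weeks(leading, lagging, max_lag):
    """Return (lag, correlation) for the shift at which `leading` best predicts `lagging`.

    Both series are weekly counts of equal length; a lag of k pairs
    week t of `leading` with week t + k of `lagging`.
    """
    best_lag, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        xs = leading[: len(leading) - lag]
        ys = lagging[lag:]
        r = pearson(xs, ys)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

Given a made-up series of weekly preschool respiratory visits and a copy of it delayed by four weeks, `best_lead_weeks` would report a lag of 4. Real surveillance data is noisier and seasonal, so a result like this would be a starting point for analysis, not a conclusion.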

Unfortunately, most data that could reveal such patterns remains locked away. Although many emergency rooms around the country record information electronically, it’s rare for those systems to mesh even with other electronic networks within the same hospital, let alone with similar systems in other institutions or state health departments. Even though it has been almost 20 years since the Institute of Medicine identified electronic records as an essential health care technology, few hospitals have aggregated all their patient information electronically. In fact, in a recent survey just 2% of more than 3,000 U.S. hospitals said they’ve completed the switch from paper medical charts to electronic systems.

The Obama Administration, whose $787 billion stimulus package includes more than $19 billion to speed the transition to electronic health records, expects a big return on that investment, estimating that improved information systems could save the health care system as much as $80 billion a year and provide a range of other benefits. Electronic records provide a patient’s primary care physician as well as specialists and other hospital personnel easy access to information, helping to flag possible drug and allergy interactions, facilitate accurate claims processing and monitor quality-of-care criteria—making sure, for example, that follow-up visits are scheduled and a patient receives all recommended care.

But health care’s belated push to join the information age could have beneficial consequences that go well beyond those normally associated with computerized record-keeping. The results of pilot projects in places where electronic records are the norm hint at what could be learned through widespread mining of medical data. As records become more sophisticated—adding information from genetic testing, advanced imaging technology and other sources—they could aid in population studies measuring predispositions to disease. Scanning records to see which patients are taking which prescription drugs might also help identify medications that, though they’ve passed muster in the relatively small-scale trials required for Food and Drug Administration approval, turn out to have harmful effects when prescribed to millions of patients. But achieving such benefits depends not only on digitizing records across the country but also on getting hospitals and physicians to agree about what should be in a patient’s electronic file and deciding how to use this data without infringing on patient privacy. Only then will a database of information gleaned from doctor-patient encounters present itself as a rich source of medical innovation.


Signs of Trouble


Fed 10,000 to 15,000 megabytes of data per day, the AEGIS system cranks out a real-time dashboard that Massachusetts public health officials monitor online.


1. “Toward a National Framework for the Secondary Use of Health Data,” by Charles Safran, Meryl Bloomrosen, W. Edward Hammond et al., Journal of the American Medical Informatics Association, January/February 2007. An extensive analysis of the widespread use of personal health data for research and commercial applications and of the need for coherent standards to protect individual data.

2. “Characterization of Patients Who Suffer Asthma Exacerbations Using Data Extracted From Electronic Medical Records,” by Blanca E. Himes, Isaac S. Kohane, Marco F. Ramoni and Scott T. Weiss, American Medical Informatics Annual Symposium Proceedings, 2008. The authors discuss the computational methods they devised to mine data from electronic medical records, finding that age, race, smoking history and weight were significant predictors of asthma patient hospitalization rates.

3. “Identifying Pediatric Age Groups for Influenza Vaccination Using a Real-Time Regional Surveillance System,” by John Brownstein, Ken Kleinman and Kenneth Mandl, American Journal of Epidemiology, August 2005. The story of how a real-time population health monitoring system identified three- to four-year-olds as an age group that develops influenza earliest, a discovery that fueled a strategy to vaccinate preschoolers to help prevent flu deaths among older people.
