How Crowdsourcing is Changing Science

How the power of crowds is changing what science can do. It may also change what science is.

Boston Globe, November 2011

At the end of the 19th century, a team of British archeologists happened upon what is now one of the world’s most treasured trash dumps. The site, situated west of the main course of the Nile, about five days’ journey south of Memphis, lay near the city of Oxyrhynchus. Garbage mounds are always a sweet target for those interested in the past, but what made the Oxyrhynchus dump special was its exceptional dryness. The water table lay deep; it never rained. And this meant that the 2,000-year-old papyrus in the mounds, and the text inscribed on it, were remarkably well preserved.

Eventually some half a million pieces of papyrus were drawn from the desert and shipped back to Oxford University, where generations of scholars have been painstakingly transcribing and translating them. The manuscripts are rich, fascinating, and varied. The texts include lost comedies by the great Athenian playwright Menander, and the controversial Gospel of Thomas, along with glimpses of daily life — personal notes, receipts for the purchase of donkeys and dates — and the occasional scrap of sex magic.

The pace, however, has been glacial. After a hundred-plus years, scholars have been able to work through only about 15 percent of the collection. The finish line appeared to lie centuries in the future.

But a few months ago, the papyrologists tried something bold. They put up a website, called Ancient Lives, with a game that allowed members of the public to help transcribe the ancient Greek at home by identifying images from the papyrus. Help began pouring in. In the short time the site has been running, people have contributed 4 million transcriptions. They have helped identify Thucydides, Aristophanes, Plutarch’s “On the Cleverness of Animals,” and more.

Ancient Lives is part of a new approach to the conduct of modern scholarship, called crowd science or citizen science. The idea is to unlock thorny research projects by tapping the time and enthusiasm of the general public. In just the last few years, crowd science projects have generated notable contributions to fields as disparate as ecology, AIDS research, and astronomy. The approach has already accelerated research in a handful of specialized fields. And it may also accomplish something else: breaking down some of the old divisions between the highly educated mandarins of the academy and the curious amateurs out in the world.

“It may seem intimidating when we say you are going to help transcribe ancient Greek papyri, but it’s all about pattern recognition, and the brain excels at pattern recognition,” says James Brusuelas, an Oxford classicist who is part of the Ancient Lives team. “The reaction has been fantastic.”

One reason for the sudden turn to crowd science is that it offers an imaginative answer to a central problem of 21st-century science: too much information. Oxford’s scholars had an overwhelming load of work given them, in the form of a desert trove. More often, though, scientists are themselves creating floods of data that they simply don’t have the hours to interpret. Every night, robotic telescopes relentlessly track the sky, pouring terabytes of images into hard drive farms. From biological labs come rivers of genetic code. And in many other fields — from high energy physics to environmental science — researchers are puzzling over how to handle the sudden embarrassment of riches.

For now, the new citizen science has touched only the tiniest fraction of the research conducted around the world. But its early successes, which have shocked even the architects of the approach, suggest that over time pro-am collaborations hold the potential to alter the landscape of science in important ways, harnessing countless able brains to do work that was once the province of a few overwhelmed experts. And as it does, it also offers an uncomfortable insight: There are ways that the structure of modern science may actually be limiting what we can learn.

The idea of recruiting amateur scientists has roots that go back at least a century. In 1900, in the early days of the American conservation movement, ornithologist Frank Chapman organized a Christmas bird census. Teams of avid birders collected observations from Toronto to Baldwin, La.: the American black duck, the red-breasted nuthatch, the common grackle, and 86 other species. It was an unprecedented one-day data dump. The Christmas Bird Count has become an Audubon tradition, with about 60,000 people going out every year, and the data it has generated through the years have proved invaluable to researchers.

Today there are firefly counts, herring counts, and ladybug counts. One can help track spiders or bats or coral reefs. A new iPhone app called Noah (for Networked Organisms and Habitats) allows users to snap pictures of species they come across and share the information with researchers and others. A similar British effort, called iSpot, led to the discovery of two species that had not been recorded before in England, according to a report by the BBC. Some projects use networks of observers to monitor the timing of natural events, such as the arrival of hummingbirds, or the budding of flowers, which provide information on the planet’s changing climate. None of these projects would be possible without countless amateurs willing to serve as devoted foot soldiers across the planet.

The advent of the Internet has also opened up a new possibility: that the interested public could offer scholars more than help gathering data. In the best-known early example, they offered up their computers: 1999 saw the launch of SETI@home, an example of “distributed computing” in which volunteers downloaded software so their idling computers could help crunch radio-telescope data for signs of alien life.

More recently, though, has come a truly fascinating turn: the move from people volunteering their computers’ down time, to people volunteering their brains’ down time — from distributed computing to distributed thinking. Oxford University astronomer Chris Lintott says that his own involvement dates back to a 2007 conversation he had over a pint at the Royal Oak, a traditional watering hole for Oxford astronomers where tables are crammed into small rooms with old fireplaces and ancient wood beams. A student, Kevin Schawinski, had recently finished the exhausting task of categorizing 50,000 galaxy images for a project. As they spoke, though, it became clear that that wasn’t nearly enough: What the project really required to succeed was to categorize a million galaxies.

“One look at Kevin’s face,” says Lintott, “suggested we should find an alternative method.”

This led them to create Galaxy Zoo in 2007. The site provided a simple tutorial that trained people to classify galaxies by their appearance, and then served up images that astronomers had not yet categorized. Galaxy Zoo was so popular that soon after it launched, the servers literally caught fire from all the activity. A schoolteacher sitting in an apartment in the southeast of the Netherlands discovered a strange green cloud that had never been observed before. The astronomical data from the project have been used in a growing list of scientific publications.

The approach was so successful that Lintott and the other organizers decided to expand it to other areas, including solar explosions and climate change, under the name Zooniverse. (The Ancient Lives project uses the Zooniverse website.) Meanwhile, many other scientists and organizations are jumping in: One popular website, scienceforcitizens.net, lists more than 400 projects, and the site’s founder says she expects to hit 1,000 within a year.

What marks this as an important milestone in the history of science is the new way it harnesses the power of the mind. There are many tasks that are beyond the grasp of even today’s computers, particularly those which involve interpreting complex images. Like identifying cancer cells. Or categorizing galaxies. Or picking out letters of ancient Greek, written in a faded ink with a fast, messy hand, without breaks between words. The Internet, it turns out, is a brilliant way to feed those problems into an array of the planet’s true supercomputers — human brains.

A recent discovery highlights the sophistication of the work volunteers can do. Biologists are keenly interested in the three-dimensional shapes assumed by protein molecules inside the human body. Proteins are intimately involved in many aspects of life, but they fold into shapes that can be very difficult to predict, even given their precise chemical makeup. Protein-folding is a roadblock that holds up research into many diseases.

So a team of scientists at the University of Washington created a game called FoldIt, which gives players an image of a protein molecule and video game-like tools for folding the molecule. As the energy required to maintain the molecule in a particular shape drops, meaning it’s closer to nature’s solution, a player’s score increases. FoldIt is a potentially addictive game that requires excellent spatial reasoning. Some players excelled at it; indeed, some became whizzes, and the researchers put their skills to work on unsolved problems. In September, the scientists announced that a team of FoldIt players had deciphered the folding of a protein important in AIDS research.

In a paper describing the result in Nature Structural & Molecular Biology, the scientists argued to their colleagues that a line had been crossed: “Although much attention has recently been given to the potential of crowdsourcing and game playing, this is the first instance we are aware of in which online gamers solved a longstanding scientific problem.”

FoldIt is the most impressive demonstration yet that the public can make genuine contributions to scientific projects. But its success also stands as a potent critique of the way that the scientific enterprise is currently organized.

Science is, for the most part, a closed society organized into little fiefdoms of highly trained specialists, which means only a few minds engage with any given problem. Before FoldIt, for example, a problem in protein folding was the exclusive province of a relatively small number of experts — even though, it is now clear, there are real contributions to be made by 13-year-old video gamers.

The system is shaped in part by the force of tradition, but the larger challenge is that most scientific data is proprietary. A scientist works long and hard to generate original data, and then expects to reap the reward in the form of publishing the first research paper to describe some new phenomenon. She is not going to want to share this data with others, particularly strangers, any more than, say, an investigative reporter would want to share his notes before a story has been written. Harnessing 1,000 people requires sending your data out into the world, something that science is loath to do. The scientist’s interest in keeping things private and getting credit, in other words, is directly opposed to society’s interest in tackling some problems with a hive of the best minds.

There are exceptions, such as large astronomical and biological data sets that are available for anyone to work with. But the last 10 years have seen a boom in technology that allows large numbers of people to do amazing, cooperative things with information, and the scientific establishment has taken only baby steps toward figuring out ways to share it productively, according to Michael Nielsen, a former theoretical physicist and author of “Reinventing Discovery: The New Era of Networked Science.”

To encourage this shift, the federal government, which funds the lion’s share of the country’s research, has been pressuring scientists to work more cooperatively and to share more of what they find, and to share it faster. And there is a nascent effort within academia to identify ways that scientists might be recognized for their contributions to the community as a whole, beyond the publication of their individual discoveries.

“It is essential that scientists be rewarded when they share,” says Nielsen.

It’s a difficult problem, and Nielsen says he expects the real rewards of networked science to be tallied over decades, not years. Even if science becomes more open, there are also practical limitations: It takes a certain brilliance, and a lot of work, to recognize problems that can be shared with a crowd, and set up the systems needed for strangers to work together productively. It is not always clear when this tactic will move a project forward, or slow it down.

With time, though, one might expect a new type of scientist to emerge: one who is especially adept at recognizing problems, and designing projects, that tap the brilliance of a dispersed and motley team, whoever they may be.

Science is driven forward by discovery, and we appear to stand at the beginning of a democratization of discovery. An ordinary person can be the one who realizes that a long arm of a protein probably tucks itself just so; a woman who never went to college can provide the crucial transcription that reveals a spidery script to be a love poem from 2,000 years in the past. Nobody can say where the movement will go, but among the new pioneers of crowd science, there is a palpable sense that they have just happened upon a powerful, poorly understood new resource.

“We have used,” says Lintott, “just a tiny fraction of the human attention span that goes into an episode of Jerry Springer.”

Image courtesy Andy Martin