Susan Phillips / StateImpact PA
With every new administration, government-held information disappears. Digital archivists know this. They worked to preserve Bush administration data when Obama was elected. Sometimes, it’s just a matter of budget priorities: funds no longer exist to keep up a website. But with the incoming Trump administration, some scientists worry key environmental research will go missing for political reasons. So researchers from across the country and Canada gathered in Philadelphia last weekend to copy key data.
Imagine you’re a high school science teacher who needs the most up-to-date teaching tool for a lesson on climate change. You click on the EPA’s link for climate change lesson plans, and it says “page not found.”
Or you’re a parent of small children with asthma, you’re moving to a new town, and you want to see air quality data for different neighborhoods. Again, “page not found.”
These are examples of what people like Michelle Murphy fear will disappear under the Trump administration.
“All those questions are wrapped up together and it matters because it’s about what you breathe, and what you eat and what you drink,” she said. “It’s about your children. It’s about your community.”
Murphy, an expert in technology science, works at the University of Toronto. She flew to Philadelphia last weekend to participate in UPenn’s Data Refuge hackathon. She also helped organize a similar hackathon in Toronto. Murphy says part of her motivation stems from her research on the loss of EPA data under the Reagan administration, which she says led agency staffers to unionize in the 1980s.
“So there’s a long history of these kind of struggles in these agencies trying to defend scientific integrity and evidence based research,” said Murphy.
Murphy emphasized that data is not just threatened by Republican administrations. The data preservation organization Internet Archive, in cooperation with the federal government’s own publishing office, preserved more than 3,000 items from the George W. Bush administration before Obama took office. But whether or not Murphy’s worst fears regarding the Trump administration and data preservation are realized, she says there is already a problem with preserving online data that needs to be addressed. Laurie Allen, assistant director for digital scholarship at Penn, agrees.
“The internet is a terribly unstable way to keep information available,” said Allen. “A huge number of references to websites no longer work.”
Speaking from the top floor of Penn’s library on Saturday, Allen was surrounded by about 100 archivists, librarians, tech workers and students, all hunched over computers, writing on whiteboards, and racing to preserve climate data from the National Oceanic and Atmospheric Administration.
Allen, along with Penn’s director of environmental humanities Bethany Wiggin, helped organize the Data Refuge hackathon at Penn.
Wiggin says the cost of hosting all this data on private servers could run into the hundreds of thousands of dollars. Right now, the Data Refuge organizers are dependent on an anonymous donor.
As a librarian, Allen says she wants to make sure the information is stored properly.
“So the Data Refuge site right now is the holding area for these bags of files that we’re getting from federal websites,” she said.
Not all the volunteers were high-tech hackers. Kevin Burke is a grad student studying anthropology at Penn. He says his tech skills are pretty minimal, and he likely won’t use any of this data in his research. But he sees the effort as a potential love letter to the future.
“When you’re an historian and you’re working in an archive, sometimes you pull out a folder and you just think ‘oh man the foresight that somebody had to stick that in a folder and save that for me’,” he said. “So now I can pull it out and make sense of something and I think that’s really powerful.”
The Trump administration has not said it plans to wipe the data slates clean. But the Data Refuge effort took on even greater urgency this week after the website InsideEPA.com reported that current EPA employees predict some of Obama’s non-regulatory climate data “will not survive the first day,” setting off a Twitter storm among the data preservationists.
The UPenn hackathon resulted in 1.5 terabytes of data copied to private servers.