industry–from the old-school newspaperman who won’t send an email to the young, enthusiastic programming geek–and everyone in between.
In fact, if you consider yourself to be very much in between–or maybe even slightly towards the old-school side of things–this post is for you. We’ve sifted through the labyrinth of tipsheets, blog posts and (almost) exhaustive collections of all that was generously shared, referenced or demonstrated at this year’s conference to bring you some of the most useful data-driven reporting tools on offer.
We’ve organized it into seven broad categories:
- General CAR Tips & Best Practices from the Pros
- Research Tools
- Social Media Tools
- Data Cleaning
- Data Viz, Mapping & Timeline Tools
- Inspiration: A Small Collection of some of the Best Data-Driven Stories of 2012 (with a special emphasis on energy, education and economy stories)
- Advanced Coursework in Database Analysis, Super Stealth Spy Stuff and Web Scraping
We haven’t tested everything just yet, so if you find any of this particularly useful–or not–please do tell us about it in the comments section or via email.
That said, we hope you’ll find at least some of the collection below as useful and inspiring as we do. Enjoy!
General CAR Tips & Best Practices from the Pros:
- Dodging Disasters: a quick checklist from Liz Lucas of NICAR
- Finding Data Online with Jaimi Dowdell, NICAR/IRE trainer extraordinaire
- Inside Baseball: What Data Journalism Can Learn From Sports
- Cautionary Tips and Tales (or how to approach the data)
- Wendell Cochran’s ppt for CAR in business reporting (which, really, is relevant for just about any kind of reporting)
- The Data-Driven Story, From Launch to Presentation
- CAR on the Beat from John Diedrich, Milwaukee Journal Sentinel
- CAR on the Beat from Dan Catchpole, Yakima Herald-Republic
- Building a data and contact library on your beat, from T.L. Langford of the Houston Chronicle
- Mapping Best Practices slideshow from John Keefe of WNYC, Dave Cole of MapBox and our very own Matt Stiles–with some bonus further reading on how to know when a map is and is not the best tool for telling a particular story.
Research Tools for Investigative Reporting
- LittleSis.org: An impressive database of relationships between people in business and government. For example, if you want to do some background research on your new governor, you can use the site to find out whom he has connections with, and what corporations he has ties to. (Here are the tutorial videos.)
- FRED: Short for Federal Reserve Economic Data, FRED is an online database of more than 61,000 data series from 53 national, international, public and private sources. Although it’s a product of the Federal Reserve Bank of St. Louis, the database includes data beyond the Eighth Federal Reserve District. (And here is the introductory handout.)
- American FactFinder: The go-to source for all kinds of data, including the American Community Survey, American Housing Survey, American Economic Surveys and Decennial Census. (With handy-dandy instructions for how to use it.)
- FOIA Machine: Useful mostly for organizing large investigative projects that involve several FOIA requests and/or collaborators, this online tool helps you submit requests and keep track of whom you’ve contacted and when, and when it’s time to bug them again. Its creators hope that, if it is used widely enough, it will eventually help journalists and other open government advocates pool their resources and call out contacts and agencies who aren’t playing nice.
- The Overview Project: Gives you a way to upload and sift through piles of digital documents (e.g., PDFs of emails you got from a FOIA request) to find topics and subtopics and help you organize and prioritize your work.
- A slew of other tools compiled by IRE’s Mark Horvit and Jaimi Dowdell.
Social Media Tools for Investigative Reporting
An IRE blog post includes the full slides and handouts from Doug Haddix’s and Mandy Jenkins’ presentations. Here are some of our favorites:
- Ban.jo: A mobile app that helps you see the exact location of updates and tweets.
- Geofeedia: A paid service that searches and monitors social media by location. It’s useful when you need user-generated content for an event or topic. With an introduction and tutorial video.
- Mappeo: A YouTube geo-search tool that allows you to see videos posted in a given area. It’s useful when you need user-generated content for an event or topic.
- iWitness: Search social media content by time and place; also useful when you need user-generated content for an event or topic.
- AllMyTweets: Search for a person’s tweets and view them all on one page. It could be useful when you need to analyze someone’s tweets quickly or to sift through many tweets on a specific topic.
- foller.me: A Twitter analytics tool that examines topics, hashtags, mentions and active time periods of Twitter accounts.
- Social Mention: A tool to examine the social impact of your account, your topic and your event. (example: StateImpact)
- While we’re on social media, Mandy Jenkins’ Using Social Media as a Branding and Journalism Tool is also pretty great.
Data Cleaning
- Bulletproofing Your Data: a guide to data cleaning and integrity checks (again, by Jennifer Lafleur of ProPublica)
- How to use Google Refine (now OpenRefine) to clean up dirty data. Tom Meagher of Digital First Media gave a great presentation (with helpful presenter notes) and has more tips available on his blog.
- Excel I: Formatting, Sorting and Filtering, from Linda Johnson of the Lexington Herald-Leader
- Excel II: Rates and Ratios, from Denise Malan of the Corpus Christi Caller-Times
- Advanced Excel: Date Functions, Text or String Functions, If-Then Statements, Ranks, LookUp Tables, Making sense of errors. Also includes this practice data set to play with while learning. From MaryJo Webster, St. Paul Pioneer Press
- Finding Stories with Excel, from Kate Martin of the Skagit Valley Herald, Grant Smith of the Commercial Review and Megan Luther of IRE: reviews Excel basics and goes over how to find some basic stories on budgets, crime, property assessments and more!
Data Viz, Mapping & Timeline Tools that Play Well with WordPress
- Google Fusion Tables & the Google Charts tool are our favorites (in case you didn’t notice).
- TileMill: Another great mapping tool that is only ever so slightly more complicated than Google’s.
- Timeline.js: It’s got a navigation element on the bottom, and room for large photos and big quotes. You can embed pretty much whatever you want: videos, Flickr photos, tweets, etc. To build one, put the text, data and links in a Google Spreadsheet. (How-to)
Other Data Viz, Mapping & Timeline Tools
(that we have yet to explore but want to soon)
- Tableau Public: All sorts of mapping and data visualization tools.
- infogr.am: A data visualization and interactive graphics tool.
- Adobe Edge Animate: A free tool for creating interactive content.
- NodeXL: An Excel add-on that does social network analysis and stuff!
- Vertical Timeline: It’s a more Facebook-style approach to show events over time. You can show photos and text in this timeline (with John Keefe of WNYC’s instructions on how to use it).
- Timeline Setter: This tool takes a slideshow approach with a navigation bar on top to show a series of events at a glance and has “cards” below that can be customized to include multiple sources of media, including photos, videos, maps, document embeds, etc. It can also handle multiple parallel event series on the same timeline! What?!?
Inspiration: 2012 Data-Driven Reporting Rock Stars
(a.k.a. Things to possibly try at home, provided to you by the incomparable Mark Horvit & Megan Luther of IRE. For much more where these came from, check out their stellar Year in CAR presentation)
- FINDINGS: Income inequality has increased in 49 of 50 states since 1989. The poverty rate increased in 43 states, most sharply in Nevada.
- DATA: Census and Current Population Survey from US Census.
- Himanshu Ojha of Reuters, along with Paul Overberg of USA Today and Robert Gebeloff of the New York Times also presented their work on income inequality, and warned the audience of some of the quirks of income data. They shared this tipsheet and these slides. And these ones.
- FINDINGS: In ten years, remote sensors detected only 5 percent of the nation’s pipeline spills, the general public reported 22 percent and pipeline company employees at the scenes of accidents reported 62 percent.
- DATA: Decade of data from Pipeline and Hazardous Materials Safety Administration (PHMSA).
- FINDINGS: Since January 2005, Texas has rejected just five of Chesapeake’s 1,628 requests for fracking property without permission or payment.
- DATA: Texas Railroad Commission data, investment research from Morningstar Inc.
- FINDINGS: About 200 school districts around the country had high concentrations of suspect test scores that follow a pattern of both unusually high and unusually low scores similar to Atlanta. For these school systems, the odds of so many suspicious score changes occurring in a single district due to chance alone were extraordinarily low — ranging from 1 in 1,000 to worse than 1 in 1 trillion.
- DATA: Reading and math test results from all 50 states and DC for all years for grades 3 through 8.
- FINDINGS: Nearly 32,000 K-8 students, or roughly 1 in 8, missed four weeks or more of class during the 2010-11 year.
- The paper found striking racial disparities in elementary attendance. Youth with learning and emotional disabilities also missed far more school.
- DATA: Internal student-level attendance data from the Chicago Public Schools.
Other Local/State Accountability Stories to learn from:
- This presentation, from Tim Eberly of the Atlanta Journal-Constitution, outlines how he went about his investigation into how daycares are managed in Atlanta, which uncovered non-payment of fines, subsidies to shoddy daycares, etc. The tips might also be applicable to investigations of other regulatory agencies.
- This presentation, from Josh Sweigart of the Dayton Daily News, lists other datasets to check for fund mismanagement in local government.
- MaryJo Webster of the St. Paul Pioneer Press shared her experiences and tips for investigating public pensions.
Advanced Coursework: Database Analysis & GIS
- Getting Started with MySQL: Check out this FREE Access-like tool that works on Macs. Hallelujah!
- And its ArcGIS-like cousin, QGIS! Whaa?!?
Advanced Coursework: Super Stealth Spy Stuff
- Covert Reporting: Using Technology to Cover Your Tracks, from Paula Lavigne, ESPN
Very Advanced Coursework: Web Scraping
- Part One: Web Scraping presentation slides (Sean Sposito, American Banker). These slides give a simple introduction to HTML, HTTP and XPath for web scraping.
- Part Two: Web Scraping with Google Docs presentation slides, from Acton H. Gorton, University of Illinois and Sean Sposito, American Banker
- Another tool that looks somewhat learnable is the Scraper screen-scraping Chrome extension. Journalist Jens Finnäs wrote a tutorial for it on Dataists.
- Helium Scraper: A tool that extracts website data into structured formats such as CSV and XML, and has these tutorial videos.
- CometDocs: Not really a web scraping tool, but it pulls data and information out of a PDF file and into Word and Excel documents.
- Tabula: Another tool to liberate data tables trapped in evil PDFs.
- XPDF: Another PDF conversion tool. (What’s different? It supports multiple languages.)
- Table Capture: A Chrome extension that grabs table HTML and drops it into a Google doc.
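If you’re curious what the XPath idea from the Part One slides looks like in code, here is a minimal Python sketch that uses only the standard library. It is not taken from any of the presentations above: the HTML snippet, the table id "inspections" and the facility names are all made up for illustration. Python’s ElementTree supports only a subset of XPath and needs well-formed markup, so messy real-world pages will usually call for a more forgiving parser.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML standing in for a page you have downloaded.
SAMPLE_HTML = """
<html>
  <body>
    <table id="inspections">
      <tr><th>Facility</th><th>Fine</th></tr>
      <tr><td>Example Daycare A</td><td>250</td></tr>
      <tr><td>Example Daycare B</td><td>1000</td></tr>
    </table>
  </body>
</html>
"""


def table_rows(html, table_id):
    """Return the table's rows as lists of cell text, header row first."""
    root = ET.fromstring(html.strip())
    # XPath-style query: any <table> element whose id attribute matches.
    table = root.find(f".//table[@id='{table_id}']")
    if table is None:
        return []
    rows = []
    for tr in table.findall("./tr"):
        # Keep header (th) and data (td) cells in document order.
        cells = [(cell.text or "").strip() for cell in tr if cell.tag in ("th", "td")]
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = table_rows(SAMPLE_HTML, "inspections")
print(rows_to_csv(rows))
```

The same pattern (find the table, walk its rows, write out CSV) is roughly what the point-and-click tools in this list do for you behind the scenes.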
Shwew. Enjoy! And happy digging and number-crunching and storytelling!