Internews Center for Innovation & Learning


Data journalism for the 99 percent ... how to get started

Everyone from Big Data goliaths like the World Bank, to data journalism heavyweights like The Guardian, the BBC and The New York Times, to software companies like ScraperWiki that clean and analyze data, is getting into the data journalism training business. Their trainings are fantastic resources in the fast-paced age of open government data initiatives, apps, maps and visualizations.

But we wouldn’t throw the Oxford English Dictionary at somebody learning to read, and it doesn’t make sense to hit aspiring data journalists in the developing world with massive financial data sets, complex interactive data visualizations and cutting-edge data analysis software in order to teach data literacy, a prerequisite for anyone interested in data journalism. Granted, conferences, hackathons and short workshops are inspirational and great for getting participants excited about data and motivated to try it themselves. But often they return to their outlets and confront a different reality: sparse data sets, a bunch of tools that made sense during the presentation but now seem a lot trickier, and questions from editors about why one story could possibly be worth so much effort. It’s a question of addressing both the data divide and the digital divide. Resources like the Data Journalism Handbook are a place to start, but they need to be adapted to the local context.

The Internews Digital Media Center in Kenya, led by Ida Jooste, Ernest Waititu and Mark Irungu, has worked to develop a formula for data journalism that works for Kenyans and for the realities of their media and their data, including information available through the one-year-old Kenya Open Data Initiative (KODI), a campaign aimed at making public data more accessible to ordinary Kenyans. Two things are key to the success of their efforts. The first is inclusivity: bringing together everyone who has an interest in the data, including journalists, civil society organizations that gather their own data, government officials who respond to FOIA requests, editors who understand data journalism, and coders who are geeky and passionate enough to help these groups analyze and visualize. The second is patience: recognizing that people need to learn how to look for public interest stories in data, scrutinize data sets for statistical relevance, form hypotheses around data and develop a strong narrative around data findings before we even get to spreadsheets, data analysis and visualization.

Here is a step-by-step guide for launching a data journalism initiative in your community:

Invite all the right people.  Invite journalists who specialize in areas with potential for data-driven stories including those covering the local government, finance, education or health beats.  Bring along representatives from the NGOs that gather and analyze data on those sectors.   If possible, convince local government officials who respond to data requests to attend.  Then tap into the tech community and identify developers, coders, and graphic designers who can commit to data projects.  Invite equal numbers from each group to form teams of four or five.

Hire the right trainers and facilitators. Identify a regional leader in data journalism who is intimately familiar with the potential and limitations of locally available data, or a Western journalist with extensive experience in the region. Preferably, choose a member of the International Consortium of Investigative Journalists because of their access to resources and a wider journalism network. Also invite a lead developer who will lead training for coders and oversee all the data visualization development. Finally, choose a lead researcher who will design training in finding data, evaluating the integrity and completeness of data sets and presenting findings, and who will also help identify data sets for selected hypotheses and walk participants through the FOIA process. These three key trainers should identify three key members for each small group: an investigative journalist to facilitate, a researcher (who may come from a local civil society organization or university) and a coder.

Start with Data 101.  Before jumping right in, hold a bootcamp in data and data journalism based heavily on international and local case studies.  The training should include:

  • Lead Journalism trainer: examples of good data stories and of misleading ones, examples of data stories with a strong narrative and with a weak narrative, examples with sufficient and insufficient context, and a walk-through of one story from data to final product.
  • Lead Researcher: examining data gathering methodology, comparing data sets over time, mash-ups of data sets that lead to valid conclusions and to invalid ones, where to find data sets, how to submit FOIA requests.
  • Lead Developer: Using one of the data sets used by the research trainer, create a presentation of the same data set through a variety of infographics from very simple to interactive.  The idea is to introduce participants to the impact of different visualizations of the same data.
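A trainer could make the point about valid and invalid mash-ups concrete with a few lines of code. The sketch below is only an illustration: the county names, the figures and the function name `clinics_per_100k` are all invented for this example and are not drawn from KODI or any real data set. It joins a hypothetical clinic count to a hypothetical population table and reports the counties that could not be matched instead of silently dropping them.

```python
# All figures below are invented for illustration; they are not real data.
clinics = {"County A": 12, "County B": 30, "County C": 7}
population = {"County A": 240_000, "County B": 900_000, "County D": 150_000}


def clinics_per_100k(clinics, population):
    """Join two data sets on county name.

    Returns the per-capita rate for counties present in both sets,
    plus a list of counties that appear in only one of them.
    """
    matched = {}
    for county in sorted(clinics.keys() & population.keys()):
        rate = clinics[county] / population[county] * 100_000
        matched[county] = round(rate, 1)
    # Counties found in one data set but not the other need follow-up.
    unmatched = sorted(clinics.keys() ^ population.keys())
    return matched, unmatched


rates, unmatched = clinics_per_100k(clinics, population)
print("Clinics per 100,000 people:", rates)
print("Could not be matched:", unmatched)
```

In this made-up example the raw counts suggest County B is best served, while the per-capita rates point the other way. The unmatched list is a reminder that two counties fell out of the comparison entirely, which is exactly the kind of gap that can turn a mash-up into an invalid conclusion.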

Choose topics then break. Whether to start from the hypothesis or from the data set is a chicken-and-egg question, so the group should brainstorm both when looking for topics of investigation. The lead journalism and research trainers should steer the brainstorming session towards realistic hypotheses with a high probability of accessible data sets. Popular ideas should be sorted into broader categories (elections, healthcare, environment, etc.) and groups formed based on each category. Groups should have a representative from each set of invitees. In each group:

  • Investigative journalist facilitator: Lead the group in defining 3 or 4 hypotheses to be investigated.
  • Researcher: Suggest likely data sets and pair up with participants from both journalism and civil society communities to prepare FOIA requests and otherwise access datasets for each hypothesis.
  • Developer: Provide perspective on the amount and type of data needed for meaningful analysis and effective visualization.

Regroup and see what you’ve got.  Ask the facilitator from each group to send in their hypotheses and supporting data sets.  The three trainers get together and:

  • Lead Journalism trainer: Design a story-telling training including interview assignments for journalists to provide context and a face to the data story.  For example, in Kenya the team found a data set showing 40% of women give birth at home.  Set up exercises for participants to explore the reasons behind the phenomenon and evaluate the accuracy of official responses to the findings.
  • Lead Researcher: Set up data cleaning and analysis exercises for the groups.  Identify other data sets that might inform their research to ensure each group has enough data to work with.  This exercise should involve all participants.
  • Lead Developer: Set up training to dig deeper into scraping using Ruby, Python or PHP and data visualization focusing on the tools most appropriate for the available data sets.
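A data cleaning exercise of the kind the lead researcher sets up might look something like the sketch below. It is a minimal example under stated assumptions: the column names, the rows in `RAW` and the helper names `to_int` and `clean` are hypothetical and invented for illustration, not taken from any real export. It uses only the Python standard library, tidies inconsistent county names, parses numbers written with thousands separators, and sets aside rows that cannot be used so that someone can follow up on them.

```python
import csv
import io

# Hypothetical raw export, invented for illustration; not real data.
RAW = """county,births_total,births_at_home
 county a ,1000,400
County B,"1,200",n/a
COUNTY C,800,200
,500,100
"""


def to_int(value):
    """Parse integers such as '1,200'; return None for blanks or 'n/a'."""
    digits = value.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def clean(raw_text):
    """Split rows into usable records and rows that need follow-up."""
    kept, skipped = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        county = row["county"].strip().title()
        total = to_int(row["births_total"])
        at_home = to_int(row["births_at_home"])
        # A row is unusable if it has no county name, a missing number or a zero total.
        if not county or total is None or at_home is None or total == 0:
            skipped.append(row)
            continue
        kept.append({"county": county, "total": total, "at_home": at_home})
    return kept, skipped


kept, skipped = clean(RAW)
for record in kept:
    share = 100 * record["at_home"] / record["total"]
    print(f'{record["county"]}: {share:.0f}% of recorded births at home')
print(f"{len(skipped)} rows set aside for follow-up")
```

Keeping the skipped rows, not discarding them, gives the group a concrete list of questions to take back to whoever published the data.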

Bring everybody back for an intensive training and work session. This day or series of days is not meant to result in finished data stories, but rather to ensure that each team has enough material to proceed through a small reporting grant. Conduct trainings for journalists/civil society representatives and for developers simultaneously. The research training for the whole group can follow later. Small groups meet during the second half to work on their stories and develop a workplan for the subgrant period.

Award small grants.  Grants should cover participation of the journalism trainer, researcher and developer to assist the team through final publication.

Celebrate! Invite all participants to a reception where each team presents its final project. Feedback from the lead journalism trainer, researcher and developer is essential.
