Data Diving
Investigative journalism for social change
TechSoup - May 28, 2015

In the digital age, the investigative journalist is part gumshoe and part hacker. With the radical transparency movement flourishing, more and more data is available to those who know where to look. For journalists, the trick often lies in figuring out the best way to analyze that information, make connections, and present it to the public.

The Organized Crime and Corruption Reporting Project

One group on the cutting edge of investigative reporting is the Organized Crime and Corruption Reporting Project (OCCRP). Led by executive director Paul Radu, the group is a nonprofit media consortium operating in Eastern Europe, the Caucasus, Central Asia, and Central America.

The group is made up of reporters, artists, and hackers who collaborate across borders to root out corruption and organized crime. Founded in 2007, the OCCRP has garnered numerous awards for its groundbreaking investigations, some of which read like the plot of a Hollywood movie.

Recently, Radu and his group mapped out the criminal network of a group of international hired assassins. The project began with the arrest of a Moldovan national, Vitalie Proca, who was apprehended in Moscow as a suspect in an attempt on the life of a Russian businessman in London. OCCRP staff members used border logs to track Proca's movement over several years. They then traced the registration of the cars he was driving to several Romanian commercial companies. As it turned out, some people involved in these companies were also charged with murders. Using just public information, Radu and the group discovered that Proca was allegedly a member of a gang of hired killers, responsible for at least a half-dozen deaths, with ties to a mafia network.

In the assassination case, as in many of the OCCRP's investigations, virtually all the data they used was public.
Radu said that the group gathers its data by scraping it from websites and online resources and by submitting public information requests.

"Our goal is to mine as much data as possible and then put the puzzle together," he says.

Radu also speaks of the need to "visualize" the data. Some stories are so complex that rendering them in text makes them difficult for a reader to understand. Graphically rendered data enables readers to better grasp the connections between different parties while making the text lighter and more readable.

To help journalists and activists translate their complex stories into easily readable visual language, the OCCRP has created Visual Investigative Scenarios, a data visualization platform. This platform provides HTML5 templates that can be exported for use online or in print, allowing OCCRP "to take the burden off the text by adding a visual element," says Radu.

In an investigative report, a journalist will often use only a small fraction of the data that has been collected. To keep the rest from going to waste, Radu and his organization often turn it into a publicly accessible database. This allows other parties to conduct their own analysis and make their own connections. Radu calls this a kind of "investigative crowdsourcing."

In keeping with the group's collaborative mission, Radu makes an effort to make these same tools available to others. He helped create the Investigative Dashboard, a tool that allows investigators to look up people and companies as well as access more than 300 databases. This enables them to "follow the money."

He also makes a point of spreading the gospel of corruption-fighting data collection and analysis by presenting at various conferences. When we spoke, he was fresh from attending a gathering of activists and banking reform advocates in Moldova. As an exercise, he gave the attendees several datasets, including a list of Moldovan citizens with companies in Romania, Switzerland, and the UK.
He then asked them to identify connections between the datasets.

The Center for Investigative Reporting

Nonprofit investigative journalism is also taking root in the United States. In Berkeley, California, the Center for Investigative Reporting (CIR) is at the forefront of groups using large amounts of data to power investigations. Founded in 1977, CIR produces in-house content with a staff of more than 50 investigative journalists, editors, and producers. Its reports are then distributed through its website and in a range of publications across multiple media. Print articles regularly appear in The Washington Post and the Los Angeles Times, radio stories on NPR, and video segments on ABC and on PBS's "FRONTLINE."

The CIR has produced series on corrupt US Customs and Border Protection officials, wasteful charities, the forced sterilization of women in prison, the marketing of junk food to children, and a mounting backlog of unprocessed veterans benefits at the Department of Veterans Affairs (VA). A massively popular story on the fate of the Navy SEAL who killed Osama bin Laden appeared in Esquire.

The fuel that powers CIR's reporting is data. Every day, the center receives reams of data, which arrive in the form of responses to public records requests, information pulled from websites, and the fruits of good old-fashioned shoe-leather reporting. One of the central challenges faced by CIR journalists is figuring out how to interpret and, in turn, present this enormous mass of data to a wide audience.

A case study: In August 2013, reporters at the CIR received a giant dataset that seemed to show a massive increase in the number of opiate prescriptions being written by the VA. The reporters wondered how to make sense of it.

The solution? The center created its own specialized apps. These research apps give reporters a way to both display the data and analyze it.
For example, when reporters want to run a specific query, they can write a customized model method for the app to carry it out. Each app can be adapted to the particular needs of the reporter and the story.

In the case of the opiate story, reporters began by taking the data, which was in several formats, loading it into a database, and placing it into a web framework. CIR typically favors the Django framework and the PostgreSQL database management system. Because Django is a Python framework, it lets them draw on Python libraries such as FuzzyWuzzy, which can be used for name matching, and pandas, which can be used for statistical analysis. Skillful coding of the apps also makes them more shareable, allowing readers to share specific portions of the data that are especially relevant to their city or state.

When it uses data smartly, CIR helps create stories that are of both national and regional interest. By running apps off application programming interfaces, or APIs, CIR can easily share its data with regional media partners. This way, the publications can take national datasets and run their own analysis to localize them, pulling out only the data that's of local interest.

In the case of the story about veterans and opiates, CIR shared the data with regional publications, which used it to produce their own articles. For example, the Charleston Daily Mail newspaper published an article about the abnormally high rate of opiate prescriptions at veterans' hospitals in two West Virginia counties. In another series, on the backlog of veterans' benefits, 15 different news outlets used portions of the dataset to create original stories about the struggles of veterans in their area. Often, this data was paired with interviews of local veterans.

In many cases, data and interviews will buttress each other to help create a narrative.
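
As a rough illustration of the name matching and localization described above, the sketch below merges records whose facility names differ only trivially and then totals them for a single state. It is not CIR's code: the facility names, counts, function names, and the 0.85 threshold are all hypothetical, and Python's standard-library difflib stands in for FuzzyWuzzy and pandas so that the example runs without extra installs.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical records: (facility name as written in a source, state, count).
RECORDS = [
    ("Riverside Veterans Clinic", "WV", 120),
    ("Riverside Veteran's Clinic", "WV", 80),
    ("Hilltop Medical Center", "WV", 150),
    ("Lakeshore Health Center", "CA", 300),
]


def normalize(name):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a, b):
    """Return a 0.0 to 1.0 similarity score for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def canonical_name(name, known, threshold=0.85):
    """Return the closest already-seen name if it scores at or above
    the threshold; otherwise treat this name as new and return it."""
    best = max(known, key=lambda k: similarity(name, k), default=None)
    if best is not None and similarity(name, best) >= threshold:
        return best
    return name


def totals_for_state(records, state, threshold=0.85):
    """Localize the dataset: keep one state's rows, merge near-duplicate
    facility names, and sum the counts per facility."""
    totals = defaultdict(int)
    for name, record_state, count in records:
        if record_state != state:
            continue
        key = canonical_name(name, list(totals), threshold)
        totals[key] += count
    return dict(totals)


if __name__ == "__main__":
    print(totals_for_state(RECORDS, "WV"))
```

The threshold is a judgment call. Set too low, it merges facilities that are genuinely different; set too high, it misses misspellings. Any automatic match of this kind would still need a reporter's review before publication.
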
Take a recent investigation undertaken by CIR staff reporters Christina Jewett and Will Evans, in association with CNN, titled Rehab Racket, which showed evidence of widespread fraud at taxpayer-funded drug rehabilitation programs in California.

The two reporters began their story by making dozens of public records requests to the government, resulting in thousands of pages of audits and other documents, which they logged into a massive spreadsheet. This helped them identify patterns and home in on clinics with suspicious accounting, offering them clues as to which facilities to investigate. Pulling names from the audits, as well as certification and court records, the two reporters began tracking down and interviewing former counselors and patients, which sometimes led them to more data.

"You have this incredible duality," said Jewett. "On one hand you have these boxes of documents, and then, on the other, you have these fascinating people with stories to tell."

When the investigation concluded, it appeared in the form of a three-part series on CNN and a slew of online articles on the CIR website, where producers strove to make a complicated story as accessible as possible. In addition to pictures and videos, the site includes infographics breaking down the data to show how the fraud occurred, the amount of money spent by the state, and the results of the investigation, which have been nothing short of impressive. Since the story broke, the state has suspended 73 clinics, cutting off their taxpayer funding, and referred 64 of them to the Department of Justice for further investigation.

A commitment to finding new and exciting ways of displaying and distributing its content is part of what sets CIR apart. Nearly all of its stories come packaged with news apps, inventive graphics, interactive maps, and animated features, all designed to make the data more comprehensible.
CIR, working with the New York Times, the BBC, Vice, and others, also founded the I Files, the first investigative journalism channel on YouTube. In addition, CIR teamed up with Google to host two "TechRaking" conferences that brought together journalists, technologists, and gamers.

In 2012, CIR's innovations were recognized with a $1 million award from the MacArthur Foundation, a so-called "genius grant." More than perhaps anywhere else, the CIR has proven that buzzy, tweetable content and in-depth investigative journalism aren't necessarily antithetical.

"There's a lot we can learn from sites like Buzzfeed," said Meghann Farnsworth, CIR's senior manager for distribution and engagement. "While the content we produce is inherently different, investigative journalism can learn a lot from the successful way they package content."

Radu, of the OCCRP, agrees, citing an important need to make data-driven news more accessible and enticing, even edgy. "To improve investigative reporting, we need to get rid of the dry text," he said.

Image 1: Paul Radu / Organized Crime and Corruption Reporting Project / © 2015
Image 2: The Center for Investigative Reporting / © 2015
Image 3: The Center for Investigative Reporting / CC BY

This work is published under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License.