By Andy Dickinson in data journalism — Oct 3, 2017

Two fundamentals that define good data journalism

Defining data journalism is a hostage to fortune, but as I start teaching a data journalism module I’ve boiled it down to two things: visible methodology and data.

I’m teaching a module on Data Journalism to second-year undergraduates this year. It’s not the first time we’ve done that at the university. A few years ago three colleagues of mine, Francois Nel, Megan Knight and Mark Porter, ran a data journalism module which worked in partnership with the local paper. I’ve also been tormenting the students with elements of data journalism and computational journalism across all four years of our journalism courses.

There are a couple of things I wanted to do specifically with this data journalism module (over and above the required aims and outcomes). The first thing was, right from the start, to frame data journalism as very much a ‘live conversation’. It’s exciting, and rare these days, that students can dive into an area of journalism and not feel they are treading on the toes of an existing conversation. The second thing was to try and get them thinking about the ideological underpinnings of data journalism.

Data journalism as a discourse borrows most heavily and liberally from the vocational underpinnings of journalism — the demand of journalism to serve the public and hold to account that John Snow and others have talked about. But it also draws on the rigour of science, the discipline of code, design thinking, narrative and social change; anything to bring shape, structure and identity. This is often a good thing, especially for journalism, where new ideas are few and far between and it takes a lot to challenge the orthodoxy. Perhaps that’s why data journalism is seen as an indicator for prosperous media companies. But it’s also a bad thing when it’s done uncritically. I’ve written lots about how I think data journalism borrows the concept of open for its own purposes for example. Often much of the value of data journalism seems implied.

The fluid nature of data journalism discussion makes it difficult to identify “schools” of data journalism thought (I don’t think there’s a Bloomsbury Group of data journalism yet!*), but there are attempts to codify it. Perhaps the most recent (and best) is Paul Bradshaw’s look at 10 principles for data journalism in its second decade. It’s a set of principles I can get behind 100% and it’s a great starting point for the ideological discussion I want the students to have.

That said, and pondering this as I put together teaching materials, I think things could be a little simpler — especially as we begin to identify and analyse good data journalism. So if there was a digitaldickinson school of data journalism I think there would be a simple defining idea…

If you can’t see, understand and, ideally, interact with the method and the data in the piece, it may be good journalism but it’s not good data journalism.

When good journalism becomes good data journalism

Here are two examples to make the point.

The Guardian published a piece that uses Home Office data to reveal that asylum seekers are being housed by some of the poorest councils in the UK. It is a story that rightly caught the eye of Government and campaigners alike. Exceptional journalism. Poor data journalism.

The problem with the piece is that, although it relies heavily on the data used, it is light on the method and even lighter on the underpinning data. The data it uses is all public (there is no FOI mentioned here) and there isn’t even a link to the source, let alone the source data.

Contrast that with a piece from the BBC looking at the dominance of male acts at festivals.

The BBC’s piece might be seen as frivolous, but it is no less a piece of journalism.

It’s a fascinating piece but the key bit for me is at the end, where there is a link to find out how the story was put together. That’s the thing that makes this great data journalism. The link takes you to a GitHub repository for the story which includes more about the method, unpublished extras and, importantly, the raw data.

The BBC take is a full-service, all-bases-covered example of good data journalism; it’s the Blu-ray with special features version of the article. To be fair to the Guardian piece, they do talk a little about the ‘how’, but not on the level of the BBC piece. I also recognise that in these days of tight resources, not every newsroom needs to create this level of detail. But using GitHub to store the data, or even just linking to the data direct from the article, is a step in the right direction; it’s often what the journalists would have done anyway as part of the process of putting the article together.

Making a point

I’ve picked the Guardian and BBC stories here as examples of data-driven journalism. These are two stories that put data analysis front and centre in the story. But I recognise that I’m the one calling them ‘data journalism’. I’m making a comparison to prove a point of course, but my ‘method’ aside, the point I think stands: beyond the motivations, aims and underpinning critical reasons, when the audience accesses the piece without the method and the data, can we really say it’s data journalism?

I want my data journalism students to really think about why we see data journalism as a thing that is worthy of study not just practice. Not in a fussy academic way but in a very live way. It isn’t enough to judge what is produced by the standards of journalism alone (I’m guessing the Guardian piece would tick the ‘proper journalism’ box for many). But it isn’t ‘just journalism’ and it isn’t just a process. If the underlying principles and process aren’t obvious in the content that the readers engage with, then it’s just an internal conversation. It has to be more than that.

For me, right now, outside of the conversation, good data journalism starts with a visible method and data.

*I guess if there was they would vehemently deny there was one.

What came next: capturing some of the conversation…

It’s nice that this post has had a fair bit of interest and I thought it was worth pointing out some interesting points that have come up.

Ashley Kirk, data journalist at the Daily Telegraph, disagreed with me: the Guardian piece was not ‘bad data journalism’.

I don't agree that the Guardian story is "poor data journalism" because it doesn't have a GitHub link for its data source/ scripts
— Ashley Kirk (@Ashley_J_Kirk) October 5, 2017

In fact, Kirk adds, the inclusion of methodology “can be gratuitous and unnecessary (for resource-tight teams)”. Kirk also makes the point that the “BBC story does need the methodology as it’s a new dataset they’ve built”. I think that’s an excellent point to consider. Do new data sets demand more emphasis on methodology compared to an existing open data set? This would be especially true to ensure the data collection was solid. That said, the Guardian piece** does combine data as the basis of its analysis and, whilst I have no doubt it’s robust, it’s a new dataset.

I see an intriguing parallel with academia. In the age of fake news, maybe we need to question whether having no sources is 'good enough'.
— (((Giuseppe Sollazzo))) (@puntofisso) October 5, 2017

In response, Giuseppe Sollazzo makes the point about context and the need to be even more open and transparent with our sources and process: “we need to question whether having no sources is ‘good enough’”. It was a similar view to that expressed by data journalist Sophie Warnes, formerly of the much-loved and much-missed data journalism site Ampp3d. She tweeted: “At @Ampp3d we rigorously sourced everything and it’s something I do without thinking now. So important to show workings/sources”. She goes a little further.

It really pisses me off reading something about an interesting study or interesting data, and not being able to *replicate analysis myself*
— Soph Warnes (@SophieWarnes) October 3, 2017

I think replication speaks to a level of distributed accountability that’s vital in the ‘reality’ that Sollazzo alludes to. But it also speaks to the way in which the contemporary data journalism community operates: as a community of practice. Reproducibility/duplication is also a learning/sharing opportunity; it always has been in journalism***.

A good example of that is Makeover Monday, a weekly dataviz competition for the Tableau community. There is loads of great learning there but also loads of debate, especially when there are problems with data analysis. It’s something you may not know about if you’re not part of that community, but if you’re part of it, it’s invaluable. If we recognise that sharing data and method engages the data journalism community and develops their shared knowledge, that’s got to be a healthy thing, right?

**I’m conscious that this is becoming a bit of a dissection of this particular piece and I’ll look for some more examples, although my piece on the lack of openness in data journalism speaks to the point. But I picked the piece and so it’s on me to restate (and it’s worth repeating over and over again) that just because I think the Guardian piece is an exemplar of ‘not good data journalism’, that doesn’t mean I think it’s bad journalism.
***Take a look at Pablo J. Boczkowski’s News at Work: Imitation in an Age of Information Abundance.
