Creating gentle introductions to coding for journalists...

I'm currently terrorising my MA students with some code. I decided to look at a bit of Python, as it's one of the common languages that pops up in data journalism circles. There are a couple of things I wanted to try to reflect in the process:

  • Avoid, where possible, the debates - Should journalists learn to code? Anyone?
  • Avoid, where possible, too much jargon - Is this actually coding or programming or just HTML?
  • Avoid the issue of installing development environments - "We'll do an easy intro but, first, let's install R/Python/Homebrew/Jupyter/Anaconda... etc. etc."
  • Not put people off - fingers crossed

With the first two points in mind, the first thing I wanted to do was find a hook that they were already familiar with and I settled on this.

=median(A1:A4)

The students had already been looking at data journalism, and some basic spreadsheet formulas in particular. The reason to start here was to make the point that they had already done some coding. They'd used a function, with variables and arguments, and seen the results it generated.

One of the Google Sheets formulas we had used is a great example of this:

=IMPORTHTML("http://en.wikipedia.org/wiki/Demographics_of_India","table",4)

My point being that, in principle at least, there's not much difference between this formula and a Python command like:

pd.read_html("https://en.wikipedia.org/wiki/List_of_mountains_of_the_British_Isles_by_height", header=0)
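
The same goes for the median formula at the top. As a rough sketch only, with made-up numbers standing in for cells A1 to A4, a Python version might look something like this:

```python
from statistics import median

# Hypothetical values standing in for spreadsheet cells A1 to A4
cells = [12, 7, 3, 10]

# median is the function; cells is what we hand it to work on,
# much like the A1:A4 range inside the brackets of the formula
result = median(cells)
print(result)
```

The shape is the same in both cases: a function name, then brackets holding the thing you want it to work on.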

In a similar way, I've picked on a bit of HTML, something they'd already had a small look at, to discuss some basic principles around code in general.

<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css">
<h2> This is a heading </h2>
<p class="bg-danger text-white"> This is a paragraph of text.</p><p> This is a second paragraph of text </p>

Looking at HTML lets you chat about structure and well-formed code. Looking at some basic CSS means you can start to talk about libraries and how we can 'call' code from other places. The aim is that it takes them on a journey that makes something like the Python code below easier to unpick:

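import pandas as pd
# read_html returns a list of every table found on the page; header=0 uses the first row as column names
tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_mountains_of_the_British_Isles_by_height", header=0)
# Pick one table out of that list (counting starts at 0, so this is the second)
mountains = tables[1]
# Save it as a CSV file, leaving out the row numbers pandas adds
mountains.to_csv("mountains.csv", index=False)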

It's not that any of this is hard. Nor is it that starting with the code isn't, in principle, just as appropriate - before anyone points it out, I'm not saying that starting with the basics of programming is wrong. But I didn't feel the connections I was making were too tenuous, and the approach had the benefit of connecting some dots - like the connection between HTML tags and web-scraping.
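
To make the link between HTML tags and scraping a little more concrete, here is a minimal sketch that uses only Python's built-in html.parser module rather than pandas. The class name TagTextCollector and the small page string are made up for illustration, and a real scraper would need to cope with far messier pages. The point is just that code finds content by looking for the tags wrapped around it.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: collects the text inside <h2> and <p> tags."""

    def __init__(self):
        super().__init__()
        self.current_tag = None  # the tag we are currently inside, if any
        self.found = []          # (tag, text) pairs in page order

    def handle_starttag(self, tag, attrs):
        # Start listening when one of the tags we care about opens
        if tag in ("h2", "p"):
            self.current_tag = tag

    def handle_endtag(self, tag):
        # Stop listening when that tag closes
        if tag == self.current_tag:
            self.current_tag = None

    def handle_data(self, data):
        # Keep any non-blank text that sits inside a tag we care about
        if self.current_tag and data.strip():
            self.found.append((self.current_tag, data.strip()))


# A made-up page, modelled on the HTML snippet earlier in the post
page = """
<h2>This is a heading</h2>
<p class="bg-danger text-white">This is a paragraph of text.</p>
<p>This is a second paragraph of text.</p>
"""

parser = TagTextCollector()
parser.feed(page)
parser.close()

for tag, text in parser.found:
    print(tag, "->", text)
```

Swap the h2 and p tags for table, tr and td and you have, very roughly, the kind of job that read_html does for you.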

This approach also fits, I think, with the journey many journalists make as they venture 'under the hood' of the web. Spreadsheets are not new in the newsroom and many journalists will have had some hands-on experience in one form or another. Thanks to the limitations of a CMS, journalists will have tweaked HTML, e.g. adding or removing nofollow from links, or pasting in embed code and tweaking the width and height. Many build on these experiences, and the confidence you get from seeing stuff 'work', to look for the next steps.

Using online notebooks

The equation 'Tech + Journalists =' is one you don't need any coding experience to solve. The answer is stress.

Experience has taught me that as soon as you add tech to the mix, you can guarantee that one person will have a screen that looks different or an app that doesn't work. Things get more complicated when you want people to play and experiment beyond the classroom. Apps that don't install and draconian security permissions are only the start. Some of this stuff is quite hardcore for a user who's never used Notepad before, let alone fired up the command prompt. All of this can be the hurdle that most people fall at. It can sap your motivation.

Thankfully there are a growing number of accessible packages that make this stuff more consistent and replicable. I'm a big fan of the Anaconda platform - yes, it's a bit over-specced for your average data journalism project, but it does the heavy lifting in terms of installation. Good as it is though, it still needs to be installed. I didn't want that to get in the way of working with the code, so I looked at online options.

There are a number of Python shells kicking around. These essentially recreate a command line - the text interface you'd have if you were working on your own machine - so you can put in snippets of Python. Python.org have a nice Python shell that you can try simple code in. Try adding print("Hello world").

These are great but only really work for short snippets of code. I wanted more control, and to be able to look at the format too. So I turned to notebooks.

Notebooks are designed to encourage 'literate programming' - a process where you mix code and text that describes what the code does. The most common literate programming tool is Jupyter, which has gained a huge amount of traction in data science and data journalism.

A typical Jupyter notebook allows you to write and run code and add comments to the same document. 

Anaconda, mentioned above, is an easy way to install Jupyter on your own machine, but there are several online tools that do the same thing.

There are pros and cons for each, but in the end I went for Google Colaboratory because I knew the class had access to Google accounts. That said, in future I think I would go for Azure Notebooks. The interface is more like the standard Jupyter interface, which would help with the hand-off to the 'real thing' when people wanted to move on. But for now, the principles still hold and it's enough to get people engaged.

Conclusions

There are always going to be snags and, by the time we get to importing libraries like pandas, things are going to get complicated - it's unavoidable. But if the students come away knowing that code isn't tricky, at least in principle, that at a low level the basic structures and ideas are pretty simple, and that there's plenty of support out there, well, that'll be a win. Fingers crossed.

Afterword

(20/1): Paul Bradshaw reminded me that there is a space between online shells and notebooks.

I really like, and would heartily recommend, Repl.it, not least because it does more than just Python (sure, Azure Notebooks does R and Python). You can do R and Python, and there's Django too. Django is a Python framework that has a very strong foothold in a lot of US newsrooms and tech-j circles.