The GIANT CAP approach to coding for journalists.
Coding for journalists is a contentious topic. Despite what anyone tells you, if you’ve never touched a programming language, it’s not straightforward. I’m not a coder/programmer/hacker at all; I like to tinker. But like many, I sit down with the best intentions: I’m going to learn x or y properly. I sit down with the tutorials and work my way through the first few until I get bored, or can’t get stuff to work like the tutorial, and give up.
The best results I’ve had are when I’ve had a problem. Something I know code can do but no idea how. Knowing what it is I want is enough to get me started. I can’t honestly say I’ve ‘learned to code’ but I’ve learned enough to get the job done.
As a result I’ve settled into a methodology for working with code that works for me and I think also might work if you’re a journalist and want to try coding. It’s called GIANT CAP.
In a nutshell it’s this:
Google It And Then Cut And Paste
Breaking down the method in more detail, there are six main things to consider.
Pseudo code is an approach to describing what you want to achieve in a semi-structured way. Let’s say I have a million rows of spending data from government and I want to identify which rows correspond to the NHS and work out an average spend. Here’s a very basic example of some pseudo code:
1. Load the data file from a folder
2. Tell me how many rows there are
3. Look for rows that match a value, NHS
4. Tell me which rows they are
5. Work out the average value from those rows
6. Save those to a new spreadsheet
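The six steps above could be turned into Python roughly like this. It's a minimal sketch, not the definitive way to do it: the function name `average_nhs_spend` and the column names `department` and `amount` are made up for illustration, and it sticks to Python's built-in `csv` and `statistics` modules so there is nothing extra to install.

```python
import csv
import statistics


def average_nhs_spend(in_path, out_path, match="NHS"):
    # 1. Load the data file from a folder
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # 2. Tell me how many rows there are
    print("Rows in file:", len(rows))

    # 3. Look for rows that match a value, NHS
    #    ("department" is a hypothetical column name; this checks
    #    whether the text "NHS" appears anywhere in that column)
    matches = [(i, row) for i, row in enumerate(rows)
               if match in row["department"]]

    # 4. Tell me which rows they are
    print("Matching rows:", [i for i, _ in matches])

    # 5. Work out the average value from those rows
    #    ("amount" is a hypothetical column name)
    average = statistics.mean(float(row["amount"]) for _, row in matches)
    print("Average spend:", average)

    # 6. Save those to a new spreadsheet
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(row for _, row in matches)

    return average
```

Each numbered comment is one line of the pseudo code, which is the point: the plan survives into the finished code. If no rows match, `statistics.mean` will raise an error, which is the sort of thing you would then go and search for.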
Breaking down the job this way helps identify the blocks of code you’ll need. Many programming languages are designed with an eye on mirroring ‘normal language’ so in principle it shouldn’t be a huge leap to begin to make it look more like code as we progress. At this point though, **try and look at one task per line but try not to be too specific.** As your experience grows you can even start to throw in more code-like ideas.
Here’s an example of that from something I kludged together the other day. I wanted to work out how many rows I needed to show 60 images per row based on a number of images in a folder.**
    # Count how many files are in a folder. num_files
    # Work out how many minutes that is. minutes = num_files/60
    # Round that number up so we get full minutes
At the end of each line I’ve added a variable like *num_files* or a bit of ‘maths’ to work something out, like minutes = num_files/60. That last equation worked directly in Python. It was actual code!
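Filled in as runnable Python, that pseudo code might end up looking something like this. It's a sketch under assumptions: the function name `minutes_needed` and its `folder` argument are made up here, and `os.listdir` with `math.ceil` is just one of several ways to count files and round up.

```python
import math
import os


def minutes_needed(folder, per_minute=60):
    # Count how many files are in a folder.
    num_files = len([name for name in os.listdir(folder)
                     if os.path.isfile(os.path.join(folder, name))])
    # Work out how many minutes that is.
    minutes = num_files / per_minute
    # Round that number up so we get full minutes.
    return math.ceil(minutes)
```

Calling `minutes_needed("frames")` on a hypothetical folder called `frames` would return a whole number. The comments are the original pseudo code, left in place.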
This is the bit where you need to take a deep breath and dive in. Things can get quite hard-tech quickly. But if it looks daunting, my advice is ‘don’t step away too quickly’. Scan through pages and tutorials, even if you don’t understand them completely. You’ll be surprised how quickly you can join the dots just by immersing yourself in the language.
One site that pops up a lot, and that I guarantee will become a regular haunt, is Stack Overflow. It’s the best and worst site on the web for coding advice. There are loads of examples and lots of advice. But again, be prepared to grit your teeth and wade through some stuff that might not immediately make sense. The key thing is there is lots of code to cut and paste.
You’ll also come across sites that seem to cover loads of what you need, the kind that people lovingly curate over time. One I’ve found really handy is data scientist Chris Albon’s site. It has loads of great tutorials, including working with Python from the basics to more data-journalism-friendly things like pandas.
Once you’ve found some examples and code that makes sense, try it! Yes, this method assumes you already have some coding environment set up. But hey, this is a method not a tutorial! For the record I’ve been enjoying playing with Python and I found Anaconda a really great system. It installs Python and other things you’ll hear coding journalists talk about, like R and Jupyter.
Trying code out in blocks is not only a good way to learn by doing, it’s also a good way to build a library of code to use. Taking our example above, having a working block of code that takes a file from a folder or filters some data, is something we can use again and again.
Cutting and pasting code is guaranteed to throw up errors. Most commonly;
If there’s one thing that programming languages do well it’s give you errors. They are also really bad at telling you what they mean. But most will have some things in common:
Here’s an example from Python:
      File "testcode.py", line 112, in <module>
        print ("We've got "+len(npath)+" frames to work with ")
    TypeError: must be str, not int
It tells us that at line 112 something that should be str is an int! Copying the error and putting it in Google gives us plenty to go at. Again, it is helpful to stick the name of the language at the end too, e.g. “TypeError: must be str, not int python”. The problem here is I’m trying to do something with a number (int) when Python was expecting text (str).
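For this particular error, the fix that searching usually turns up is to convert the number to text before joining it on. Here is a small sketch of two common ways to do that; the contents of `npath` are a made-up stand-in, since the original list isn't shown.

```python
# Made-up stand-in for the list of frames in the original script
npath = ["frame1.png", "frame2.png", "frame3.png"]

# Option 1: wrap the number in str() so it can be joined to the text
message = "We've got " + str(len(npath)) + " frames to work with "

# Option 2: an f-string, which does the conversion for you
message_f = f"We've got {len(npath)} frames to work with "

print(message)
```

Both options build the same sentence. Which one you end up with probably depends on which answer you found first.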
If you find a solution to a problem, save the link, preferably by adding it to your code as a comment. That way you’re not going to forget where the advice came from, and if you need a reminder it’s easier to find it. It’s also important when it comes to blocks of code that you may cut and paste. You should always cite your sources; sometimes you’re required to by a licence.
It’s likely that you’re not going to be programming every day. So something that reminds you what you did, and why, is important. I’ll also comment on development as it happens so I know what I tried to get it to work, e.g. leaving a comment to describe what’s happening at a particular point in the code. This often makes code more cluttered than more experienced coders would like but it means I’m not looking across notebooks and other documents. All the process is in the code.
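As a rough illustration of what that can look like, here is the fix for the TypeError from earlier, commented the way this method suggests. The labels SOURCE, WHAT, TRIED and FIX are just one possible habit, not a standard, the function name `frames_message` is made up, and the link is left as a placeholder for you to fill in.

```python
# SOURCE: <paste the link to the answer you borrowed from here>
# WHAT: build the message that reports how many frames we found.
# TRIED: "We've got " + len(npath) + " frames..."
#        -> TypeError: must be str, not int
# FIX: wrap the number in str() so Python can join it to the text.
def frames_message(npath):
    return "We've got " + str(len(npath)) + " frames to work with"
```

It's more comment than code, but months later the comments will explain why the `str()` is there.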
There’s a debate in coding circles about comments and the general feeling is that you should keep comments to a minimum because the code should be descriptive enough. I can see the sense in that, especially if you’re working with others. Comments might mean something to you but might confuse others. But in GIANT CAP you’re commenting for you, so be as cryptic or wordy as you like.
The GIANT CAP method won’t help you be a better coder. Some might argue the cut-and-paste bit might make matters worse through bloated code etc. But I do think it will get you on the way — it has for me.
The code I write is never pretty. It isn’t efficient. It is often slow and very rarely works first time. I don’t think I’d show some of it to anyone let alone a coder. But it works. Eventually.
I would guess that for most journalists, that’s enough. Code just needs to do a job, deliver a result we recognize and can work with. Then we can move on. Maybe a few days or months later you’ll need to do it again, or something similar, and it will all be there waiting to remind you. All the info about the way you kludged your way through last time will make the next time a bit easier and maybe remind you how much you enjoyed it when it finally worked.
If you want to see an example of some code written using the GIANT CAP method take a look at some code I wrote to sample a video file.
** You’ll notice it starts with the hash symbol. That’s because I wrote the code in Python, and # is how you start a comment.
Image courtesy of camknows on Flickr