Data Thursday I
Data Thursdays is a three-part lecture series that Mark Hansen gives to students at Columbia University's Graduate School of Journalism.
I was prepared for a three-hour lecture about Excel sheets and opendata.gov and FOIA requests. Instead Mark Hansen, the data guy at Columbia, gave the most stimulating talk I’ve heard in a while, about the world of data that we live in.
1. Data is a model
One of Hansen’s slides had this formula: “digital technology = model of world = argument”. Every technology we use operates on a model of the world and then “acts as a relentless argument for that model”.
Google, for example, digitized the world in one way and has been training us to use its model for a decade (whether in good faith or not is irrelevant). As citizens, consumers, and users, we often accept technology as is, but as journalists we have to question the model. How does it work? Why does it work this way? What assumptions are being made? What sacrifices are being made in order to accommodate this model?
2. Algorithms/Data are not objective
If technologies and algorithms exist as models, not facts, then they are proxies created by humans, and it follows that they are subject to all the human error and bias of their creators. While working on a project for the 9/11 museum, Hansen was asked to create a display using an algorithm, so that it would be objective. “An algorithm is not objective,” he said emphatically. “It may be systematic, but it’s not objective.”
3. Creativity is where you “turn the world to bits”
In the lecture, Hansen went through many things (material and conceptual) that we encounter in everyday life and asked how we would turn them into data: time, money, a bodega, a movie. The point was to get us thinking about the space where a house on a city street becomes a value that can be added and subtracted, where the world becomes ‘bits’. That’s where the creativity is. And that’s probably where the danger is. How reductive are we willing to be?
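One way to picture that step, as a rough sketch of my own rather than anything Hansen showed: the same house encoded under two hypothetical models. Every class name, field, and number here is invented for illustration. Each model keeps something and throws something else away.

```python
from dataclasses import dataclass

# Hypothetical example: names and numbers are invented for illustration,
# not taken from the lecture.


@dataclass
class HouseAsAsset:
    """An assessor-style model: the house is a dollar value."""
    assessed_value: int  # US dollars


@dataclass
class HouseAsHome:
    """A census-style model: the house is the people living in it."""
    residents: int
    years_occupied: int


def total_value(houses):
    """Once a house is a number, a whole street can be summed."""
    return sum(h.assessed_value for h in houses)


street = [HouseAsAsset(450_000), HouseAsAsset(520_000)]
print(total_value(street))  # 970000

same_house = HouseAsHome(residents=4, years_occupied=22)

# The asset model has no place to record who lives there; that detail
# was dropped the moment the model was chosen.
print(hasattr(street[0], "residents"))  # False
```

Neither encoding is wrong. The second just answers different questions than the first, and what each one leaves out is the reductive part.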
4. Data is an exchange
Data means given, as opposed to capta, which means taken. If data is something you’re given, it is not a one-way transaction. You have a responsibility to the givers. That doesn’t necessarily mean the compilers; it could mean the values that make up the data: the families in the census, the users who click on this or that ad, and so on.
Doesn’t this change the conversation?
tl;dr The job of a data journalist is to question the algorithm. Question the model. (Not to drown in Excel sheets.)