This blog post started as an explanation of my personal research process for my Hacker School Book research team. We were talking about the various ways people take first-pass, rough notes on transcripts, and I offered mine as an example. Tiago Forin and I co-created this specific technique variant for our DTRS analysis, but we're pretty sure we're not the only ones who've reinvented this particular wheel.

The picture shows an interview transcript with a bunch of marker scribbles on it. Basically, it's a way of marking codes ("themes") in text so I can see big patterns and go back for finer-grained analysis and checking later. For instance, in an interview with furniture designers, I might want to mark all the times someone is talking about how important shapes are in furniture design. So every time I see that code occurring in the data, I write a short word for that code ("shape") right on the data, and draw an arrow through all the text I want to encapsulate with it.


[Image: a text transcript annotated with colored markers]


Important disclaimer: the document pictured (including the transcript) is entirely open-licensed, so the picture can be shared far and wide. However, to create this example, I picked random codesets that don't really apply to the data and scribbled them across the page with no particular rhyme or reason. Don't try to read the text and figure out how in the world this sentence is an instance of "4th wall" or "model" or whatever, or even what those codes might mean. These codes do not correspond in any way to the transcript!

When I have multiple codesets, I color-code the codesets. For instance, I might have a green codeset for "everything related to how the furniture design looks," like "shape" or "form." I might have a blue codeset for "acting techniques they use when presenting their work," like "breaking the 4th wall" or "monologue" or "dramatic pauses." I might have a red codeset for "pedagogical techniques" like "modeling behavior" or "coaching the audience through a process." This lets me see where codes overlap/co-occur; for instance, does "coaching" often happen when people talk about "form"?
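Once marks like these get typed into a computer, the co-occurrence question can be asked mechanically. Here is a rough Python sketch of one way that might look. It is not the tooling Tiago and I use. The `Span` record, the idea of storing each arrow as a range of transcript line numbers, and the sample marks are all assumptions made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Span:
    """One marker arrow on the page (hypothetical record layout)."""
    codeset: str  # marker color, e.g. "green"
    code: str     # short word written on the page, e.g. "form"
    start: int    # first transcript line the arrow covers
    end: int      # last transcript line the arrow covers (inclusive)


def overlaps(a: Span, b: Span) -> bool:
    """True when the two arrows share at least one transcript line."""
    return a.start <= b.end and b.start <= a.end


def co_occurrences(spans):
    """Count overlapping marks whose codes come from different codesets."""
    counts = Counter()
    for a, b in combinations(spans, 2):
        if a.codeset != b.codeset and overlaps(a, b):
            # Sort the pair so ("form", "coaching") and
            # ("coaching", "form") land in the same bucket.
            counts[tuple(sorted((a.code, b.code)))] += 1
    return counts


# Made-up marks, just like the scribbles in the picture.
spans = [
    Span("green", "form", 3, 8),
    Span("red", "coaching", 6, 10),
    Span("blue", "monologue", 12, 15),
    Span("green", "shape", 14, 20),
    Span("red", "coaching", 30, 33),
    Span("green", "form", 31, 32),
]

counts = co_occurrences(spans)
for (first, second), n in counts.most_common():
    print(f"{first} + {second}: {n}")
```

With these invented marks, "coaching" and "form" overlap twice and "monologue" and "shape" overlap once. Comparing only across codesets mirrors the color split: two green marks sitting on the same passage are not counted as a co-occurrence here, though that is a design choice you could easily flip.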

This also makes it super-easy to collaboratively first-pass code with someone, since we'll just split up marker colors. I might take green and pay attention to shape/form as we're going through the transcript, while you take the red and watch for pedagogical techniques. We can do this sort of coding simultaneously, discussing the transcript while we both scribble on it -- or asynchronously, where I take the data to my desk and mark up all the shape/form codes in green, then hand it to you to do the pedagogy pass in red. We end up with a useful boundary artifact for discussion, which helps us do a more detailed analysis pass with better precision and sophistication. Eventually, we load the codes into a computer for even more analysis.

But that is all later -- much later. This is my first quick-and-dirty step. It's me, maybe a colleague or two, a bunch of colored markers, and a table strewn with printouts, reading quickly through these things and marking them. I can get through a 25-page transcript in less than 15 minutes if I'm only marking for a small codeset and I've already read the transcript.

So if you're in a research project that I work on, this is probably what's happening to your transcript at some point! And if you're working on a research project with me, you will probably be handed packages of children's art supplies at some point. It is fun!