Project Puppy: first public transcript, and how I’m thinking of explaining this process to IRB

March 5, 2012 – 10:58 am

A Project Puppy update is long overdue: our first public data was released last week after working through privacy, licensing, and permission issues. That journey warrants its own storytelling down the line, but right now I’m exhausted enough that I’m just going to put it out there in all its unexplained messy glory and profusely thank Jon Stolk for his extraordinary courage in offering to be the first person to go through this. I know I’m putting out something that raises more questions than it answers, so… please ask them, because we’d love to answer.

So, that journey of making the data transparent… I’m trying to think about the best way to explain it. I think what I’ll do is try to reverse-engineer it into the clear steps I’m attempting to apply in my own small “radical transparency research” project (more on this as it comes up; I’ve mostly not had time to truly start it). The main purpose of this second experiment, which I will call “Project Kitten” for amusement purposes, is to take more time to experiment with, and really clarify and walk through, the copyright, publishing, and ethics/IRB concerns of doing radically transparent research.

Here’s the raw crazy idea, which I expect to have all sorts of things wrong with it that I’ll find out by doing.

If there are cultures that operate with radical transparency, and individuals within those cultures consent to (or even request) a completely open research process… why can’t I, as a researcher, abide by that request? If I…

  1. Got permission to interview someone and use their anonymized, locked-in-a-vault transcript in full for research (basically, “normal research procedure”), and then
  2. interviewed them, we’re still in the middle of a normal research procedure. Okay. Let’s keep going.
  3. Next, I’d assign copyright of the resulting transcript to them, while maintaining “normal research rights” to use the “anon/locked” versions of the data in the stuff I’m working on for publication. This is a bit unorthodox, but outwardly not a huge deal; if the interviewee decides to stop here, we’ve still done things the “normal research way”; it’s just that transcript copyright ownership is actually clear (as opposed to rather fuzzy, which seems to be the case for most research interview transcripts).
  4. Now, if the interviewee wants to go farther… things start looking weird. As the copyright holder, the interviewee now has the choice to release their interview data under an open license. They can choose to release only a subset of it, they can choose to edit it before release, even to the point of anonymizing it to their satisfaction before putting it out there… it’s their call. (All sorts of questions and alarm bells should be going off now for any researchers who’ve read this far. What if it’s anonymized but someone guesses correctly? How can we be sure that these people really understand what they’re getting into?)
  5. Okay. Still with me? We now have an unambiguously public fork of the data, complete with paper trail, that can be used by anyone for anything (after the relevant IRB deems it a public/exempt data set): by interviewees for project marketing (ding ding concern bell!), by researchers (original team or not) who can then perform their coding/analysis/etc. in public, and so on. The latter is what I am hoping for; I want to “open up” the black box of how engineering education research is done, to let people listen in on conversations of this type and see what it looks like to think that way. Discourse exposure. Interesting conversations.
  6. And to throw even more interestingness into the mix: the original research team has the original private version of the data (which may be anywhere from “bitwise identical” to “only vaguely reminiscent” of the public version, and may include things completely excluded from the public dataset). Now. Research group: what can you do with that mix of information? (There’s a small sketch of this two-fork setup right after this list.)
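
To make that two-fork endpoint concrete, here is a minimal sketch in Python. Everything in it is hypothetical illustration (the class name, the license strings, the redaction step), not a system we have actually built; it just shows how the private research copy and the interviewee-controlled public release could diverge while keeping a paper trail between them.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TranscriptFork:
    """One version of an interview transcript (hypothetical model)."""
    text: str
    license: str          # e.g. "all-rights-reserved" or an open license
    holder: str           # who controls this fork
    derived_from: Optional["TranscriptFork"] = None  # paper trail to the source fork

# Steps 1-2: the interview yields a single private transcript, held by
# the research team under normal research procedure.
private = TranscriptFork(
    text="FULL TRANSCRIPT, identifying details included",
    license="all-rights-reserved",
    holder="research team (anonymized, locked-in-a-vault use only)",
)

# Steps 3-4: copyright is assigned to the interviewee, who may edit or
# redact to their satisfaction, then release a fork under an open license.
def interviewee_release(source, redacted_text, open_license):
    """The interviewee publishes an edited fork; the original stays private."""
    return TranscriptFork(
        text=redacted_text,
        license=open_license,
        holder="interviewee",
        derived_from=source,  # provenance: public fork points back at the private one
    )

public = interviewee_release(
    private,
    redacted_text="TRANSCRIPT, edited/anonymized to the interviewee's satisfaction",
    open_license="CC BY-SA 3.0",
)

# Steps 5-6: anyone can work with `public`; only the research team sees
# `private`, and the two may be anywhere from bitwise identical to only
# vaguely reminiscent of each other.
assert public.derived_from is private
```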

All sorts of really interesting deep chewy issues here, like how this process affects what people say, questions of coercion, what risk prediction and mitigation looks like in this case, how to make sure the public stuff doesn’t identify the anonymized stuff if an interviewee doesn’t want it to, etc. Fun to consider, interesting to work through, probably a painful set of IRB convos… but I’ve got time, and I am learning to be patient.

I’m talking with my advisor and slowly (very slowly) moving through talking with Purdue’s IRB, as well as with other researchers who’ve looked at open communities; this blog post is in part a way for me to have something to point them to when I email them (hi!). This is a naive statement, but it looks to me right now like most of the other researchers who’ve done this have followed standard procedure: anonymized and locked-down data, working how the IRB expects researchers to work. The rationale I sense is “because I didn’t realize there might be a different possibility for this earlier on, and I don’t have time to look into it now because I want to finish my thesis,” but I could be wrong. I hope I am wrong, very wrong, and wish to be corrected! But that is the reason I’m hurtling into this process my first year, when it’s a side project that does me no harm if it fails.

This post is poorly structured, poorly written, and nowhere near as well-explained as I would like. But: release early, release often. Hopefully someone, someday, can help me turn this into a clearer writeup. But for now, I think this is somewhat expected; we’re exploring unfamiliar terrain, and this sort of terrain is usually shrouded in fog.

Welcome to the fog.

4 Responses to “Project Puppy: first public transcript, and how I’m thinking of explaining this process to IRB”

  1. There are various alignments you could approach this with: You could propose something that is arguably *right* but totally against the rules (chaotic good), or something that technically abides by the rules but subverts them to your own purposes (lawful neutral). I expect that what you’re reaching for is more like the former — so that the rules end up being changed for the better — but that you’ll be guided to do the latter in order to increase your chances of success. :)

    I love the interviewee-holds-copyright-and-releases-their-own-data idea, even though this strikes me as totally lawful-neutral since it basically depends on using separate rulesets (IRB vs copyright) so that neither ruleset has to change. I also like that people are worried about informed consent in this case. You can’t stuff data back in a box once it’s out, and it’s difficult to get anyone to think sufficiently pessimistically to predict their own behavior once something unexpected happens…or even once something you told them would happen, actually happens. We tried so hard to do everything right with consent forms and guided discussion of risks for image publishing in a project on kids last year, but we still had people revising their permission at the last minute. I’m really glad we called folks to be sure before actually putting their photos online, because that could’ve been a horrible mess.

    I wonder what would happen if you had some way to quantify how anonymize-able someone’s data was? To go by steps and say, only release my data if I am one of 10, or one of 100, or one of 10,000 indistinguishable data points or personae. What’s the maximum degree to which an interview transcript can be anonymized? Is there a certain point past which the data are no longer useful for research?

    Is there any relevant research into how people make decisions like these, where the degree of risk is difficult to grasp, and may change over time? Perhaps this could be leveraged to help increase the likelihood that subjects make a decision they’re happy with, and continue to be happy with in the fullness of time.

    By Katie Rivard on Mar 5, 2012

  2. > There are various alignments you could approach this with: You could
    > propose something that is arguably *right* but totally against the
    > rules (chaotic good), or something that technically abides by the rules
    > but subverts them to your own purposes (lawful neutral).

    …this is AWESOME, and I think I need to put “chaotic good” somewhere
    on my (long-stalled) portrait of Future Dr. Mel (currently vaulting a
    rolling suitcase while swinging a lightsaber while graduation robes swirl
    dramatically).

    > I expect that what you’re reaching for is more like the former — so
    > that the rules end up being changed for the better — but that you’ll
    > be guided to do the latter in order to increase your chances of
    > success. :)

    Yeah, probably — we’ll find out, I hit IRB office hours tomorrow for
    the first time! But the entire point is that I don’t need to succeed
    with this right now; by doing this on a semi-throwaway project my first
    year, I literally have nothing to lose.

    > I love the interviewee-holds-copyright-and-releases-their-own-data
    > idea, even though this strikes me as totally lawful-neutral since it
    > basically depends on using separate rulesets (IRB vs copyright) so
    > that neither ruleset has to change.

    This is also my favorite part of the idea, though it makes things really weird, because you then have to triangulate between the two kinds of rules and datasets, especially if the public data is significantly different from the private version.

    I also think it’s only going to work with ridiculously low-risk subjects. In my case this first round, my subject pool is made up of CS faculty, so not only are they not in a vulnerable position, but I also have no power over them (as a grad student), and they’re already overinformed about data privacy, IRB, and that sort of thing… and that’s a deliberate strategy choice, because I wanted to allay those fears for my IRB the first time around. Really, the entire study is an excuse to run this crazy idea past IRB.

    > I also like that people are worried about informed consent in this
    > case. You can’t stuff data back in a box once it’s out, and it’s
    > difficult to get anyone to think sufficiently pessimistically to
    > predict their own behavior once something unexpected happens…or
    > even once something you told them would happen, actually happens.

    Yeah, this is one of the reasons I was surprised that Jon agreed, and why I was so glad that it was Jon — because I know that he’ll work well with us to react to anything that comes up.

    > I wonder what would happen if you had some way to quantify how
    > anonymize-able someone’s data was? To go by steps and say, only
    > release my data if I am one of 10, or one of 100, or one of 10,000
    > indistinguishable data points or personae. What’s the maximum degree
    > to which an interview transcript can be anonymized? Is there a
    > certain point past which the data are no longer useful for research?

    Ooh, I… will have to take that into IRB office hours tomorrow if they end up getting concerned. I think there could be a Kickstarter-like thing for this — “the data will only be released once we get X participants, so if you want to see the analysis and results, get your friends to join in!”
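
    The “one of 10 / one of 100” framing sounds a lot like k-anonymity from the data-privacy literature: a record only gets released if at least k people in the pool are indistinguishable from it on the attributes that could identify someone. Here’s a rough Python sketch of what that threshold check might look like; the attribute names, the pool, and the threshold are all made-up illustration, not anything we actually run:

    ```python
    from collections import Counter

    def anonymity_set_size(pool, quasi_identifiers, person):
        """Count the records in `pool` (including this one) that are
        indistinguishable from `person` on the given attributes."""
        key = tuple(person[attr] for attr in quasi_identifiers)
        counts = Counter(
            tuple(r[attr] for attr in quasi_identifiers) for r in pool
        )
        return counts[key]

    # Hypothetical subject pool: CS faculty described only by coarse attributes.
    pool = [
        {"rank": "associate", "subfield": "HCI", "region": "midwest"},
        {"rank": "associate", "subfield": "HCI", "region": "midwest"},
        {"rank": "full", "subfield": "systems", "region": "midwest"},
        # ...imagine many more records here...
    ]

    # "Only release my data if I am one of at least K indistinguishable people."
    K = 2
    me = pool[0]
    if anonymity_set_size(pool, ["rank", "subfield", "region"], me) >= K:
        print("release: the anonymity set is big enough")
    else:
        print("hold: too identifiable at this level of detail")
    ```

    The open half of your question still stands, though: past some k, the generalizing you’d have to do to get there may leave a transcript too bland to be useful for research.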

    > Is there any relevant research into how people make decisions like
    > these, where the degree of risk is difficult to grasp, and may
    > change over time? Perhaps this could be leveraged to help increase
    > the likelihood that subjects make a decision they’re happy with, and
    > continue to be happy with in the fullness of time.

    Hrm. Not that I know of, but I can ask the psych department. One of the joys of having a big research university around you is that you can go a couple buildings down and ask crazy questions in any discipline, and hit Real Answers. It’s good to live in a land of geeks!

    By Mel on Mar 7, 2012

2 Trackback(s)

  1. Apr 15, 2012: Mel Chua » Blog Archive » How to assign copyright to your interviewees
  2. May 3, 2012: Mel Chua » Blog Archive » Project Puppy is… not human subjects research? and: the true nature of Project Puppy begins to be revealed.
