A cool idea that failed: you can’t reverse-engineer a paper for open access

May 3, 2012 – 8:42 pm

One of the things I tried out as part of my independent study on open access this semester was the idea of reverse-engineering a publication. This isn’t about hacking code; it’s about hacking copyright. And as it turns out, it doesn’t work.

Here’s the setup: imagine you’re a researcher and you’ve written a great paper that’s published in a prestigious journal. You beam with pride! Life is fantastic. And then you find out about the open access citation advantage, realize your publisher allows archiving of preprints, and think that life is about to get even better.

There’s just one problem. You can’t find your preprint version (the final edited version you send to the publisher, usually a plain Word or LaTeX document). You only have the final copy PDF with all the branding and pretty-print formatting on it – the version that got published in the journal. Somehow, in the frenzy of hard drive clean-up that accompanied your “I am done with this paper forever!” project completion celebration, you… you lost the file.

But wait… the final print version is identical to the text you sent in, right? All the publisher did was add formatting. So if you could just grab the text from the final print version and throw it back into a Word document, that would be identical to the preprint, and you could post that. A preprint is just the published content without the publisher’s formatting. Right?

Wrong. The problem here isn’t technical, it’s legal. I actually took a print PDF and “reverse engineered” it into a LibreOffice document, and it looked fantastic. I did the process by hand, but it would be easy to automate, so the software portion of the problem is trivial. The copyright portion is, unfortunately, a blocker bug. I talked with Donna Ferullo, Purdue’s copyright librarian, and the crux of the matter is that we don’t know what value the publisher added before printing. Okay, it’s probably “not much other than formatting,” but still… it’s a legal grey area. So we hit a hard wall on that, but at least we learned something.
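
For the curious, here is roughly what automating the software half might look like. This is a minimal sketch, not the process I actually followed (I did mine by hand): it assumes the text has already been copied or extracted out of the PDF, and the function name `reflow_extracted_text` and its `boilerplate` parameter are made up for illustration. It only handles the technical side and does nothing about the copyright question.

```python
import re


def reflow_extracted_text(raw, boilerplate=()):
    """Turn text copied out of a typeset PDF into plain paragraphs.

    `boilerplate` is a hypothetical parameter: exact header/footer lines
    (for example a running journal title) that should be dropped.
    """
    paragraphs = []
    current = []
    for line in raw.splitlines():
        stripped = line.strip()
        # Drop publisher furniture: known header/footer lines and bare page numbers.
        if stripped in boilerplate or re.fullmatch(r"\d+", stripped):
            continue
        if not stripped:
            # A blank line closes the current paragraph.
            if current:
                paragraphs.append(current)
                current = []
            continue
        current.append(stripped)
    if current:
        paragraphs.append(current)

    result = []
    for lines in paragraphs:
        text = lines[0]
        for nxt in lines[1:]:
            if re.search(r"[A-Za-z]-$", text) and nxt[:1].islower():
                # Rejoin a word that was hyphenated across a line break.
                text = text[:-1] + nxt
            else:
                # Otherwise the line break was just typesetting; use a space.
                text = text + " " + nxt
        result.append(text)
    return "\n\n".join(result)
```

The hyphen rule is a heuristic: it will also glue together genuine compounds that happen to break at the hyphen (a line ending in “peer-” followed by “reviewed”), so the output would still need a human read-through.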

I promised to write something up about this since I don’t think the reverse-engineering idea has been broached before, and it’s at least good for others to know that it’s a dead-end — so here it is.

4 Responses to “A cool idea that failed: you can’t reverse-engineer a paper for open access”

  1. It has always surprised me that people don’t publish the LaTeX (or whatever) source code for their papers at the same time as the formatted PDF. You’d think it would be standard practice to include a ‘download the source to this article’ link at the bottom; if nothing else, that helps other poor schmucks who are trying to get LaTeX to format things the way they want.

    By Ed Avis on May 4, 2012

  2. …Ed, that’s a brilliant idea. If someone will help me make a LaTeX template with a slot for that URL at the bottom, I’ll start using it for my papers from here on out.

    One of the frustrating things about this semester is that I had classes where the professor required us to turn in our papers in Word format because that’s how they wanted to comment on them. I would have much preferred to relearn LaTeX. Is there an easy way to convert LaTeX to Word?

    By Mel on May 4, 2012

  3. One question. If you sent the preprint (Word or LaTeX document) via email, do you still have that email in your sent mail folder? If not, then for future reference, I’d suggest archiving that folder.

    Have a great weekend:)
    Patrick.

    By Patrick Dickey on May 4, 2012

  4. I’ve been strongly pushing the idea of using source control for non-code things to my writer friends. GitHub – not just for code!

    By Sharon on May 5, 2012
