How I read books
December 20, 2009 – 1:47 am
Taught this system in about 15 minutes to my cousin Melanie (high school freshman working on her first Big Paper for history; it’s something about People with Swords in Ancient China, so I’m totally reading it when she’s done) this afternoon and realized I’d never written it down, so here goes: this is how I’ve taken reading notes and written papers since I was in high school. I’m also writing this in part to prove that the terminal is useful for things other than writing code, because I did not know how to code when I started doing this.
My system is largely predicated on the assumption that I am a Lazy Bum, and basically involves 4 tools: cat, grep, | (pipes), and flat text files. These are standard Unix tools, and I’ve never seen a Linux distro without them; Melanie and I already run Fedora, so we were all set.
I grab the text of books when possible (mm, Project Gutenberg) and take advantage of the fact that my computer can read faster than I can. For instance, for history my Junior year of high school, I had to write some paper about the Judeo-Christian belief system. I forget the exact topic now, but let’s imagine I wanted to grab out some nice quotes about the symbolic use of… say, swords. I like swords. So I download king-james-bible.txt, and…
cat king-james-bible.txt | grep -C 1 sword | less
In English, this means “send (concatenate) the text of the bible through a filter (global regular expression print) that looks for the word ‘sword’ and shows the -Context of 1 line before and after it, then let me scroll through the results (less).” The results look something like this.
01:003:024 So he drove out the man; and he placed at the east of the
garden of Eden Cherubims, and a flaming sword which turned
every way, to keep the way of the tree of life.
--
01:027:040 And by thy sword shalt thou live, and shalt serve thy brother;
and it shall come to pass when thou shalt have the dominion,
--
stolen away unawares to me, and carried away my daughters, as
captives taken with the sword?
--
that two of the sons of Jacob, Simeon and Levi, Dinah’s
brethren, took each man his sword, and came upon the city
boldly, and slew all the males.
And so on. Instant sworditude, much faster than actually reading the whole darn book (or Book, in this case).
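A few other grep flags earn their keep for this kind of search. Here's a minimal sketch; the sample file and its four lines are made up so the example runs without downloading anything, and the flags shown (-i, -w, -c) are standard grep options.

```shell
# Build a tiny stand-in for the real text so the example runs anywhere.
# (The file contents are invented for illustration.)
sample=$(mktemp)
cat > "$sample" <<'EOF'
He drew his Sword.
The swordsman bowed.
No blades here.
A flaming sword which turned every way.
EOF

# -i ignores case, so "Sword" and "sword" both match.
grep -i sword "$sample"

# -w matches whole words only, so "swordsman" is skipped.
grep -iw sword "$sample"

# -c counts matching lines instead of printing them.
grep -ic sword "$sample"
```

Swap the temp file for king-james-bible.txt and pipe into less as before if the results run long.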
For those looking for a more powerful alternative to grep, try ack. (The website URL is pretty accurate.) I was introduced to ack at TOPP and have never looked back; the main advantage is how easy it is to deploy ack on huge trees of folders swarming with text (or code) files, meaning that you could, instead of just looking in the King James Bible, deploy the above search for swords in every note you’ve ever taken on every book you’ve ever read. Assuming those notes are textfiles dumped somewhere underneath the folder you’re searching in, I mean. It’s made fascinating connections between long-ago reads I never would have thought of on my own, and my papers in college were much improved by it.
I also take my reading notes in flat text files as I go through books. Those textfiles look something like this:
Bennett, Arnold. How to Live on 24 Hours a Day. New York: Shambling Gate, 2000. Print.
P: (5) Lay out things for tea at night so you can make tea in the morning as a nice wake-up call.
Q: (5) [breakfast] The proper, wise balancing of one’s whole life may depend upon the feasibility of a cup of tea at an unusual hour.
N: Hilarious writing style. Read this book whenever the need for British wit strikes.
?: (7) Was this before or after Taylorism?
N: (7-8) This program would only work in a highly literate population. Which I suppose the reader belongs to, as they’re reading the book. But still.
Note a couple things.
- Full bibliography at the top so I never have to figure out the formatting for it again.
- Each note gets a new line, and begins with a letter code for the type of note it is: P for paraphrasing (summary), Q for quote, ? for a question I have, N for a note (my own thoughts), and some not shown here, like R for “reference to some other material I should look up later” (such as when one book cites another that I figure I should read).
- Optionally, page numbers appear in (parentheses) immediately after the note type.
- Super-optionally, tags appear in [brackets] after the page numbers, mostly when I want to be able to associate a quote with a word that’s not in the quote, for ease of searching later.
Then I can make queries like “what were all the questions I had about this book?”
cat how-to-live-on-24-hours-a-day.txt | grep '?:'
Or “what interesting stuff was on page 7?”
cat how-to-live-on-24-hours-a-day.txt | grep '(7'
And so forth.
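If you find yourself typing those queries a lot, they wrap up nicely into little shell functions. This is only a sketch: the function names notes_of_type and notes_tagged are made up, and it assumes every note starts its line with the letter code followed by a colon, as in the sample above.

```shell
# notes_of_type CODE FILE...
# Print only the notes whose line starts with the given letter code
# (P, Q, N, R, or ?). The ^ anchors the match to the start of the line;
# in grep's default (basic) regex syntax a bare ? is a literal character.
notes_of_type() {
    code=$1
    shift
    grep -- "^$code:" "$@"
}

# notes_tagged TAG FILE...
# Print notes carrying a [tag]. -F makes grep treat the pattern as a
# plain string, so the brackets are not read as a character class.
notes_tagged() {
    tag=$1
    shift
    grep -F -- "[$tag]" "$@"
}

# Try it on a throwaway copy of a few notes.
notes=$(mktemp)
cat > "$notes" <<'EOF'
Q: (5) [breakfast] The proper, wise balancing of one's whole life may depend upon the feasibility of a cup of tea at an unusual hour.
N: Hilarious writing style.
?: (7) Was this before or after Taylorism?
EOF

notes_of_type '?' "$notes"
notes_tagged breakfast "$notes"
```

Anchoring to the start of the line means a stray "?:" in the middle of a quote won't show up as a question.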
Confession: I’ve fallen off the wagon and haven’t taken notes like this since I left school. I’m trying to climb back on it again, as this sort of database is gloriously helpful to build. Particularly if one plans on doing lots of reading and writing of papers. Like, say, if one were to consider grad school.
I’m sure this system could be improved; I once had dreams of writing a GUI for it, but found this worksforme enough that I just never made one. There are probably better tools out there for it, there’s probably a lot of regexp-fu I could pick up to do more powerful queries (in fact, this is one of the reasons why I know regular expressions at all), there’s… well, you know what I’m about to say.
Patches welcome!

10 Responses to “How I read books”
I never noticed the -C flag on grep; cool usage.
I’ve seen many note-taking GUIs, and some are better than others. But I really like the idea of keeping this in the terminal, because it’s fairly trivial to write new and better scripts if you learn better ‘regexp-fu’ or if you need to do a very specific task that some GUI just isn’t suited to. I think this is one of those places where such an interface would actually get in the way.
Either way, very cool system. I might try it out myself. :)
By Matthew Daniels on Dec 20, 2009
You said patches welcome, so… ;)
Your use of “cat” is a useless use of cat. You can just type:
grep -C 1 sword king-james-bible.txt | less
… because grep takes the file(s) to parse as its last arguments.
If you want to recurse down subdirs but don’t want to install ack, you can use “grep -ri”. The -r makes it recurse and -i gives you case insensitivity. ack is great for code because it knows a bunch of code-specific things (like to ignore version control metadata), but if you’re working with non-code text like this, you don’t really need it IMHO.
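For the curious, here's roughly what that recursive search looks like on a folder of notes. A sketch only: the directory layout and file names are invented, and -r is a GNU/BSD grep feature rather than something every grep is guaranteed to have.

```shell
# Set up a throwaway notes tree (directory and file names are invented).
notes_dir=$(mktemp -d)
mkdir -p "$notes_dir/history" "$notes_dir/fiction"
printf 'Q: (12) The Sword of state was carried before him.\n' \
    > "$notes_dir/history/coronations.txt"
printf 'N: No blades in this one.\nQ: (88) She kept the sword under the bed.\n' \
    > "$notes_dir/fiction/novel.txt"

# -r recurses, -i ignores case, -n adds line numbers;
# grep prefixes each hit with the file it came from.
grep -rin sword "$notes_dir"

# -l lists just the files that contain a match.
grep -ril sword "$notes_dir"
```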
Other things that might be useful: pdf2txt to turn PDF books into greppable text, so you would do:
pdf2txt something.pdf | grep …
If you have a directory full of PDFs and subdirs and mess like that, you might like to use something like find:
find . -iname '*.pdf' -print0 | xargs -0 pdf2txt | grep -i …
The -iname makes the name match case insensitive and the -print0 and -0 are explained in this article. (Took me a while to figure that out, so I’m sharing my google results with you to save you the trouble.)
There are probably similar foo2txt converters for most major formats — Word docs, various ebook formats, etc.
I’m not sure how to grep a directory full of a mixture of file formats though. Perhaps better to create text versions of everything for ease of grepping… I dunno.
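One way the "text versions of everything" idea might look, for PDFs at least. This is a sketch under assumptions: the function name pdfs_to_text is made up, the converter is assumed to be pdftotext (from poppler-utils, called as pdftotext input.pdf output.txt), and you can point the PDF_TO_TEXT variable at a different converter that takes the same two arguments.

```shell
# pdfs_to_text DIR
# For every PDF under DIR, write a .txt file next to it, skipping any
# that already have one. The converter defaults to pdftotext (assumed
# installed); override it by setting PDF_TO_TEXT.
pdfs_to_text() {
    find "$1" -type f -iname '*.pdf' -exec sh -c '
        converter=$1
        shift
        for pdf in "$@"; do
            txt="${pdf%.*}.txt"          # same name, .txt extension
            [ -e "$txt" ] || "$converter" "$pdf" "$txt"
        done
    ' sh "${PDF_TO_TEXT:-pdftotext}" {} +
}

# Usage (assumes ~/reading is where your files live):
#   pdfs_to_text ~/reading
#   grep -ri --include='*.txt' sword ~/reading
```

Letting find hand the file names straight to the loop sidesteps the spaces-in-filenames problem that -print0 and -0 exist to solve. The --include option in the usage line is a GNU grep feature; it keeps grep from poking at the original PDFs.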
By Kirrily Robert on Dec 20, 2009
Thanks for the tips, trying to read and write history papers at university is hard enough, you’ve got to be smart about how and what you read and this system gives a new angle.
By Jon Pritchard on Dec 20, 2009
When reading sufficiently academic books you may rejoice in the ingenious innovation known as the ‘index’ =)
My contribution to the art is what I term the Cambridge Method. You read the introduction and conclusion, which in 95% of cases tell you all the useful stuff the book has to offer, and one other random chapter, from which you can draw quotations to demonstrate that you really read the whole thing (hah).
jon: my degree’s in history, so I can personally vouch for the above method in that case ;)
By Adam Williamson on Dec 21, 2009
When reading sufficiently academic books you may rejoice in the ingenious innovation known as the ‘index’ =)
Whoa! It’s like pre-bottled grep! :)
Actually, every time I read the index of a book, I wonder how it was generated – I know the process is almost always manual, and sometimes I disagree with indexes and want to fix them… I guess I wish my books were also view-sourceable all the way down.
My contribution to the art is what I term the Cambridge Method. You read the introduction and conclusion, which in 95% of cases tell you all the useful stuff the book has to offer, and one other random chapter, from which you can draw quotations to demonstrate that you really read the whole thing (hah).
I totally did this in the Other Cambridge back when I was a student too. I never claimed to read the entirety of the books I was given – just that I read enough to be able to draw some good conclusions and write a thoughtful paper on some semi-related topic. ;-)
…which if you think about it, is really what profs ask for; they just assume reading the whole dang book is a dependency, when in fact, it is not.
By Mel on Dec 21, 2009
Then again, I am pretty sure professors know, and probably do this too sometimes. When I become a prof, I want to make sure my students know that this sort of thing isn’t cheating, but instead being clever and smart about how one spends one’s study time.
Sometimes there’s a lot of value in going the long way around, and it’s good to know how to do it. But if you don’t, and you can explain what you did and why, and the tradeoffs you get versus going the long way around, that’s great – after all, the thing I want my students to learn most is how to think.
By Mel on Dec 21, 2009
Not sure how other publishing tools work, but DocBook XML has a nice method for generating the index.
Directly inline in the content, you use a special ‘indexterm’ tag, which has children tags for primary, secondary, tertiary, see, see also, and so forth.
The idea is, as you write something, you make a habit of dropping at least one ‘indexterm’ there. It could even be a default ‘indexme’ so you can find it later to fix, if need be. (I’m sure people go back and do their indexing as a separate step, but I find it helpful to put at least one term for each section, since you think of more index ideas when immersed in the content, but YMMV.)
For one book I wrote, every time I explained what something was, I used a ‘what is’ tag as primary, then the actual thing as secondary: “what is” “security context.”
When I explained how to do something, I followed the same method with a ‘how to’ as primary. Here are the two index entries:
http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/selinux-guide/generated-index.html#AEN5596
http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/selinux-guide/generated-index.html#AEN6139
By Karsten Wade on Dec 23, 2009
pdftotext is having major trouble with journal articles from JSTOR. It keeps getting stuck at the end of the first page, which JSTOR inserts. I open up the text file and there’s a string of unprintable characters, the ones in little boxes. I can’t skip the problem by setting the page to start converting from, either. Shame.
By Jon Pritchard on Jan 5, 2010