Thursday, November 27, 2008

Indices in Ebooks

I've been looking through a number of ebooks recently, and one of the things I've been examining has been each book's index. Assuming the book has an index. Some don't. Which raises the question: given full-text search (e.g., via book-viewing software or desktop search tools), do indices continue to be useful? Or are they an artifact of print technology that makes little sense in an electronic environment?

I'm committed to producing indices for my books, because one of the output media I target is print. Indices in print are of proven value, so I'm on the hook for them regardless. But I think they make sense even for electronic-only publication. The reason is that good indices reference not just text, but concepts. Fastware!, for example, is about how to write software that runs quickly, but there are lots of words that correspond to that idea: speed, efficiency, performance, scalability, responsiveness, latency, etc. If you're interested in reducing memory latency, and I happen to say something important in a passage where I'm discussing how to improve the hit rate of the instruction cache, you want the index entry under "latency" to point you to that passage (or, more accurately, I want the index entry under "latency" to point you to that passage), even if I somehow never manage to use the word "latency" in the passage.
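To make the text-versus-concept distinction concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the two passages, their ids (`p1`, `p2`), and the hand-built `concept_index` are hypothetical, not content from Fastware!. It only shows that a substring search finds passages containing the word, while an index entry can point at a passage that never uses it.

```python
# Hypothetical passages; "p1" discusses memory latency without using the word.
passages = {
    "p1": "Improving the hit rate of the instruction cache cuts the time "
          "the processor spends stalled waiting for memory.",
    "p2": "Network latency often dominates the response time of "
          "distributed systems.",
}

# Hypothetical hand-built index: each term maps to the passages that are
# about the concept, whether or not the term itself appears in them.
concept_index = {
    "latency": ["p1", "p2"],
    "instruction cache": ["p1"],
}


def text_search(term):
    """Return ids of passages whose text contains the term (case-insensitive)."""
    term = term.lower()
    return [pid for pid, text in passages.items() if term in text.lower()]


def index_lookup(term):
    """Return ids of passages the index lists under the term."""
    return concept_index.get(term.lower(), [])


print(text_search("latency"))   # ['p2']
print(index_lookup("latency"))  # ['p1', 'p2']
```

The search misses `p1` because the word never occurs there; the index finds it because a person decided the passage was about latency.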

Many technical books have awful indices, a situation I attribute to three facts: (1) most indices are prepared by professional indexers, who typically have no understanding of the book's content -- they literally don't know what many of the nouns and verbs mean; (2) these indexers are paid unbelievably badly (typically only a few hundred dollars to index technical books of up to a thousand pages), so they have little incentive to do more than a cursory job; and (3) the quality of a book's index doesn't seem to affect sales, so there is no economic incentive to change the situation. My guess is that as ebooks become more common, indices will fall by the wayside, because it will be easy for authors and publishers to reason that textual searches obviate the need for separate indices, and given the sorry state of most indices, this will probably be true. I'm an old-fashioned guy, however, and I think that a good index improves the usability of a book. I also believe that the whole point of a book is to serve the interests of its readers. So for the foreseeable future, I plan to produce indices for my books, even though index preparation is, to be honest, probably the single most unpleasant part of writing a book.

So I'm going to produce an index for Fastware!, and that takes me back to page numbers. In an earlier blog entry I worried about the problem of referring to page numbers in a book that may be published in multiple forms and hence have multiple sets of page numbers. Two people commented that the solution is simple: refer to something like section numbers or paragraph numbers instead of page numbers. This is clearly the correct approach, but think of what this means for an index. A single index entry often corresponds to multiple locations in the book, which are traditionally represented as a list of page numbers. If page numbers go away, and if we assume that locations in a book are represented in the form c.p, where c is the chapter number and p is the paragraph number within that chapter, we end up with index entries that might look like this:

  containers, standard
    C++ 4.3, 4.55, 5.18-22, 7.23
    C# 4.4, 4.80-99, 7.65
    Java 4.3, 4.60, 5.22-25

It looks a bit odd to me, but I can't think of a reason why there is anything wrong with it. Can you?

Of course, we could also use the form c:p, which would make references look somewhat biblical, hence possibly enhancing their appearance of authority :-)
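As a rough sketch of how such locators might be handled mechanically, the Python below parses references of the form c.p (or c.p-q for a paragraph range), sorts them, and prints an entry with either separator. The function names (`parse_locator`, `format_locator`, `format_entry`) and the assumption that a range never crosses a chapter boundary are mine, chosen for illustration.

```python
def parse_locator(text, sep="."):
    """Parse '5.18-22' into (5, 18, 22); a single paragraph '4.3' becomes (4, 3, 3)."""
    chapter, _, paragraphs = text.partition(sep)
    first, _, last = paragraphs.partition("-")
    # Assumption: a range stays inside one chapter, so 'last' is a bare paragraph number.
    return int(chapter), int(first), int(last or first)


def format_locator(loc, sep="."):
    """Turn (5, 18, 22) back into '5.18-22' and (4, 3, 3) into '4.3'."""
    chapter, first, last = loc
    span = str(first) if first == last else f"{first}-{last}"
    return f"{chapter}{sep}{span}"


def format_entry(heading, subentries, sep="."):
    """Render a heading and its subentries, with locators in book order."""
    lines = [heading]
    for name, locators in subentries.items():
        ordered = sorted(parse_locator(loc) for loc in locators)
        refs = ", ".join(format_locator(loc, sep) for loc in ordered)
        lines.append(f"  {name} {refs}")
    return "\n".join(lines)


# The sample entry from above, with the C++ locators deliberately shuffled.
entry = {
    "C++": ["7.23", "4.3", "5.18-22", "4.55"],
    "C#": ["4.4", "4.80-99", "7.65"],
    "Java": ["4.3", "4.60", "5.22-25"],
}

print(format_entry("containers, standard", entry))
print(format_entry("containers, standard", entry, sep=":"))
```

One thing the sketch makes visible: c.p references look like decimal numbers but are not. Paragraph 4.3 and paragraph 4.30 are different places, and 4.9 comes before 4.10, so locators have to be compared as pairs of integers rather than as floats or strings.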


Rob J Hyndman said...

I agree that indexing is required as well as searching for ebooks. One solution is to integrate the two: add index tags where required and have these entries appear at the top of the list when searching for that term.

George Reilly said...

The indexes in Knuth's books have always been a work of art in their own right.