Saturday, December 27, 2008

Notes from Effective C++ CD

During 1998, much of my time was devoted to designing and helping supervise the implementation of an electronic version of two of my books. The result was Effective C++ CD, an HTML implementation of the books and some associated magazine articles, where links connected everything on the CD. In our work, we addressed some content issues and many presentation issues, and we produced enough innovations to merit an article for practitioners in Microsoft Internet Developer and a paper for academics in Proceedings of the 5th Conference on Human Factors & the Web.

I recently found myself in the directory with the CD's files, and I took the opportunity to review the notes I'd made regarding how we could improve the CD were we to undertake the project again (e.g., as a second edition). Most of the comments were specific either to the content of the CD or to the web-browser-as-book-viewer decisions we made (hence not germane to my work on Fastware!), but some remain relevant today. Here they are (in no particular order):
  • Text in graphics should be visible to search engines. (This generalizes: text in figures, tables, diagrams, animations, etc., should be visible to search engines.)

  • Electronic versions of books are essentially software and, like contemporary software, they should be updatable via the net. When bugs are fixed in an electronic version of a book, owners of that book have a right to expect to be able to incorporate those bug fixes into the book they've purchased.

  • The books on my CD are organized into either 35 or 50 "Items," which are essentially technical essays. Each book has an extensive set of cross references among its Items, so Item 22 might refer to Item 21 and Item 8. Within a book, this was fine, but when I added links between the books' Items (a feature exclusive to the CD), I had to make clear which book had the Item I was referring to. That is, "Item 22" was unambiguous within a print book, but it wasn't unambiguous on a CD with two books, each of which had such an Item. I addressed this by prepending a book-specific letter to the Item number for Items outside the current book, e.g., "Item 22" is in the current book, but "Item M22" or "Item E22" is in the other book.

    A fair number of people found this confusing. One way to address this problem would have been to always use the E or M prefix on all Item cross references, but this would have introduced syntactic noise and led to the electronic books not looking the same as their print versions.

    The real problem, I think, is trying to figure out how to write for something that might stand alone (as, e.g., a print book) but that might also be part of a collection of interlinked documents. Readers of the standalone version shouldn't be bothered with cross reference disambiguation overhead that's needed only in non-standalone environments, but books they know from their standalone versions should look essentially the same as the versions they encounter in linked environments. (For a related discussion, consult my vision for electronic books.)

  • Because link text generally looks different from non-link text, it calls attention to itself. That's the point of it looking different: to communicate, "Hey, I'm clickable!" Unfortunately, when too much text tries to get your attention at the same time, it intrudes on the reading experience. For example, contrast this, where every reference to Amazon is linked,
    You can buy lots of stuff at Amazon. That's because Amazon sells lots of stuff. Amazon customers expect lots of stuff at Amazon, because that's what Amazon is known for. Yay, Amazon!
    with this, where only the first reference is linked:
    You can buy lots of stuff at Amazon. That's because Amazon sells lots of stuff. Amazon customers expect lots of stuff at Amazon, because that's what Amazon is known for. Yay, Amazon!
    Failing to make links out of text that's already been made a link recently is, to me, akin to using pronouns to refer to nouns that have been recently introduced. Pronouns make text more interesting and less repetitive, but harder to understand out of context. Non-link text is similar: it avoids visual repetition, but it makes the text harder to understand out of context.

    There are two additional issues relating to whether repeated text should be made active at each point of repetition. The first has to do with consistency. More than one reader of my CD complained that I was inconsistent about what text was linked and what was not. These readers seemed to expect all naturally linkable text to be linked, no matter how many times that text occurred, even within a short space.

    The other additional issue concerns search engines, which can plop you down in an arbitrary location in an arbitrary document. If you start reading and you encounter a pronoun, you naturally scan backwards looking for the antecedent. But if you encounter text that seems like it should be a link, my guess is that you don't scan backwards looking for the same text in link form. Rather, you get annoyed at the author for failing to make the text you're looking at a link. That leads to the challenge: how do you avoid the visual clutter that accompanies making every occurrence of naturally linkable text into a link while also meeting the link-related expectations of readers who use search engines to take them to the point in a document where they start reading?

  • Clicking on naturally linkable text like "Item 5" or "Section 3.5.1" or "Chapter 4," where the target of the link generally has a title, leads to some readers wanting to see the title without having to traverse the link. Instead of
    As you'll see in Chapter 4, ...
    they'd prefer to see:
    As you'll see in Chapter 4 ("Giant Anteaters"), ...
    Some authors do this in print, but I find it distracting as a reader. As the author of an electronic book, I can offer the title as an option by, e.g., displaying it when the mouse hovers over the link text. But that means I have to make sure that capability is provided when my book is prepared for electronic publication. (A similar capability can be used to avoid making readers turn to a glossary to see a term's definition.)

  • If I'm looking at an nth-level index entry, it would be nice to have an easy way to get to the (n-1)st level, i.e., essentially a way to move to the parent entry for a child index entry.

  • One reader wrote:
    It would be great if you can have a table of content showing in the navigation area, and there is a toc synchronization function (much like Microsoft Workshops), so that the readers will have a better idea of where they are in the book.
    This is one way to address the "where am I?" problem that can arise as the result of a search or when following a link from one part of a document to another (or from one document to another).
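
To make the Item cross reference scheme described above concrete, here is a minimal sketch in Python. It is not how the CD was actually produced. The function name, its parameters, and the `always_prefix` switch are all hypothetical; only the "E" and "M" prefixes and the "prefix only when the Item is in the other book" rule come from the notes above.

```python
# Hypothetical sketch of the cross-reference labeling rule described above.
# Book codes "E" and "M" mirror the prefixes used on the CD; everything else
# (names, signature, the always_prefix option) is assumed for illustration.

def format_item_ref(item: int, target_book: str, current_book: str,
                    always_prefix: bool = False) -> str:
    """Return the link text for a reference to an Item.

    By default, the book letter is added only when the target Item lives
    in a different book than the one being read. With always_prefix=True,
    every reference carries the letter (the noisier alternative).
    """
    if always_prefix or target_book != current_book:
        return f"Item {target_book}{item}"
    return f"Item {item}"


if __name__ == "__main__":
    # Reading book "E": a reference within the same book stays bare.
    print(format_item_ref(22, target_book="E", current_book="E"))
    # Reading book "E": a reference into book "M" gets the prefix.
    print(format_item_ref(22, target_book="M", current_book="E"))
    # The always-prefix alternative, which no longer matches the print text.
    print(format_item_ref(22, target_book="E", current_book="E",
                          always_prefix=True))
```

The sketch also shows why the tradeoff is awkward: the same Item gets different link text depending on where the reader happens to be, unless you accept the always-prefixed form everywhere.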
