Sunday, April 12, 2009

TOC Talk Now Available Online

The talk I gave at the Tools of Change in Publishing Conference in February is now available online. It runs about 46 minutes. If you've been reading this blog, you won't get anything new from the talk, but if you're interested in the "live" presentation, you can find it here.

Wednesday, March 18, 2009

Jakob Nielsen Post on Kindle-Friendly Content

The TOC Blog pointed me to Jakob Nielsen's recent post, "Kindle Content Design." Nielsen raises a number of interesting issues, and in general I think we're in violent agreement, but his post ends with this:
Since I started writing Alertbox in 1995, it's been a recurring theme to design for the medium. In the beginning, this meant "don't design your website like a glossy brochure." (I.e., print design is different than online design.)

For Kindle, it's certainly unacceptable to simply repurpose print content. But you can't repurpose website content, either. For good Kindle usability, you have to design for the Kindle. Write Kindle-specific headlines and create Kindle-specific article structures.

I fully agree with the part about designing for the medium, but I don't think Kindle is a medium in and of itself. There are a number of other dedicated book-reading devices, and my guess is that they are similar enough to one another that it's possible to design for the entire category. As an author, I don't want to try to create content on a device-by-device basis; I want to do it on a device-category-by-device-category basis. That may not yield optimal content presentation for every device, but it should yield acceptable presentation for devices I know about as well as devices I don't know about (e.g., those introduced after I've finished writing). Yes, my software development roots keep showing: I approach cross-platform content authoring the same way I approach cross-platform software authoring.

Tuesday, March 3, 2009

Saving Myself from Physical Formatting

I've been reviewing my FrameMaker source files for Fastware!, looking out for single-sourcing no-nos. High on the list is the use of physical styles instead of logical ones, the poster child for which is the use of italics to style text instead of a logical style like "Book Title." The problem is that the use of italics can be for emphasis, for the title of a publication, for the introduction of a new term, and more. A well-styled manuscript will use only logical style names -- never raw italics.

Alas, nearly every authoring system I know goes out of its way to make it easy to italicize text. WYSIWYG systems (at least those on Windows) define ^I to mean "italicize the selection", HTML offers <i>, LaTeX has \it, reST offers *text*, etc. It's a similar situation for making things bold or underlined, both of which are also physical styles and should thus be avoided.

From what I can tell, there is no notion of "italicization" in the DocBook or DTBook schemas, and the screen shot for oXygen's XML Author for DocBook shows a toolbar button only to emphasize text, not italicize it. Score one for XML.

I've decided to use FrameMaker, not XML, so I'll be working in WYSIWYGland. I'm still committed to using only logical styles, and although I'm a pretty disciplined guy, I have no illusions that I'm perfect. If I don't find a way to keep myself from doing it, I know I'll occasionally type ^I (or ^B for bold) in FrameMaker without thinking about it. I won't notice my formatting faux pas (note use of italics to indicate a foreign expression, which is quite different from using them to emphasize something or to indicate a book title), because everything in my WYSIWYG world will look right. The situation would likely be the same in an italics-supporting markup-based world such as LaTeX or reST.

Fortunately, FrameMaker's keyboard and menu command set is determined by configuration files, so, with some guidance from the kind folks at the FrameMaker forum, I was able to excise keyboard and toolbar commands for italics, bold, and underlining. Now, should I foolishly try to italicize something via the errant ^I, nothing happens.
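For markup-based systems, where there is no keyboard shortcut to disable, a similar safety net could be a small check that flags physical styling before it creeps into a manuscript. The following is only a sketch in Python, not anything I actually use: the pattern table and the function name find_physical_styles are made up for illustration, and the patterns cover just a few common cases (HTML <i>, <b>, <u> and LaTeX commands such as \it and \textbf).

```python
import re

# Hypothetical patterns for physical (appearance-only) markup.
# These are illustrative assumptions; extend them for a real toolchain.
PHYSICAL_PATTERNS = {
    "HTML italic/bold/underline": re.compile(r"<\s*(i|b|u)\s*>", re.IGNORECASE),
    "LaTeX font switch": re.compile(r"\\(it|bf|textit|textbf|underline)\b"),
}


def find_physical_styles(text):
    """Return (line_number, label, matched_text) for each physical-style hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PHYSICAL_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, label, match.group(0)))
    return hits


sample = (
    "See <i>Effective C++</i> for details.\n"
    "This is {\\it very} important.\n"
    "Use \\emph{logical} markup, and \\item is not a font switch.\n"
)

for hit in find_physical_styles(sample):
    print(hit)
```

A check like this only catches what its patterns describe, so it complements discipline rather than replacing it.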

Experience Report: My C++ Books on Kindle

I recently announced to my mailing list that, unbeknownst to me (and, as it turns out, my editor), two of my C++ books have been made available on Kindle:
Two of my books -- Effective C++ and Effective STL -- are now available for Amazon's Kindle. I haven't seen these editions myself, so I can't tell you anything about them. Since I don't have a Kindle, this is unlikely to change anytime soon. If you try these editions, please let me know what you think of them. You'll find links to the Kindle editions of these books at http://www.aristeia.com/books.html .
Shortly thereafter, Herb Sutter wrote me as follows:
I downloaded a Kindle sample of one of your books, which included enough to see some source code examples. In general it looks good, except for one thing: Source code is rendered badly. The text is clear, but the two problems are:

a) Line length (though not as bad as narrow magazine columns & what iPhone would be like): Medium-long lines have bad wraps that make the examples a pain to read. But line length is probably going to be an issue for all small-form-factor devices.

b) Proportional font => comments don’t line up. It’s possible to get fixed-width fonts on Kindle but you have to try hard and use the <pre> tag explicitly. For more, see Larry O’Brien’s recent blog note about targeting technical docs to Kindle, including the comment about getting Greek text to work by explicitly using UTF-8.
Herb included a couple of quick-and-dirty screen shots, including this one:

The code examples, which I'd carefully formatted for the font and page width of the printed book, were apparently moved to the Kindle without any reformatting consideration, so the Kindle tossed in new line breaks wherever it felt they were needed. The result, unsurprisingly, is awful. This is consistent with my belief that the chances of writing a book with anything beyond straight prose that looks good on multiple devices are close to zero. The more special formatting that's needed (e.g., for recipes, poetry, or code), the more care an author (and his or her publisher) is likely to need to take.
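One way to catch this kind of problem before publication would be to check each code listing against the narrowest line width a target device can show in a fixed-width font. Here is a minimal sketch in Python; the function name overlong_lines is invented for this example, and the 40-column limit is a hypothetical number, not a measured Kindle value.

```python
def overlong_lines(listing, max_columns, tab_width=4):
    """Return (line_number, width) for each line wider than max_columns.

    Tabs are expanded first so the reported width matches what a
    fixed-width display would need.
    """
    hits = []
    for lineno, line in enumerate(listing.splitlines(), start=1):
        width = len(line.expandtabs(tab_width))
        if width > max_columns:
            hits.append((lineno, width))
    return hits


listing = (
    "int x = 0;\n"
    "std::vector<int> values;   // a trailing comment that makes this line long\n"
)

# 40 columns is an assumed limit chosen only for illustration.
for lineno, width in overlong_lines(listing, max_columns=40):
    print(f"line {lineno} is {width} columns wide")
```

Lines flagged this way are the ones a device would wrap on its own, so they are the ones worth reformatting by hand.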

Bearing in mind that my recent C++ books use two ink colors (black plus a red highlight color for places where I want to focus readers' attention), Herb wrote:
Some text that I think you had as red or some other color looks gray on Kindle. It’s still readable, if a little less distinct. Probably the best that can be done given that Kindle 1 only has a 4-level gray scale. It’d probably look slightly better on a Kindle 2 with a 16-level gray scale, but only slightly since you do have to make it gray enough to be distinct and so I’d imagine you couldn’t just use levels 14 and 16 for example.
I discussed the problem of writing for devices with different capabilities such as color in an earlier post.

Herb concluded:
If you’re serious about thinking of targeting devices like Kindle or iPhone, it’s worth having one.
I replied:
Practically speaking, I'm sure you're right, but this is the kind of thing I'd like my publisher to address for me. I want to focus on content, not device-dependent presentation issues. One of the things my publisher should do is investigate the landscape of output devices, then give me advice on what I should or should not do in my ms to keep it cross-platform-friendly.
I thank Herb for his mini-review of my books on Kindle and for his permission to use his material here.

Monday, March 2, 2009

IO

When I first started thinking about Fastware!, I knew I'd have a chapter on IO, but I was concerned that it would take me a while to have much to say about the topic, and I expected it to be one of the last chapters I attacked. Since then, my view has changed, and in part due to the excitement I got from reading Tom Leighton's article in CACM (and ACM Queue -- sometimes recycling isn't so great, sigh) about improving performance using the Internet, I'm now ready to draft a chapter on IO.

It's an interesting question to try to define IO. Technically, IO is probably anything that communicates with something off-chip, so accessing main memory can be considered IO, although I don't plan to treat it that way. My current plan is to focus on disk and network issues, but even there things are beginning to get fuzzy, because solid state drives use a traditional disk API, but don't have anything akin to rotational latency. At some point I may decide to create a chapter focusing on storage (cache, memory, and "disks," regardless of technology) and another on network IO, but for now, the traditional view that IO largely means disk and network access seems reasonable to me. I welcome comments on the best way to organize these issues.

So what are the topics relevant to the creation of sizzling disk and network IO? Here are the main ones on my list:
  • Prefetching, buffering, and caching (by hardware, drivers, OSes, language runtime systems, and applications)
  • Asynchronous and concurrent reads/writes (including disk striping)
  • Memory-mapping files
  • Avoiding disk fragmentation
  • Network protocol choices (e.g., TCP vs UDP)
  • Doing IO on deltas instead of full data sets
  • Reducing network distances (e.g., via CDNs)
  • Data compression and "bundling" (e.g., CSS sprites)
  • Latency vs. bandwidth
I welcome suggestions for issues related to the design and implementation of low-latency IO. Because Fastware! is a language-independent book, I expect to make at most passing references to language-specific techniques (e.g., using C++ rdbuf or istreambuf_iterators), but I'm still interested in hearing about them, because it's not uncommon for different languages to have their own approaches to a more general issue, and I want to include discussions of as many general issues as I can.
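To make the latency-versus-bandwidth item (and the appeal of bundling) concrete, here is a back-of-the-envelope model in Python. It is a deliberate simplification and an assumption on my part: requests are issued one after another, each pays one fixed latency, and all bytes share a fixed bandwidth. The function name transfer_time and all the numbers are hypothetical.

```python
def transfer_time(num_requests, total_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate total seconds to fetch total_bytes split over num_requests.

    Simplified model (an assumption, not a measurement): requests are
    sequential, each costs one latency, and bandwidth is constant.
    """
    return num_requests * latency_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical workload: 200,000 bytes over a 1,000,000 bytes/s link
# with 50 ms of latency per request.
separate = transfer_time(20, 200_000, 0.05, 1_000_000)  # 20 small files
bundled = transfer_time(1, 200_000, 0.05, 1_000_000)    # one bundle

print(f"20 separate requests: {separate:.2f} s")
print(f"1 bundled request:    {bundled:.2f} s")
```

Under these assumed numbers the separate requests take about 1.2 seconds and the bundle about 0.25 seconds, even though the same bytes cross the wire. Real systems overlap requests and have variable latency, so treat this as an intuition pump only.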

Saturday, February 21, 2009

FrameMaker. Again.

At the end of the day, if you're going to write a book using some kind of authoring technology, you must decide on the technology you're going to use. I have real-world experience with FrameMaker, LaTeX, and Open Office, and I've looked into or played around with reST, XML with DocBook or DTBook (via Daisy), and Microsoft Word. I've tried to weigh their strengths and weaknesses with respect to issues I've discussed in this blog: conditional content, conditional and personalized formatting, WYSIWYG versus markup, multiple-platform publication, etc. The day has been long, but the time has come to end it, and my conclusion is that none of my choices is particularly attractive. Were I made of the same stuff as Donald Knuth, my solution would be to invent my own authoring technology, but I'm no Knuth, so I'm going to hold my nose and choose the least objectionable of the available options. For me, that's FrameMaker. Again. Sigh.

To a large degree, the decision boiled down to choosing the devil I know over the devils I don't. The more I looked into XML-based approaches, the more I became convinced that I'd have to educate myself in XML, XSLT, and CSS, technologies I don't know. (Not knowing them puts me horribly out of step with the modern world, but I've been busy with other things. If you want to know about the interaction of relaxed hardware memory models with C++ compiler optimizations, how to efficiently make use of the STL, or how to model embedded-system FSAs in C++, I'm your guy. I can also tell you a thing or two about the challenges of multiple-platform authoring.) I could learn those technologies, of course, and it would be interesting and useful for me to do so. But if I honestly want to get Fastware! written, I have to focus on that, and XML, XSLT, and CSS don't fall within that focus. Besides, I suspect (but have not confirmed) that when using an XML schema, I'm constrained to using the style (element?) names defined by the schema, and I'd prefer to use book-specific style names instead of generic names.

Regarding reST and LaTeX, the former I don't know, and the latter I've mostly forgotten. Both would require dealing with learning curves that would ultimately leave me in a place where I still would be unlikely to be able to achieve everything I wanted. Neither is WYSIWYG, for example, and a short time with reST revealed that it's not as expressive as I'd like.

Microsoft Word doesn't really do conditional content (though there is apparently an add-on that makes the feature available), and it has the irritating constraint that certain of its style names can't be deleted, but the real problem is that, based on my limited experience with it, the learning curve is both steep and essentially limitless. (In an earlier post, I likened it to that of C++.) It doesn't help that many authors with Word experience talk about it as if they were the victims of unusually sadistic domestic abuse.

My experience with Open Office led me to conclude that it simply isn't up to authoring the kind of book I want to write.

FrameMaker I already know. It's lacking in the areas of conditional formatting, per-copy personalization, and separation of content from presentation, but I have some ideas for how to deal with these limitations. Furthermore, FM can generate MIF, a textual representation of its documents, and MIF can be transformed into other formats if you throw enough programming effort at it. There's also the FDK (FrameMaker Developer Kit), an API for FM documents, so, again, there's a programmatic hook to do things myself that FrameMaker can't do. I've never tried to transform MIF or use the FDK, so if I have to go that route, I'm back in the land of learning curves, but I think there's a learning curve in front of me no matter what technology I choose. My sense is that, overall, the ratio of time spent writing to time spent fighting with the authoring technology is likely to be best with FrameMaker.

The more things change, the more they stay the same, the saying goes, and certainly that's been the case for me and book-writing. For at least the last 10 years, each time I've finished a book using FM, I've vowed not to use it again. The next time I got ready to start a book, I investigated the alternatives and talked to authors for recommendations. Each time I came away with the impression that there was no choice I was likely to find less unpleasant than Frame. The stuff I'm made of is apparently closer to Britney Spears than Donald Knuth, because oops, I'm going to do it again.

Friday, February 20, 2009

Sources for Fastware!-Related Information

One of the reasons I want to write Fastware! is that I don't know of any existing book that covers the breadth of topics I think are important for developers of speedy systems. To date, most of my blog entries have focused on authoring issues, but most of the time I've devoted to Fastware! since beginning the project nearly two years ago has been spent on collecting and organizing the information that the book will contain.

The information has come from many sources. I began with personal interviews with several groups and individuals working on speed-sensitive systems -- a fascinating experience. That gave me a good base from which to begin my search for further information, information I've found in the form of articles, conference papers, blog entries, podcasts, videos, and more. The most relevant sources of information I've been recording in an Excel spreadsheet, and I've decided to make this spreadsheet available at the (primitive, but I'm working on it) Fastware! web site from time to time. I've just uploaded the initial snapshot, which has 75 entries in it.

It's going to take me a while to write Fastware!, but if you're interested in speed-related information in the meantime, I hope my list of sources will be useful to you.

Monday, February 16, 2009

Post-TOC Thoughts

Last week I attended the Tools of Change for Publishing conference (TOC), during which I gave a talk discussing some of the authoring challenges I've written about in this blog. I came away more convinced than ever that writing primarily for electronic publication (while keeping print publication in mind) is best for everyone: authors, publishers, and readers. I also came away further convinced that authors should think about audio distribution as they write, a conclusion reinforced by Amazon's inclusion of automatic text-to-speech (TTS) capability in the Kindle 2. (This feature has caused a bit of a rights-related stir among some in the publishing industry, but I believe it's in everybody's interest to work this issue out, so I'm confident that it won't take long for most content to be available in this form. Contracts for future books will address this issue directly, and in a couple of years, no one will think twice about this.)

One of the things I got from the conference is that authors would be well-advised to assume that as time goes on, more and more people will consume content on small devices. Currently we call such devices "mobile phones," but in reality, they're truly personal, truly portable computers -- the general-purpose devices people will have with them almost all the time. As such, electronic books in whatever format will be increasingly viewed on small screens.

Think of what that means for content that normally features complex diagrams, large tables, etc. In the past, I've written my books knowing that the content would fit on physical pages of about 9 x 7 inches. After taking margins into account, I had pages of about 7.25 x 4.75 inches to tell my story, and I designed things to fit within those constraints. An iPhone screen is about 3 x 2 inches, meaning that if I want my content to work well on that device, I have to make sure it will (1) fit and (2) be comprehensible when it's displayed there. Prose is not a problem, but other things could be: source code listings, diagrams, tables, graphs, etc.

It seems unlikely that people will want to read lengthy ("long-form") content on a tiny screen, and usage data suggest that people generally read using such devices during "between times," i.e., while commuting, while waiting for a meeting to start, etc. -- generally no more than about 20 minutes a pop. The best content for such reading is "short form," and this suggests that authors who want to make their content small-screen-friendly need to find a way to break their presentations into comparatively small chunks that are naturally consumable more or less independently.

Interestingly, this is one of the characteristics that my "Effective" books tend to have, because the material is generally broken down into Items of 4-6 pages that pretty much stand on their own. This is a popular feature of these books, something I didn't realize until I wrote some much longer Items in my second book (More Effective C++), and people complained about it. This suggests that a naturally chunkable presentation not only makes things more small-screen-friendly, it can also be a benefit in print form.

Fastware! won't be broken down into Items, because I don't think that's the best way to cover the material I want to discuss, but I'll definitely be thinking of ways to structure the material so that it can be easily consumed in relatively small, self-contained chunks.

On small screens.

Or as audio.

In addition to ink-on-paper form :-)

Saturday, December 27, 2008

Notes from Effective C++ CD

During 1998, much of my time was devoted to designing and helping supervise the implementation of an electronic version of two of my books. The result was Effective C++ CD, an HTML implementation of the books and some associated magazine articles, where links connected everything on the CD. In our work, we addressed some content issues and many presentation issues, and we produced enough innovations to merit an article for practitioners in Microsoft Internet Developer and a paper for academics in Proceedings of the 5th Conference on Human Factors & the Web.

I recently found myself in the directory with the CD's files, and I took the opportunity to review the notes I'd made regarding how we could improve the CD were we to undertake the project again (e.g., as a second edition). Most of the comments were specific either to the content of the CD or to the web-browser-as-book-viewer decisions we made (hence not germane to my work on Fastware!), but some remain relevant today. Here they are (in no particular order):
  • Text in graphics should be visible to search engines. (This generalizes: text in figures, tables, diagrams, animations, etc., should be visible to search engines.)

  • Electronic versions of books are essentially software and, like contemporary software, they should be updatable via the net. When bugs are fixed in an electronic version of a book, owners of that book have a right to expect to be able to incorporate those bug fixes into the book they've purchased.

  • The books on my CD are organized into either 35 or 50 "Items," which are essentially technical essays. Each book has an extensive set of cross references among its Items, so Item 22 might refer to Item 21 and Item 8. Within a book, this was fine, but when I added links between the books' Items (a feature exclusive to the CD), I had to make clear which book had the Item I was referring to. That is, "Item 22" was unambiguous within a print book, but it wasn't unambiguous on a CD with two books, each of which had such an Item. I addressed this by prepending a book-specific letter to the Item number for Items outside the current book, e.g., "Item 22" is in the current book, but "Item M22" or "Item E22" is in the other book.

    A fair number of people found this confusing. One way to address this problem would have been to always use the E or M prefix on all Item cross references, but this would have introduced syntactic noise and led to the electronic books not looking the same as their print versions.

    The real problem, I think, is trying to figure out how to write for something that might stand alone (as, e.g., a print book) but that might also be part of a collection of interlinked documents. Readers of the standalone version shouldn't be bothered with cross reference disambiguation overhead that's needed only in non-standalone environments, but books they know from their standalone versions should look essentially the same as the versions they encounter in linked environments. (For a related discussion, consult my vision for electronic books.)

  • Because link text generally looks different from non-link text, it calls attention to itself. That's the point of it looking different: to communicate, "Hey, I'm clickable!" Unfortunately, when too much text tries to get your attention at the same time, it intrudes on the reading experience. For example, contrast this, where every reference to Amazon is linked,
    You can buy lots of stuff at Amazon. That's because Amazon sells lots of stuff. Amazon customers expect lots of stuff at Amazon, because that's what Amazon is known for. Yay, Amazon!
    with this, where only the first reference is linked:
    You can buy lots of stuff at Amazon. That's because Amazon sells lots of stuff. Amazon customers expect lots of stuff at Amazon, because that's what Amazon is known for. Yay, Amazon!
    Failing to make links out of text that's already been made a link recently is, to me, akin to using pronouns to refer to nouns that have been recently introduced. Pronouns make text more interesting and less repetitive, but harder to understand out of context. Non-link text is similar: it avoids visual repetition, but it makes the text harder to understand out of context.

    There are two additional issues relating to whether repeated text should be made active at each point of repetition. The first has to do with consistency. More than one reader of my CD complained that I was inconsistent about what text was linked and what was not. These readers seemed to expect all naturally linkable text to be linked, no matter how many times that text occurred, even within a short space.

    The other additional issue concerns search engines, which can plop you down in an arbitrary location in an arbitrary document. If you start reading and you encounter a pronoun, you naturally scan backwards looking for the antecedent. But if you encounter text that seems like it should be a link, my guess is that you don't scan backwards looking for the same text in link form. Rather, you get annoyed at the author for failing to make the text you're looking at a link. That leads to the challenge: how do you avoid the visual clutter that accompanies making every occurrence of naturally linkable text into a link while also meeting the link-related expectations of readers who use search engines to take them to the point in a document where they start reading?

  • Naturally linkable text like "Item 5" or "Section 3.5.1" or "Chapter 4," where the target of the link generally has a title, leads some readers to want to see the title without having to traverse the link. Instead of
    As you'll see in Chapter 4, ...
    they'd prefer to see:
    As you'll see in Chapter 4 ("Giant Anteaters"), ...
    Some authors do this in print, but I find it distracting as a reader. As the author of an electronic book, I can offer the title as an option by, e.g., displaying it when the mouse hovers over the link text. But that means I have to make sure that capability is provided when my book is prepared for electronic publication. (A similar capability can be used to avoid making readers turn to a glossary to see a term's definition.)

  • If I'm looking at an nth-level index entry, it would be nice to have an easy way to get to the n-1st level, i.e., essentially a way to move to the parent entry for a child index entry.

  • One reader wrote:
    It would be great if you can have a table of content showing in the navigation area, and there is a toc synchronization function (much like Microsoft Workshops), so that the readers will have a better idea of where they are in the book.
    This is one way to address the "where am I?" problem that can arise as the result of a search or when following a link from one part of a document to another (or from one document to another).
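The Item cross-reference convention described in the notes above (a bare "Item 22" inside the current book, a lettered "Item M22" or "Item E22" for the other book) amounts to a rendering rule that depends on where the reference appears. The following Python sketch shows one way such a rule could be expressed. It is not the code used for the CD: the function name item_reference, the always_prefix option, and the assumption that E stands for Effective C++ and M for More Effective C++ are mine.

```python
# Assumed mapping from book title to prefix letter (hypothetical).
BOOK_PREFIXES = {
    "Effective C++": "E",
    "More Effective C++": "M",
}


def item_reference(item_number, target_book, current_book, always_prefix=False):
    """Render a cross reference to an Item as seen from current_book.

    References within the current book stay bare ("Item 22"), matching
    the print edition. References to another book get that book's letter
    ("Item M22"). always_prefix=True models the alternative of prefixing
    every reference, at the cost of extra syntactic noise.
    """
    if target_book == current_book and not always_prefix:
        return f"Item {item_number}"
    return f"Item {BOOK_PREFIXES[target_book]}{item_number}"


print(item_reference(22, "Effective C++", "Effective C++"))
print(item_reference(22, "More Effective C++", "Effective C++"))
print(item_reference(22, "Effective C++", "Effective C++", always_prefix=True))
```

Keeping the rule in one place like this means a standalone edition and a linked collection can be generated from the same source, with only the rendering context changing.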

Monday, December 15, 2008

An Introduction to Fastware!

All my blog entries to date have been about issues related to authoring: things that affect my choices among writing tools and my strategies for effectively conveying the information I want to get across to my readers. (As I've noted before, the term "reader" is misleading, because one of the forms in which I'd like Fastware! to be usable is audible. The proper term is probably "content consumer," but I'll stick with "reader," in part because it's a lot less ugly, in part because it better reminds me that I'm primarily writing for humans, not machines.) This blog entry is a bridge between authoring concerns and content issues, because it touches both.

Experienced authors and publishers will tell you that you usually write a book's introduction last, because you can't really know what needs to be introduced until you've written it. When working on my past books, I used the placeholder introductory chapter as a dumping ground for terms that needed to be defined, assumptions that needed to be explained, conventions that needed to be described, etc. When the book proper was done, I'd go back and sift through the debris that had made its way to what was to become the Introduction, take a deep breath, and do my best to make a coherent narrative out of the odds and ends I found there.

For Fastware!, I chose a different approach. My experience has been that the need for a book on how to write software that runs quickly is not self-evident to many people. That bothered me. I view the case as overwhelmingly strong, and I felt compelled to make that case right away. As a result, I wrote Fastware!'s Introduction first, and I've now made a draft available at the book's web site. In its current form, the chapter is more manifesto than Introduction, and I know I'll have to add more material once I've written the rest of the book, but it should give you a good idea of what I envision the book to ultimately be.

There are two parts to that vision, content and presentation, and the Introduction should give you a glimpse of both. (If you've been following this blog, you know that I believe that content and presentation are not really separable. If you haven't been following the blog and are interested in this view, check out this and this.) The content should be self-explanatory. If it's not, I've botched my job, and please let me know about it, either as comments on this blog or as email to smeyers@aristeia.com.

Regarding the draft presentation, here are a few things I think worth pointing out:
  • The book is about speed, and visually, it should come across that way. One way I've tried to convey this is the use of italics in the chapter title, section and sidebar heads, and the footer. Like runners striving to move faster, italic letters lean forward. Another way is the fireball behind the chapter numbers. This is cheesy in its current form, but I have no illusions that I'm an artist; the fireball is a placeholder. My original idea was to have flames shooting out the back of the page numbers, and I'd ultimately like to do something more like that. Another problem with the fireball is that it's too prominent, but that can be toned down in various ways (e.g., increase the transparency of the image). The main thing is to come up with subtle ways to suggest movement -- fast movement -- through the book's layout and formatting.
  • "Voice of Experience" sidebars reinforce material in the chapter they accompany. This is one of the ideas for Fastware! I'm particularly enthusiastic about, and it straddles the line between content and presentation. The book will contain lots of suggestions about how to write fast software, and after a while, I expect readers to roll their eyes and mutter, "Yeah, yeah, yeah..." Some of the suggestions may strike some readers as less important than I know them to be, and I worry that such readers will skip the muttering and simply roll their eyes.

    Some authors, to reinforce the points they make, offer fictional examples demonstrating how things could play out in practice. Other authors give real examples from their own experience. Few authors have the background to personally vouch for the full range of topics I'll cover in Fastware!, and, alas, I'm not one of them. The "Voice of Experience" sidebars are my way of bringing in guest speakers who, in their own words, can back up what Fastware! tells readers. My plan is to have two sidebars per chapter, although I currently have only one in the draft Introduction.

    The sidebars are designed to have a different look to them, and not just to make it clear that they are sidebars. For readers reading straight through, I want them to pop up from time to time as visual and semantic treats. For readers flipping through the book, I want them to stand out as easy-to-find nuggets that stand on their own and provide useful "from the factory floor" information.
  • Color output is the default. Especially as time goes on, I expect more and more readers to experience Fastware! on a color-capable device, so the primary presentation format should take advantage of color. In the draft Introduction, I sometimes use color to bring out semantics (e.g., for clickable URLs and email addresses, although they are not active in the PDF I posted, sorry). In other cases, I use color simply to make the work more visually engaging. Some will pooh-pooh this use of color, but it has as great an impact on a prospective reader's evaluation of a book as do things like font choices, interline leading, footnotes vs. endnotes, etc. Black electronic text on a white electronic page looks as anachronistic to contemporary readers as black and white TV shows do to contemporary TV viewers. In my draft introduction, I use color in a number of ways to enliven the visual effect: for section headers, for bold-faced text, for sidebar backgrounds, in the line above the footer, in the page number fireball, in "The Voice of Experience" photographs. My goal is to produce a book that looks somewhat less like a traditional book and somewhat more like magazines and web pages.
If you have any comments on my vision for Fastware! (content or presentation) or about the draft Introduction, please let me know, either as comments on this blog or via email to smeyers@aristeia.com.