Tuesday, December 2, 2008

A Look at reStructuredText

Several comments on this blog referred me to reStructuredText (reST) as a markup alternative to LaTeX and DocBook. reST is part of the Docutils project, which, probably because I am not a Python programmer, I had not heard of.

I've now spent a little time reading about reST, examining some of its output, looking at some document source files, writing and processing some trivial examples, and posting questions to the Docutils mailing list. On a scale of knowledgeability about reST that runs from 1 to 100, that puts me at about 1.1. Still, I've come to a few conclusions:
  • For common things like numbered or bulleted lists, reST markup is less verbose (hence less intrusive) than LaTeX's.
  • For inline style-based markup, they seem to be about the same. reST's :foo:`Text` is LaTeX's {\foo Text}.
  • The LaTeX user community seems to be larger than reST's, and while there are several books on LaTeX, there don't seem to be any dedicated to reST or Docutils.
  • reST documents are typically viewed as HTML, LaTeX documents as PDF. This is noteworthy, because I currently expect PDF generation to be more important for Fastware! than HTML generation.
  • reST more rigorously separates content from presentation.
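
To make the first two bullets concrete, here is a small sketch of reST list and inline-role markup. The content is invented for illustration; :emphasis: and :strong: are standard Docutils roles. The LaTeX equivalent would wrap each list in an itemize or enumerate environment and prefix every entry with \item.

```rst
.. Illustrative content only; none of this text comes from a real document.

- A bulleted item needs nothing more than a leading hyphen.
- So does the next one.

#. An auto-numbered item starts with ``#.``
#. The processor assigns the actual numbers.

Inline roles wrap text in backquotes: :emphasis:`important` and
:strong:`very important` are two of the standard ones.
```
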
The last point in that list, the separation of content from presentation, may be the most interesting. As part of my kicking of reST's tires, I looked at some documents I'd written using LaTeX, trying to figure out how I'd be able to achieve the same effect in reST. I generally had little trouble, but then I noticed a table I'd included in an IEEE Software paper from long ago:

Each entry here is centered both horizontally and vertically, and it occurred to me that I'd never noticed such layout in tables generated from reST. I started googling around for centering support in reST, and, thanks to help from the Docutils mailing list, I eventually came to understand that there is no notion of "centering" in reST. Whether something is centered isn't content, it's presentation, and presentation decisions are made downstream from reST, e.g., by CSS style sheets for presentation via HTML.

Something else that became apparent is that while the above table could be produced from reST by slapping an appropriate "centering" attribute on the entire table, reST doesn't really have a way to express metadata (e.g., presentation information) on a cell-by-cell basis. So if I wanted some cells' content to be centered and others' to be, say, left-justified, reST isn't up to that. I can't think of a case where I'd want to do that, but I know of lots of cases where I create spreadsheets where some rows or columns use different justification settings. Here's part of the spreadsheet I've been using to compare link-related features of various electronic books:
Note that some columns are horizontally left-justified, while others are horizontally center-justified.
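
For the record, here is a minimal sketch of the whole-table approach, assuming the table directive's :class: option and a class name, centered-cells, that I made up for illustration. The markup only labels the table; for HTML output, a stylesheet rule along the lines of "table.centered-cells td { text-align: center; vertical-align: middle; }" would have to do the actual centering.

```rst
.. Hypothetical example: "centered-cells" is an invented class name.
   The reST source only attaches the label; the stylesheet decides
   what the label means.

.. table:: Link features (illustrative data)
   :class: centered-cells

   ==========  =========
   Feature     Supported
   ==========  =========
   Hyperlinks  yes
   Footnotes   no
   ==========  =========
```

As far as I can tell, there is no corresponding hook for attaching a class to a single cell, which is exactly the limitation described above.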

This takes us back to a topic I covered in one of my earliest posts: the lack of strict separation of content and presentation. The way information is placed in a table can help comprehension or it can hurt it, and as an author, I want to make sure that the presentation helps. Certainly the proper presentation of table information can vary from row to row and column to column within a table or between tables, and the fact that both Excel and LaTeX also offer per-cell formatting support strongly suggests that there are situations where content creators feel that such control is useful.

In response to one of my requests for information on the Docutils mailing list, David Goodger commented:
In terms of expressive power, LaTeX > reST. In terms of readability and convenience, reST > LaTeX. Take your pick. If you're picky about the formatting details, reST may not be for you.
Alas, I am picky about formatting details, in part because I'm a control freak, but in part because I believe that formatting is related to comprehensibility, and, fundamentally, comprehensibility is pretty much the only thing that matters. (Okay, accuracy is kind of important, too.) An author's job is to convey his or her message as effectively as possible. That requires an expressive medium in which to represent that message. My concern is not so much that reST is less expressive than LaTeX, it's that it's less expressive than what I think I might reasonably want. I don't need the most expressive book-writing system available, I just need one that's adequately expressive. If reST doesn't offer a way for me to produce tables in a form I already know I employ, that's a problem.

reST looks to be a nice markup language, easy to learn and use for many purposes, especially the production of web pages. I was impressed with the low barrier to entry: I downloaded and installed Python and docutils and was producing HTML from reST in under an hour. It's not hard to find impressive-looking, decidedly nontrivial web pages generated from reST (e.g., the Python multiprocessing documentation pointed out by David Niergarth or the pages at Siafoo). Still, I can't shake the doubt that if I go with reST, I'll eventually bump into something I want to be able to express, but can't. I'm therefore still leaning towards LaTeX.

Besides, I still have that friend who's offered to be my personal LaTeX consultant :-)

6 comments:

Krishna said...

Hi,

LaTeX and ConTeXt are probably the best for PDF generation but there are some interesting alternatives.

* reST can be converted to LaTeX using Sphinx.

* generate HTML from reST and convert HTML to PDF using Prince. This article talks about this and even provides a prefab stylesheet to aid in the conversion: http://www.alistapart.com/articles/boom


Cheers,
-Krishna

Anonymous said...

There is a 'writer' included with docutils (rst2latex.py) that will generate LaTeX from reST. So you might be able to get the best of both worlds by writing in simple reST and then tweaking the generated LaTeX.

It is also trivial to create reST directives that output specialized HTML or LaTeX. We use quite a few of them on Siafoo (not SaiFoo!) for doing LaTeX math, code, graphs, etc. (http://www.siafoo.net/tools/reST)

Scott Meyers said...

Regarding Stou's comments, the problem with tweaking the generated LaTeX (or the generated anything) is that I want to make all my edits in the master source so that I can generate all downstream formats automatically. If Fastware! is successful, it will go through many printings, and I will want to be able to tweak the book content for new printings without having to edit the resulting generated code.

I've fixed the Saifoo misspelling -- sorry!

Anonymous said...

Based on your latest post, it sounds like you've already made up your mind. I'd just like to add more to the idea of using reST as the primary source, HTML as an intermediary, and an HTML-to-PDF tool for the final step. Prince is a commercial option, and produces beautiful output. There is another tool, CSStoXSLFO, that provides similar features. Again, since you don't have as much experience with CSS and XSL, this probably still wouldn't be the right toolchain for you, but may be worth exploring for others.

Anonymous said...

You can use the `raw` directive to pass LaTeX source straight through to the LaTeX writer. Something like this:

.. raw:: latex

   \setlength{\parindent}{0pt}
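
The same mechanism could, in principle, carry a table whose alignment is chosen per column or even per cell, while keeping everything in the master source. A minimal sketch, assuming LaTeX is the only output format that matters, since writers for other formats skip `raw:: latex` content (the table data is made up):

```rst
.. Illustrative data; only the LaTeX writer emits this block.

.. raw:: latex

   \begin{tabular}{lc}
   Feature & Supported \\
   \hline
   Hyperlinks & yes \\
   \multicolumn{1}{r}{Footnotes} & no \\
   \end{tabular}
```

Here the column specification `lc` left-justifies the first column and centers the second, and `\multicolumn` overrides the alignment for a single cell.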

Chris said...

Sorry for the double post... I used the OpenID login, which seems broken; the above comment was made by me.