Wikibook XML: How to produce Wikibook XML

How does Wikipublisher teach PmWiki to generate output in Wikibook XML instead of HTML?

In Tip 00035, we saw how Wikipublisher turns Web pages into PDF suitable for printing. This tip describes the process of generating Wikibook XML from a Web site, so that a Wikibook server has something it can compose.

One of the reasons Wikipublisher uses PmWiki is that it is markup agnostic, meaning an administrator can instruct the wiki engine how to interpret any particular markup sequence. In the normal course of events, the wiki converts input markup into HTML and embeds this in the specified site skin. The Cookbook:PublishPDF plug-in teaches PmWiki to:

  1. use a skin appropriate to a particular print document type
  2. convert wiki markup into Wikibook XML — a print-oriented DTD
  3. detect and remove any residual HTML tags in the output
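As a rough illustration of steps 2 and 3, the sketch below models a wiki engine whose markup rules live in a swappable table. It is written in Python rather than PmWiki's PHP, and every name in it (`HTML_RULES`, `XML_RULES`, `render`, and the `bold` and `italic` elements) is an assumption made for the example, not part of PmWiki, the PublishPDF recipe, or the real Wikibook DTD. Skin selection (step 1) is left out.

```python
import re

# Hypothetical rule sets: each pairs a wiki markup pattern with an output template.
# The XML element names are illustrative only, not the real Wikibook DTD.
HTML_RULES = [
    (re.compile(r"'''(.+?)'''"), r"<strong>\1</strong>"),
    (re.compile(r"''(.+?)''"), r"<em>\1</em>"),
]
XML_RULES = [
    (re.compile(r"'''(.+?)'''"), r"<bold>\1</bold>"),
    (re.compile(r"''(.+?)''"), r"<italic>\1</italic>"),
]

# Elements the print rule set may emit; any other tag counts as residual HTML.
XML_TAGS = {"bold", "italic"}
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def convert(text, rules):
    """Apply each markup rule in order."""
    for pattern, template in rules:
        text = pattern.sub(template, text)
    return text


def strip_residual_html(text):
    """Drop tags that are not in the print vocabulary, keeping their content."""
    return TAG.sub(lambda m: m.group(0) if m.group(1) in XML_TAGS else "", text)


def render(text, target="html"):
    """Pick the whole rule set by output target, then clean up XML output."""
    rules = XML_RULES if target == "xml" else HTML_RULES
    out = convert(text, rules)
    return strip_residual_html(out) if target == "xml" else out
```

In this model, `render("'''bold'''", "xml")` uses only the print rules, and a hard-wired wrapper such as `<span class="x">` is removed from the XML output while the text inside it is kept.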

When the reader selects PDF output, Wikipublisher completely replaces the markup rule set. In practice, PmWiki is nearly, but not quite, markup agnostic. Experience has shown that it is a little bit pregnant with HTML DNA:

  • in a few places HTML tags are hard-wired into the source code; these are trimmed from the Wikibook output
  • sometimes HTML is embedded in the input markup, such as attribute names and values; these cases are generally ignored
  • PmWiki’s markup processing occasionally conflicts with Wikipublisher; specifically, Keep(MarkupToHTML( … )) results in text bereft of markup

The last item is especially painful, since it means page lists do not work correctly unless an administrator patches the PmWiki source.

Third-party recipes have to be tested on a case-by-case basis. Many recipe writers assume HTML output and hard-wire HTML tags into their code. Such recipes only work correctly with Wikipublisher if the recipe author or site administrator provides a Wikibook XML version of the markup rule. The Cookbook includes several examples.
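The difference between a recipe that assumes HTML and one that also supplies a print rule can be sketched with a small registry keyed by output format. This is a conceptual model in Python, not PmWiki's actual `Markup()` interface; the `register` helper, the `(:hl …:)` and `(:key …:)` directives, and the `keystroke` element are all invented for illustration.

```python
import re

# Hypothetical registry keyed by output format; not PmWiki's real API.
RULES = {"html": {}, "xml": {}}


def register(name, pattern, html, xml=None):
    """Register a rule. A recipe that omits `xml` only supports HTML output."""
    RULES["html"][name] = (re.compile(pattern), html)
    if xml is not None:
        RULES["xml"][name] = (re.compile(pattern), xml)


def render(text, target):
    """Apply every rule registered for the chosen output format."""
    for pattern, template in RULES[target].values():
        text = pattern.sub(template, text)
    return text


# A recipe that assumes HTML: it registers no rule for XML output.
register("highlight", r"\(:hl (.+?):\)", r'<span class="hl">\1</span>')

# A recipe that also supplies a print-oriented rule (element name is made up).
register("keys", r"\(:key (.+?):\)", r"<kbd>\1</kbd>", xml=r"<keystroke>\1</keystroke>")
```

In this sketch the `keys` recipe converts cleanly for both targets, while the `highlight` markup passes through XML rendering unconverted, which is the kind of gap that case-by-case testing is meant to catch.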

Why use a different output markup? Why not transform HTML into LaTeX? It can be done; the first version of Wikipublisher did exactly that. But using Wikibook XML gives much better results. This is because HTML’s purpose is to represent Web pages, not print pages, whereas Wikibook XML is designed from the ground up to represent print. Form ever follows function.1

Wikipublisher includes an option to generate Wikibook XML output only, via the View XML button. This can be saved and opened in any XML-aware editor. For example, someone requiring a unique print layout can use a tool such as Adobe InDesign to read the XML and apply tag-specific presentation styles.
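Before applying tag-specific styles in a layout tool, it can help to list which elements a saved file actually contains. The sketch below does this with Python's standard `xml.etree.ElementTree` module. The sample document, its element names, and the `STYLES` mapping are assumptions made for the example; real View XML output follows the Wikibook DTD and will use different names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Stand-in for a file saved from View XML; element names are invented.
SAMPLE = (
    "<book><title>Tips</title>"
    "<para>Form follows <italic>function</italic>.</para>"
    "<para>End.</para></book>"
)

# Hypothetical mapping from element name to a presentation style.
STYLES = {"title": "Heading 1", "para": "Body", "italic": "Emphasis"}


def tag_inventory(xml_text):
    """Count how often each element name occurs in the document."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())


def unstyled_tags(xml_text, styles):
    """List element names that have no style assigned yet."""
    return sorted(tag for tag in tag_inventory(xml_text) if tag not in styles)
```

Calling `unstyled_tags(SAMPLE, STYLES)` reports any element that still needs a style before the document is laid out.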


1 (↑)

Category: system


Page last modified on 12 June 2008 at 09:43 AM