Web and print exist as two solitudes: printed web pages often disappoint and converting print documents into good web pages is hard. A wiki makes it easy for authors to create rich web content, but is little help if readers wish to print the results. Wikipublisher lets readers turn wiki pages or page collections into print, with a quality better than most word processing documents. This lowers the time and cost of creating online and print versions of the same content, with no loss of quality in either medium.

Introduction

Using a wiki makes it easy for authors to work together wherever they may be, as long as they have access to a web browser. This is fine if the result you want is web pages, but what if readers wish to print these? Most people reading more than a page of text will print it; a study of scholarly reading behaviour reports that 80% of researchers read scholarly articles on paper and only 20% read them online [4].

Reading a printed web page is usually a disappointing experience — we have been conditioned to have low expectations of printing from the Web. Even if the printed result is “good enough” we can still only print one web page at a time, unless the site has deliberately created multi-page articles with a combined “printable view” of the content.

Wikipublisher changes this, by giving wiki administrators a way to turn individual pages or page collections into a document suitable for printing. Wikipublisher passes transformed wiki content to the LaTeX document preparation system [2], to produce printed output of the highest quality, superior to anything that can be achieved using CSS or with most word processors.

The Wikipublisher Project

The project’s home page is http://www.wikipublisher.org/. The project was conceived in 2004; we released the first beta version of the Wikipublisher system in late 2005 and have released updates every 2–3 months since. All the software is free and open source.

Goals

We adopted a number of design principles for the project [5]:

Online First.

The Web enhances our ability to communicate, so most of our work ought to appear online first. Creating content online first makes it instantly and widely accessible, and encourages linking to other online resources. Yet most of our authoring tools are “print first” and turning print documents into HTML for the Web is hard to do well.

Print Still Matters.

A web page worth reading is worth printing. The longer and richer the content, the more likely the reader is to print it. We may skim a 50-page report online, but to study it, we print it. Few web site designs appear to care what the printed form of the site looks like. Experience has taught us to have low expectations of printed web pages.

One Authoritative Source.

The most up-to-date content version appears on a web page; the typeset PDF is a point-in-time snapshot. In contrast, most publishing systems require three or more versions: a word processing or other source; a PDF snapshot of that source; and a collection of static web pages generated from the source.

Architecture

The architecture figure shows the structure of Wikipublisher. The core architectural decision was to treat generating web pages and generating print pages as separate services. This means one print server can potentially support many web page servers. Printing is in most cases a low volume activity compared to browsing, so it is inappropriate to burden the web page server with print duties. We define a print API that lets a web server expose its content in a way that the print server can process. As a result, the print server can work with any web content management system able to support the print API. This design also promotes a more rigorous separation of the underlying content from its presentation in different media, making a wiki an ideal lightweight content server.

Figure: Wikipublisher Architecture

Authors interact with the wiki server using a web browser (1 and 2). To create a print document, a reader submits a form (3) to the print server which says, “If you issue this HTTP request (4), you will receive a stream of Wikibook XML (5); convert it to LaTeX and PDF, then give me back the result (8).” The wiki administrator has configured the wiki server to respond accordingly (4 and 5): “If you receive an HTTP request (4) in this format, convert the wiki markup to XML (5) instead of HTML.” The wiki server thus needs to give the reader a form (2) that will (3) “Tell the print server (at this address) to issue this HTTP request.” Finally, the print server retrieves any supplementary materials, such as image files, referenced in the XML (6 and 7), and returns a print document (8).
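
As a rough illustration, the numbered request flow can be simulated in a few lines of Python. This is only a sketch: the server addresses, the action value "wikibook" and the query parameter "source" are hypothetical names chosen for the example, not the actual Wikipublisher print API, and the typesetting step is stubbed out.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical names throughout: the action value "wikibook" and the
# "source" parameter are illustrations, not the real print API.

def wiki_server(url):
    """Steps 4-5: answer a print request with XML, anything else with HTML."""
    query = parse_qs(urlparse(url).query)
    page = query.get("n", ["Main.HomePage"])[0]
    if query.get("action") == ["wikibook"]:
        return f"<article><title>{page}</title></article>"
    return f"<html><body><h1>{page}</h1></body></html>"

def print_form_target(print_url, wiki_url, page):
    """Steps 2-3: the form tells the print server which request to issue."""
    source = f"{wiki_url}?{urlencode({'n': page, 'action': 'wikibook'})}"
    return f"{print_url}?{urlencode({'source': source})}"

def print_server(request_url, fetch=wiki_server):
    """Steps 4-8: fetch the XML named in the request, then typeset it.

    Typesetting is a stub here; a real server would transform the XML
    to LaTeX, run LaTeX, and return the resulting PDF.
    """
    source = parse_qs(urlparse(request_url).query)["source"][0]
    xml = fetch(source)
    return f"PDF generated from: {xml}"
```

The point of the design is visible in print_server: it knows nothing about wiki markup, only how to fetch XML from whatever address the form supplies.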

Implementation

We chose LaTeX as the typesetting system after eliminating the other candidate, XSL-FO. Formatting Objects is a markup language for XML document formatting which is most often used to generate PDFs. The FO vocabulary is part of XSL, a set of W3C technologies designed for the transformation and formatting of XML data. At the time we looked (December 2004), we could not find any books on FO that had been published using FO. On the other hand, all the books on LaTeX were published using LaTeX. It seemed to us that choosing FO would introduce unnecessary risk and that the risk would make its presence felt as unexpected costs or, worse, insurmountable implementation problems. By contrast, LaTeX appeared to do everything we could think of.

The implementation figure shows the pipeline tool suite approach adopted for Wikipublisher. Wiki markup is translated into an intermediate print-oriented XML form, and then transformed into LaTeX. The reasons were largely pragmatic: we built on top of things that already worked. The tbook system [1] is a free software project for converting XML documents into LaTeX using XSLT, so if we could convert wiki markup into XML, we could use tbook to typeset it. The PmWiki project [3] is an (almost) markup-agnostic wiki engine, which lets a site administrator redefine or augment the markup translation rules.

Figure: Wikipublisher Implementation Pipeline Tool Suite

We wrote a plug-in for PmWiki [6] (written in PHP) that replaces all the wiki to HTML translation rules with wiki to XML rules. We found that the wiki markup had rules for which there were no equivalents in the tbook DTD and hence no XML to LaTeX translations. We therefore added a range of extensions to the tbook DTD, style files and XSLT, and called the resulting XML to LaTeX conversion service Wikibook and the extended DTD the Wikibook DTD. The plug-in also provides a “print metadata manager” which lets authors and readers customise the way the print output is presented, by passing configuration parameters to the Wikibook PDF server.
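
The idea of swapping one set of translation rules for another can be sketched as follows. This is an illustration in Python, not the plug-in itself (which is written in PHP). The two markup patterns follow PmWiki's inline emphasis syntax, while the XML element names are placeholders and are not taken from the Wikibook DTD.

```python
import re

# PmWiki-style inline markup: '''strong''' and ''emphasis''.
# The XML element names below are placeholders, not the Wikibook DTD.
HTML_RULES = [
    (re.compile(r"'''(.+?)'''"), r"<strong>\1</strong>"),
    (re.compile(r"''(.+?)''"), r"<em>\1</em>"),
]
XML_RULES = [
    (re.compile(r"'''(.+?)'''"), r'<emphasis role="strong">\1</emphasis>'),
    (re.compile(r"''(.+?)''"), r"<emphasis>\1</emphasis>"),
]

def translate(text, rules):
    """Apply each (pattern, replacement) rule in order to the wiki text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text
```

The same translate function serves both outputs; only the rule table changes, which is what makes a wiki engine with redefinable rules a convenient starting point.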

Transformation

We made XML generation and Wikibook transformation as robust as possible. Consistent presentation of printed outputs is completely automatic, not just within a document type (all reports have the same look) but across document types, which are all recognisably part of the same family. Businesses which typically produce a large number of documents of a small number of document types can get a consistent look (a house style) at minimal cost and in particular with less quality control effort. There is a huge quality advantage when we shift typesetting from the desk-top to the server, because we eliminate local stylistic variations. Of course, limiting local customisation can also be a disadvantage in many situations.

Bring Print Culture to Web Content

When converting web pages to print, people expect the typesetting engine to apply standard layout conventions for printed material. For a given input, it should optimise the quality of the printed output and apply the rules of typesetting consistently to every page. Authors focus on content, rather than presentation. This means authors do not need to be typesetting experts to produce professional-looking print documents from their web page collections.

Examples of print layout conventions include the treatment of quote marks, hyphens and separators:

  • smart quotes — treat straight quotes as markup and “smarten” them into the equivalent HTML entities, including exceptions such as ’phone and prime marks such as 6′ 30″
  • smart hyphens — recognise em dash and en dash markup — such as 1–10 and Wellington–Picton — and translate these into the equivalent entities
  • smart separator — a horizontal rule offers an opportunity for a small typographic flourish, as the (configurable) separator below shows
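
The quote and hyphen conventions listed above can be approximated with a handful of substitutions. The sketch below rests on several assumptions: it emits Unicode characters rather than HTML entities, it takes "--" and "---" as the en dash and em dash markup, and its exception list is a short placeholder. It also treats any straight quote that directly follows a digit as a prime mark, which a production rule set would need to refine.

```python
import re

# Words that begin with an apostrophe rather than an opening quote.
# This short list is a placeholder, not Wikipublisher's own exception list.
APOSTROPHE_WORDS = ("phone", "tis", "twas")

def smarten(text):
    # Assumed dash markup: "---" for an em dash, "--" for an en dash.
    text = text.replace("---", "\u2014").replace("--", "\u2013")
    # A straight quote directly after a digit is read as a prime mark (6' 30").
    text = re.sub(r"(?<=\d)'", "\u2032", text)
    text = re.sub(r'(?<=\d)"', "\u2033", text)
    # Exceptions such as 'phone take an apostrophe, not an opening quote.
    words = "|".join(APOSTROPHE_WORDS)
    text = re.sub(rf"(?<![^\s(\[])'(?=(?:{words})\b)", "\u2019", text)
    # A quote at the start of the text, or after a space or bracket, opens.
    text = re.sub(r'(?<![^\s(\[])"', "\u201c", text)
    text = re.sub(r"(?<![^\s(\[])'", "\u2018", text)
    # Every remaining straight quote closes (or is an apostrophe).
    return text.replace('"', "\u201d").replace("'", "\u2019")
```

The order of the substitutions matters: primes and exceptions are resolved first so that the general opening and closing rules only see the quotes that are left.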

Links to external Web pages often have long URLs, typically resulting in spurious white space in justified text. Wikipublisher automatically makes all punctuation characters discretionary (i.e., we treat punctuation in URLs as wiki markup), thereby allowing a long URL to split over 2 or more lines if necessary. To avoid possible ambiguity, we put the punctuation character after the line break. We print the link text in the body and the URL in a footnote or citation.
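
A minimal sketch of the URL-breaking rule, under the assumption that a break is allowed before every punctuation character, so that the punctuation always starts the next line. The function names are invented for this example, and the greedy line filling merely stands in for the line breaking that the typesetting engine would actually perform.

```python
import re

def url_segments(url):
    """Split a URL into pieces so that every piece after the first
    starts with a punctuation character (underscore counts as a letter)."""
    return re.findall(r"[^\w\s]\w*|\w+", url)

def wrap_url(url, width):
    """Greedily fill lines of at most `width` characters, breaking only
    between segments. A single long segment may still exceed the width."""
    lines, line = [], ""
    for segment in url_segments(url):
        if line and len(line) + len(segment) > width:
            lines.append(line)
            line = segment
        else:
            line += segment
    if line:
        lines.append(line)
    return lines
```

For example, wrapping http://www.wikipublisher.org/ to a width of 12 yields the lines "http://www", ".wikipublisher" and ".org/", each continuation line beginning with the punctuation character.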

Writers also no longer need to know that the convention for use of footnotes is that the footnote reference character appears after a punctuation mark such as a comma or a full stop. Wikipublisher just makes sure it happens. If a reader chooses to publish a page using vertical spaces between paragraphs (rather than indents), Wikipublisher will automatically change the formatting of footnote text to hang the number into the left margin and suppress the first line indent.
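
The footnote placement rule amounts to swapping a reference mark with the punctuation that follows it. The sketch below assumes a hypothetical marker syntax, [^1], for footnote references in the intermediate text; the real system works on its own XML representation.

```python
import re

# Hypothetical marker syntax: [^1], [^2], ... stands for a footnote reference.
FOOTNOTE_BEFORE_PUNCT = re.compile(r"(\[\^\d+\])([,.;:!?])")

def place_footnote_marks(text):
    """Move each footnote reference to just after the adjacent punctuation."""
    return FOOTNOTE_BEFORE_PUNCT.sub(r"\2\1", text)
```

A mark that is not followed by punctuation is left where the author put it.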

Style information is only partially supported. There are major differences between the CSS-based model used in HTML and the structure-based approach used in LATEX. The approach taken is to translate those style options with recognisable equivalents (inline styles such as text colour blue), and ignore the rest (block styles such as put a red dotted border around this paragraph with 5px padding).
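
One way to picture this partial support is a translator that maps the few inline declarations it recognises and silently drops everything else. The function below is a sketch under that assumption: the set of supported properties is illustrative, only named colours are handled, and \textcolor presumes the LaTeX color or xcolor package is loaded.

```python
def style_to_latex(text, style):
    """Wrap `text` in LaTeX commands for the inline CSS declarations we
    recognise; unrecognised declarations (borders, padding, ...) are ignored."""
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue
        prop, value = (part.strip().lower() for part in declaration.split(":", 1))
        if prop == "color":
            # Named colours only; hex values would need extra handling.
            text = f"\\textcolor{{{value}}}{{{text}}}"
        elif (prop, value) == ("font-weight", "bold"):
            text = f"\\textbf{{{text}}}"
        elif (prop, value) == ("font-style", "italic"):
            text = f"\\textit{{{text}}}"
    return text
```

With this sketch, a blue text colour survives the translation, while a red dotted border with 5px padding leaves the text unchanged.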

Public Wikibook PDF Service

We run a free public Wikibook PDF server for those wishing to try out the software. In the past 5 years we have had 340 wiki sites registered to use the Wikipublisher system via the public server. This has been a fruitful source of feedback for the system’s evolution, in response to others’ experiences.

The web site has an issues register for people to log bugs or change requests, a tip of the week where we publish short “how to” stories, software release notes, and a cookbook for user-contributed local customisations (plug-ins) to extend Wikipublisher’s capabilities. There is also a discussion group, which is often the first point of contact for people encountering problems or seeking new features.

Future Work

The Wikipublisher system is functionally complete for our own purposes, but offers several avenues for further development. The project sees four areas where future work is desirable, if there is sufficient user demand.

Wikibook on Microsoft Windows

To date, people are running the Wikibook PDF server on various flavours of GNU/Linux plus Mac OS X. The project regularly receives e-mails from people who want to know whether it will run on Windows. We reply that as far as we know, all the software they will need is available for Windows, but we do not know of any Windows installations. We offer to help them work through any issues they may encounter and to document the Windows installation process. We have yet to hear back from any of these enquirers.

Extension to other content engines

At present, Wikipublisher is limited to sites that run PmWiki. Since it was possible to write a plug-in for PmWiki to output Wikibook XML, it should also be feasible to add the same API capability to other wikis, blogging engines and web content management systems which support third-party plug-ins. Again, the project has received several enquiries, in particular about support for MediaWiki, but to date none has turned into a real project. Given the scale of the undertaking and in the absence of a customer willing to provide time or money, we have been reluctant to embark on this.

User-specified LaTeX classes

In an ideal world, an author could instruct the Wikibook PDF server to typeset their content using any valid LaTeX class file (as long as it is reachable with an HTTP request). The current Wikibook DTD defines four distinct document types: letter, article, report and book. The wiki plug-in makes sure the wiki produces Wikibook XML that complies with the requested DTD. To support user-defined classes, Wikipublisher would have to make sure that the document type used is compatible with the specified class.

It would have been really useful to load the correct ACM template for this paper! As it was, the authors exported the raw LaTeX as an article and manually converted this to use a different class.

Use of Wikipublisher

To inform further development of the system, we would like to conduct an empirical study of how people are using Wikipublisher. We would like to explore the following research question with the current user base: “What has been your experience using Wikipublisher?” We envisage setting up an online survey form (on Wikipublisher) and gathering qualitative data from a self-selecting sample of users. The survey would explore the kind of content, motivations for adopting Wikipublisher, benefits they have gained, issues they have encountered, and their plans for the future.

Conclusions

The better Wikipublisher does its job, the less people notice it; good typography is invisible, letting the reader focus on reading. In producing print documents, most people are accustomed to making a trade-off between the convenience of a word processor and the quality of a desk-top publishing system. Most choose convenience, with the unfortunate result that typographic mediocrity has become entrenched in our culture. A big reason for the popularity of wikis is their convenience. Wikipublisher lets us combine the convenience of a wiki with the typesetting quality of the finest desk-top publishing software. Because the system embeds good typesetting practices in the software, the quality comes free.

References

[1]   T. Bronger, 2003. The tbook system for XML Authoring, tbookdtd.sourceforge.net, accessed on 27 March 2009.

[2]   L. Lamport, 1994. LaTeX: A Document Preparation System (Second Edition). Boston: Addison-Wesley.

[3]   P. Michaud, 2002. PmWiki: a wiki-based system for collaborative creation and maintenance of websites, www.pmwiki.org, accessed on 27 March 2009.

[4]   D. Nicholas, P. Huntington, H.R. Jamali, I. Rowlands, T. Dobrowolski, and C. Tenopir, 2008. Viewing and reading behaviour in a virtual environment: The full-text download and what can be read into it. Aslib Proc. 60(3):185–198.

[5]   J. Rankin, 2008. Wikipublisher: A Web-based System to Make Online and Print Versions of the Same Content, TUGboat 29(2):264–269.

[6]   J. Rankin, 2009. PmWiki Plug-in: PublishPDF, www.pmwiki.org/wiki/Cookbook/PublishPDF, accessed on 27 March 2009.

Creative Commons License
Page last modified on 14 June 2009 at 08:43 PM