
Web and print exist as two solitudes: printed web pages often disappoint, while converting print documents into good web pages is hard work. A wiki makes it easy for authors to create rich web content, but is little help if readers wish to print the results. Wikipublisher lets readers turn wiki pages or page collections into print with a couple of clicks, with a quality better than most word processing documents. This dramatically lowers the time and cost of creating online and print versions of the same content, with no loss of quality in either medium. Readers can set their own printing preferences and turn wiki content into letters, articles, reports, or books.


Introduction

Fast, cheap, or good. Pick any two. — Popular saying [17, p119]

Using a wiki is a great convenience for collaborative authoring, making it easy for authors to work together wherever they may be, as long as they have access to a web browser. This is fine if the result you want is web pages, but what if readers wish to print these? Most people reading more than a page of text will print it; a study of scholarly reading behaviour reports that 80% of researchers read scholarly articles on paper and only 20% read them online [11]. The number of sites describing techniques for improving the printability of web pages is evidence of the demand for this capability [18].

Reading a printed web page is usually a disappointing experience; we have been conditioned to have low expectations of printing from the Web. Even if the printed result is “good enough”, we can still only print one web page at a time, unless the site has deliberately created multi-page articles with a combined “printable view” of the content. Currently, the experience is fast and cheap, but of poor quality.

Wikipublisher changes this, by giving wiki administrators a way to turn individual pages or page collections into a document suitable for printing. It is designed for people who write long, complex, richly linked documents, who wish to publish these in an accessible form on the Web and to be able to print them on demand. The reader presses a “typeset” button on the wiki page and the system returns a PDF document. Wikipublisher passes transformed wiki content to the LATEX document preparation system [8], to produce printed output of the highest quality — superior to anything that can be achieved using CSS or indeed with most word processors. Since the wiki source file is used for both Web and print, teams maintain one authoritative version of the source. This raises the question of whether a word processor is still necessary if you have a wiki.

The figure Wikipublisher as wiki shows a screenshot of this paper as a wiki article saved in the PmWiki file format on the Wikipublisher project web site. The figure Wikipublisher as pdf shows the result of generating a PDF document from that wiki page. A user clicks the PDF icon or the typeset button, and the next wiki page displays options for receiving the document, either as a download or by email. There are also options to customise the PDF, such as changing the document (e.g. title, page size), text (e.g. table of contents, fonts, text size), and float (e.g. positioning, size) settings.

Figure (Wikipublisher as wiki): A screenshot of this paper saved in the Wikipublisher project wiki
Figure (Wikipublisher as pdf): PDF output produced from typesetting the wiki page in Wikipublisher as wiki

This paper describes our experience of creating, using and supporting a system for transforming wiki pages into print. The authors used Wikipublisher to write this paper collaboratively. The Project Goals section describes the project goals. The Wikipublisher Project section describes the Wikipublisher system, including its architecture and implementation. The Experience section outlines our experience of developing and using Wikipublisher. The Related Work section presents related work and the Future Work section identifies possible future work. Finally, the Conclusions section draws some conclusions.

Project Goals

Affinity Limited is an information management consultancy based in Wellington, New Zealand. Historically, Affinity used Adobe FrameMaker, originally a multi-platform XML authoring and print publishing tool. However, under Adobe’s stewardship, its platform support has reduced to Microsoft Windows only. Adobe’s decision not to produce a version of FrameMaker for Mac OS X led Affinity to explore alternatives and gave impetus to Wikipublisher’s development. Affinity had started using wiki software to improve communication and collaboration with its (mainly government) clients in 2002. The most common feedback was, “This is great, but I need to print the page and give it to my manager, and it looks terrible. Can you convert it to Microsoft Word, please?” Automated conversion to a printable form seemed like a better idea and Wikipublisher is the result. Another contributing factor behind the project was that about the same time the New Zealand Government set a policy requiring all material on departmental web sites (including long doc and PDF files) be accessible in HTML format by the end of 2006.

We adopted a number of design principles for the project [14]:

Online First.

The Web enhances our ability to communicate, so most of our work ought to appear online first. Creating content online first makes it instantly and widely accessible, and encourages linking to other online resources. Yet most of our authoring tools are “print first” and turning print documents into HTML for publishing on the Web is hard to do well. Too many long documents are simply posted to the Web as PDF files. The links in an online document make it easy to navigate, yet print first authoring tools do little to encourage rich inter-document linking. So the goal is for a system with support for direct creation and editing of web pages.

Print Still Matters.

If a web page is worth reading, it is worth printing. The longer and richer the content, the more likely the reader is to print it. We may skim read a 50 page report online, but if we want to study it, we print it. If we want to deliver a printed and bound version, it needs to look good and be laid out for optimal readability. Yet few web site designs appear to care what the printed form of the site looks like. Most appear to make the assumption that all the information on the site can be chunked into short, easily digested pieces. As readers, experience has taught us to have low expectations of printed web pages. So the goal is a system that produces high quality print documents from web pages.

One Authoritative Source.

The most up-to-date version of the content is what appears on a web page; the typeset PDF is a snapshot taken at a point in time. This means the printed page can never be newer than the content on a web page. This is in direct contrast to most publishing systems, where there are often three (and sometimes more) versions: the word processing or other source; a PDF snapshot of that source; and a collection of web pages generated (perhaps with edits) from the source. The more frequently the content changes and the more authors involved in creating it, the more important it is to have a goal of one source.

The Wikipublisher Project

The project’s home page is http://www.wikipublisher.org/. The project was conceived in 2004 and we released the first version of the Wikipublisher system in 2005. All the software is free and open source. We now describe the architecture, implementation, and transformation process of the Wikipublisher system.

Architecture

The figure Architecture shows the architecture of Wikipublisher. The core architectural decision was to treat generating web pages and generating print pages as separate services. This means one print server can potentially support many web page servers. Printing is in most cases a low volume activity compared to browsing, so it is inappropriate to burden the web page server with print duties. We define a print API that lets a web server expose its content in a way that the print server can process. As a result, the print server can work with any web content management system able to support the print API. This design also promotes a more rigorous separation of the underlying content from its presentation in different media, making a wiki an ideal lightweight content server.

Figure (Architecture): Wikipublisher Architecture

Authors interact with the wiki server as usual with a web browser (1 and 2). To create a print document, a reader submits a form (3) to the print server which says, “If you issue this http request (4), you will receive a stream of Wikibook XML (5); convert it to LATEX and PDF, then give me back the result (8).” The wiki administrator has configured the wiki server to follow the rule, “If you receive an http request in this format (4), forget all you know about wiki to HTML and do wiki to XML instead (5).” The wiki server thus needs to give the reader a form (2) with which to (3) “tell the print server (at this address) to issue this http request.” Finally, the print server also needs to retrieve supplementary materials, such as image files, referenced in the XML (6 and 7), and return a print document (8).

The benefit of this architecture is that the wiki server doesn’t need to know how to print (but does need to know where to get a printing service) and the print server doesn’t need to know wiki markup.
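The exchange in steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's print API: the action name `wikibook-xml`, the `source` parameter, the preference names and the example host names are all assumptions introduced for the sketch.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical action name telling the wiki server to emit XML instead of HTML.
XML_ACTION = "wikibook-xml"


def typeset_form_target(print_server: str, wiki_page: str, **prefs: str) -> str:
    """Step 3: build the URL that the reader's form submits to the print server.

    The wiki page request is embedded as a parameter (hypothetically named
    'source'), alongside any print preferences the reader has chosen.
    """
    source = f"{wiki_page}?{urlencode({'action': XML_ACTION})}"
    return f"{print_server}?{urlencode({'source': source, **prefs})}"


def source_request(form_target: str) -> str:
    """Step 4: recover the request the print server should issue to the wiki."""
    return parse_qs(urlsplit(form_target).query)["source"][0]


target = typeset_form_target(
    "https://print.example.org/typeset",
    "https://wiki.example.org/Main/Paper",
    papersize="a4",
)
print(source_request(target))
```

The point of the sketch is that the print server never parses wiki markup and the wiki server never runs a typesetter: all the two sides share is a URL.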

Implementation

We chose LATEX as the typesetting system after eliminating the other candidate: XSL-FO. Formatting Objects is a markup language for XML document formatting which is most often used to generate PDFs. The FO vocabulary is part of XSL — a set of W3C technologies designed for the transformation and formatting of XML data. At the time we looked (December 2004), we could not find any books on FO that had been published using FO. On the other hand, all the books on LATEX were published using LATEX. It seemed to us that choosing FO would introduce unnecessary risk and that the risk would make its presence felt as unexpected costs or worse, insurmountable implementation problems. On the other hand, LATEX appeared to do everything we could think of. Resources such as The LATEX Companion [10] were evidence of the world of possibilities open to us.

In determining how and when to convert wiki markup to LATEX, the following questions needed to be considered:

  1. Do we work from wiki markup or from the HTML generated from wiki markup?
  2. Do we translate markup directly into LATEX or into an intermediate form first?

The figure Implementation shows the pipeline tool suite approach adopted for Wikipublisher. Wiki markup is translated into an intermediate print-oriented XML form, and then transformed into LATEX. The reasons were largely pragmatic: we built on top of things that already worked. The tbook system [2] is a free software project for converting XML documents into LATEX using XSLT, so if we could convert wiki markup into XML, we could use tbook to typeset it. The PmWiki project [9] is a markup agnostic wiki engine (almost), which lets a site administrator redefine or augment the markup translation rules.

Figure (Implementation): Wikipublisher Implementation Pipeline Tool Suite

We wrote a plug-in for PmWiki [15] (written in PHP) that replaces all the wiki to HTML translation rules with wiki to XML rules. We quickly found that the wiki markup had rules for which there were no equivalents in the tbook DTD and hence no XML to LATEX translations. We added a range of extensions to the tbook DTD, style files and XSLT, and called the resulting DTD the Wikibook DTD and the resulting XML to LATEX conversion service Wikibook. The plug-in also provides a “print metadata manager” which lets authors and readers customise the way the print output is presented, by passing configuration parameters to the Wikibook PDF server. Further information about the details of the XML to LATEX rules can be found elsewhere [14].

The XML generated from wiki markup needs to be a valid Wikibook document. This means the design needs to provide a way of removing any HTML litter, for example, from markup rules that the PmWiki plug-in does not support. We do this by adding a tbook namespace qualifier to all the wiki to XML translation rules. This means any unqualified tags in the output must be residual HTML, so we remove these during a post-evaluation stage, then remove the tbook namespace qualifier to produce a document in Wikibook XML.
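The namespace trick described above can be illustrated with a short Python sketch. It assumes the qualifier appears literally as a `tbook:` prefix on tag names and that residual HTML tags are dropped while the text they enclose is kept; both details are assumptions made for the example, and the actual plug-in is written in PHP.

```python
import re

# Any opening or closing tag whose name does NOT carry the tbook: qualifier
# is assumed to be residual HTML.
UNQUALIFIED_TAG = re.compile(r"</?(?!tbook:)[A-Za-z][^<>]*>")

# The qualifier itself, at the start of an opening or closing tag.
QUALIFIER = re.compile(r"(</?)tbook:")


def to_wikibook_xml(markup: str) -> str:
    """Post-evaluation stage: drop residual HTML tags, then drop the qualifier."""
    without_html = UNQUALIFIED_TAG.sub("", markup)
    return QUALIFIER.sub(r"\1", without_html)


print(to_wikibook_xml('<tbook:p>Hello <span class="x">world</span></tbook:p>'))
```

Doing the removal in two passes matters: if the qualifier were stripped first, legitimate tags and HTML litter would no longer be distinguishable.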

Transformation

We have made XML generation and Wikibook transformation as robust as possible. What this means is that consistent presentation of printed outputs is completely automatic — not just within a document type (all reports have the same look), but different document types are all recognisably part of the same family. So businesses such as professional services firms, which typically produce a large number of documents of a small number of document types, can get a consistent look (a house style) at minimal cost and in particular with less quality control effort. There is a huge quality advantage when we shift typesetting from the desk-top to the server, because we eliminate local stylistic variations. Of course, limiting local customisation can also be a disadvantage in many situations.

In turning wiki input into Wikibook XML output, we start by establishing a document hierarchy (chapter, section, subsection, and so on). We use lists of page names to publish a collection and let the list level define each page’s place in the output hierarchy. Then we analyse the headings on each page and slot these into the next available level. Suppose pages A and B are both chapters in a book. The author of page A might have used heading 1 and heading 3 markup, while the author of page B used heading 2 and heading 4 markup. Wikipublisher will map both pages’ headings into sections and subsections. Of course, page B could also be defined as a section in a different book; in this case, its headings become subsections and subsubsections. Page A could be published separately as an article; in this case, its headings form the sections and subsections.
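The heading mapping in this example can be sketched as follows. The function and table names are hypothetical, and the sketch only covers the three cases named in the text (a page published as an article, as a chapter, or as a section); it is not the plug-in's own algorithm.

```python
LEVELS = ["chapter", "section", "subsection", "subsubsection"]

# Level given to the shallowest heading on a page, keyed by how the page is
# published (hypothetical table covering only the cases in the text).
FIRST_HEADING_LEVEL = {
    "article": "section",
    "chapter": "section",
    "section": "subsection",
}


def map_headings(heading_levels, published_as):
    """Slot the distinct heading levels used on a page into the next free levels."""
    start = LEVELS.index(FIRST_HEADING_LEVEL[published_as])
    # Rank the heading levels actually used, so gaps (h1 then h3) are closed up.
    rank = {h: i for i, h in enumerate(sorted(set(heading_levels)))}
    deepest = len(LEVELS) - 1
    return [LEVELS[min(start + rank[h], deepest)] for h in heading_levels]


# Page A uses heading 1 and heading 3; page B uses heading 2 and heading 4.
print(map_headings([1, 3, 3, 1], "chapter"))
print(map_headings([2, 4], "chapter"))
print(map_headings([2, 4], "section"))
```

Because only the relative order of the headings on a page is used, authors do not have to agree on which heading markup to start from.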

As well as transforming wiki markup to print equivalents, we need to allow for differences between screen and paper. If a page contains a low resolution “thumbnail” image, linked to a high resolution version, Wikipublisher will automatically use the high resolution version, and shrink it to fit the space. Wikipublisher automatically works out the width of table columns based on the paper size, splits long tables across pages, and can rotate wide tables and images. Wikipublisher also offers support for good typesetting practice, such as markup for setting abbreviations in small caps (e.g., XML set in small caps rather than in full-size capitals) and adding marginal notes.

Links in HTML are all represented using the <a> tag. Wikipublisher distinguishes various semantically distinct types of link, each of which has a different print representation:

  • links to wiki pages on the same web site
  • links to external web pages
  • cross-references to another section of the same document
  • footnotes
  • references to citations
  • references to equations and “floats” (figures, tables and floating blocks)

References to citations are an essential part of scholarly writing and Wikipublisher supports these. LATEX is very good at processing bibliographic data, for example by using the BIBTEX package. However, the wiki has to generate all the references and citations fully formed, linked, and sorted for display in HTML; hence this information is also embedded in the Wikibook XML. So instead of exporting its bibliographic data to BIBTEX, the Wikibook PDF server has to tell LATEX how to handle bibliographies and just print what Wikipublisher gives it. Wikipublisher also ensures that if hyperref’s colorlinks option is turned on [13], references to citations are presented in the right colour (see footnote 1).

Citation references work the wiki way, using a special cite markup. If the citation exists, the reference links to it; if not, the author sees an Edit link which, when clicked, opens an edit form to define the citation elements. Writers can choose between numerical and Author–Year reference styles and can list any citations referenced but undefined, or defined but unreferenced, at the end of the page. Wikipublisher’s reference markup includes various options to control the display of the text linking the reference to the citation, similar to the functionality provided in the LATEX natbib package. For example, the markup to show author names is citeT, may include a text prefix or suffix, and produces links such as see Rahtz and Oberdiek.

Experience

In this section we describe the growth in our understanding of people’s expectations of an online typesetting service and the conventions and constraints the print medium imposes. We outline our experience in using Wikipublisher to bring print culture to web content, use of the Wikipublisher system in various contexts, and the lessons we have learnt.

Bring Print Culture to Web Content

When converting web pages to print, people expect the typesetting engine to apply standard conventions for printed material. For a given input, it should optimise the quality of the printed output and apply the rules of typesetting consistently to every page. This means authors can focus on content, rather than presentation. It also means authors do not need to be typesetting experts to produce professional-looking printed documents from their web page collections. The following are among the more common conventions we adopt:

  • captions are placed above tables and below figures
  • images and tables “float” to the top of the next page if there is insufficient room; the text following flows back around the floated object
  • captions for images floated left or right on the Web are on the right of the image on recto pages and the left on verso pages

Readers of print documents expect that these will follow the layout conventions developed and refined over the centuries since the invention of the printing press. It is not enough just to take the content and wrap it in a print-oriented layout template; there are other, more subtle conventions that we need to apply to the content itself. Examples of printing press conventions are the treatment of quote marks, hyphens and separators:

  • smart quotes — treat straight quote marks as markup characters and “smarten” them into the equivalent HTML entities (including exceptions such as ’phone and prime marks such as 6′ 30″)
  • smart hyphens — recognise em dash and en dash markup — such as 1–10 and Wellington–Picton — and translate these into the equivalent entities
  • smart separator — a horizontal rule offers an opportunity for a small typographic flourish, as the (configurable) separator below shows
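A minimal Python sketch of the quote and dash conventions follows. It assumes `--` and `---` as the en dash and em dash markup and emits Unicode characters rather than HTML entities; both are assumptions for illustration, and the exceptions mentioned above (a leading apostrophe as in ’phone, and prime marks) are deliberately not handled.

```python
import re


def smarten(text: str) -> str:
    """Turn straight quotes and dash markup into typographic characters."""
    # Longer markup first, so '---' is not consumed as '--' plus '-'.
    text = text.replace("---", "\u2014").replace("--", "\u2013")
    # A quote at the start of the text, or after whitespace or an opening
    # bracket, opens a quotation; any quote left over closes one.
    text = re.sub(r'(?<![^\s(\[])"', "\u201c", text)
    text = text.replace('"', "\u201d")
    text = re.sub(r"(?<![^\s(\[])'", "\u2018", text)
    text = text.replace("'", "\u2019")
    return text


print(smarten('"Oh" he said, see pages 1--10'))
print(smarten("don't"))
```

Handling the exceptions needs extra rules applied before the general ones, which is one reason this kind of processing is better done once on the server than by each author.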

As users have placed more demands on the system, new layout problems have emerged. We found we had to adopt an evolutionary approach to refining the software. The stimulus for change was generally looking at samples of real-world printed output and saying, “Oh … That’s no good.” The following are some examples.

Links to external Web pages often have long URLs, typically resulting in spurious white space in justified text. We automatically make all punctuation characters discretionary (i.e., we treat punctuation in URLs as wiki markup), thereby allowing a long URL to split over 2 or more lines if necessary. To avoid possible ambiguity, we put the punctuation character after the line break. We print the link text in the body and the URL in a footnote or citation.
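The URL rule can be sketched in Python as a split into fragments, each of which (after the first) begins with a punctuation character, so that a line break taken between fragments leaves the punctuation after the break. The set of punctuation characters is an assumption, and the sketch stops short of emitting LATEX; it needs Python 3.7 or later, where `re.split` accepts patterns that match the empty string.

```python
import re

# Hypothetical set of characters before which a URL may be broken.
BREAK_CHARS = "/.:?&=_-"
BREAK_BEFORE = re.compile(r"(?=[/.:?&=_-])")


def url_fragments(url: str) -> list:
    """Split a URL at every permitted break point, punctuation leading."""
    return [piece for piece in BREAK_BEFORE.split(url) if piece]


pieces = url_fragments("http://www.wikipublisher.org/")
print(pieces)
```

Joining the fragments with a discretionary break lets the typesetter choose whichever break points give the best line, without altering the URL itself.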

Writers also no longer need to know that the convention for use of footnotes is that the footnote reference character appears after a punctuation mark such as a comma or a full stop. Wikipublisher just makes sure it happens. If a reader chooses to publish a page using vertical spaces between paragraphs (rather than indents), Wikipublisher will automatically change the formatting of footnote text to hang the number into the left margin and suppress the first line indent.

Printed layout has to be adaptable to the needs of the user and the constraints of the medium. The Wikibook DTD describes the structure of the page’s content, not its presentation. Presentation is left to the typesetting engine and is controlled, to a degree, through print <meta> tags and their attributes. These allow authors and readers to set printing preferences such as the paper size, the layout of a book cover, or whether to print duplex. This approach generally works well, but it has some limitations, many arising from the physical differences between a scrolling landscape colour screen and a fixed-size portrait sheet of paper.

Tabular material continues to present a huge challenge. True tables are supported well. Headings, captions, cell alignment, text wrap and row or column span all map to their print equivalents. Wikipublisher will detect running heads in long tables and can rotate wide tables, which may display well on screen but can look cramped on paper. Where tables are used as a way to control the presentation of complex material, Wikipublisher currently works less well, especially with very long or wide cells. Wikipublisher treats complex cells as “minipages” and keeps the content together on a page. As a result, if a cell contains more than a page of content, the excess falls off the end of the page. In addition, complex cells are not allowed to span multiple columns.

Style information is only partially supported. There are major differences between the CSS-based model used in XHTML (and implicit in PmWiki) and the structure-based approach used in LATEX. Currently, there is no print equivalent to the rich style options found in XHTML. The approach we have taken is to translate those style options with recognisable equivalents (inline styles such as text colour blue), and ignore the rest (block styles such as place a red dotted border around this paragraph with a pale blue background and 5px padding).

Division style attributes have limited support. The Wikibook DTD allows multiple paragraph or other block tags to be wrapped in a <group> tag (equivalent to the HTML <div> tag), which is treated as a minipage. Just as a table with a caption will “float” so a div block with a caption will also float. This area is under continued development. At the moment, group styles are treated on a case-by-case basis. Significant additional work will be needed to create a mechanism for translating all possible div CSS properties to their typesetting equivalents.

 

1 We could not find any references for this, and it took us a year to work out how to do it.

Experience Within Affinity

Affinity Limited has migrated all its formal client communications to Wikipublisher. Each Affinity project gives clients access to a password-protected wiki page group on Affinity’s in-house web server (running Mac OS X). This web server also hosts an instance of the Wikibook PDF server, so all content published on the wiki can also be published as a PDF for printing. This live practice across a variety of different projects has stimulated a series of progressive improvements in the Wikibook PDF server software. In this environment, reliability is critical — software failures cannot be tolerated, as these would reflect poorly on Affinity’s ability to deliver a quality service.

Affinity achieved its original purpose of removing Adobe FrameMaker from the publishing pipeline and now generates all documents using Wikipublisher, first as links to wiki pages, followed where appropriate by printed versions. This includes consultant résumés, proposals, letters of agreement, terms of reference, interview write-ups, progress reports and all project documentation. This is a new and unfamiliar paradigm: most clients still show up to meetings with printed web pages, even though a high quality PDF is only one extra click away. The only time Affinity uses a word processor is when a client sends it a word processing document.

Experience Within Victoria University of Wellington

In 2002 the Elvis Software Design Research Group in the School of Mathematics, Statistics, and Computer Science at Victoria University were involved with the early development of the Wikipublisher project. We initially used the pre-alpha version of the software called PDF 2 YOU with our wiki (ElvisBrain) which was UseMod at the time. The PDF 2 YOU software was originally just the print PDF server which was a Perl script that worked off the wiki’s HTML output and translated this into LATEX. We used the software with our wiki for aggregating and repurposing the content for offline discussions about research projects (e.g. The Lego Hypothesis) and administrative procedures (e.g. how we should conduct our meetings and research in general).

One of the ongoing issues for the school is to provide the same content efficiently and accurately for both its course and degree prospectuses and its web sites. We installed the PDF 2 YOU software for use with the school’s web site. The school web site consisted of plain HTML and a number of Perl scripts to generate dynamic content. Making the PDF server work with the school’s web site was challenging to begin with and required a number of configuration changes to the school’s web servers.

In 2004 the school moved its web site to different software, and the PDF 2 YOU server was no longer maintained because the Wikipublisher project had changed direction to work from wiki markup rather than HTML. In 2007 the school web site changed yet again, this time to TWiki as the main wiki content management system. Our research group also changed to TWiki shortly thereafter. If a Wikipublisher plug-in for TWiki becomes available we will install it.

Public Wikibook PDF Service

Affinity runs a free public Wikibook PDF server for those wishing to try out the software. In the past 5 years we have had 340 wiki sites registered to use the Wikipublisher system via the public server. This has been a fruitful source of feedback for the system’s evolution, in response to others’ experiences. The web site includes an issues register for people to log bugs and change requests, a tip of the week section where we publish short “how to” stories, software release notes, and a cookbook for user-contributed local customisations (plug-ins) that extend Wikipublisher’s capabilities. There is also a discussion group, which tends to be the first point of contact for people encountering problems or seeking new features. Currently, the discussion group has 36 members.

One of the early adopters of the Wikipublisher service was Refractions Research, a company in Victoria, British Columbia that builds data systems to add geographic intelligence to business processes. They use Wikipublisher for all their technical documentation and currently have over a dozen separately managed documentation sites, using PmWiki’s authentication system to give their customers access to these sites over the Web. This requires use of a special access right in the wiki software to detect publishing requests from the Wikibook PDF server’s IP address and pass these through the security system. This is what they told us after they set up their first Wikipublisher-based system:

This is amazing. You guys have quite possibly revolutionized the usage of a wiki and what it means to collaboratively create and maintain documents. Major, major kudos to you. — Krista Stellar

PDF Server for Download

We have had a total of 680 downloads of the open source PDF Server over the past 3½ years, an average of about 15 per month. Most users have found the Wikipublisher system from the web page [15] that describes the PmWiki plug-in. One of our users said the following about their experience of installing the PDF server:

May I say that I’m very impressed with Wikipublisher, especially your instructions on getting the server-side up and running — that was easy. … It seems to have the edge on other variants with its intelligent mining of information (e.g. Wiki trail) and the high quality visualisation enhancing readability. — Steve Crisp

Given that academic institutions have limited budgets they need to manage their expenses very carefully and can ill afford expensive collaborative authoring and publishing software. We have found that a number of universities have decided to use Wikipublisher. For example, Buffalo State College has set up a pilot of PmWiki/Wikipublisher for a collaborative writing project:

The print output is a very attractive feature, and PmWiki’s great flexibility make it a likely winner. Good work. — Kevin Hayes

Wikipublisher’s support for equation markup by providing a PmWiki plug-in for the LatexRender library has led to its adoption for a Swedish mathematics portal:

I am really impressed!!! This is exactly what we need. We are now using Wikipublisher at our math-department in Sweden. — Samuel Bengmark

Lessons Learnt

The better Wikipublisher does its job, the less people notice it; good typography is invisible, letting the reader focus on reading. We have learnt several important lessons in creating the Wikipublisher project [14]:

Users expect zero errors.

One of the advantages of using a wiki as a front-end to a typesetting engine is that people assume “it will just work”. When the Wikipublisher project released its software into the wild, it quickly became clear that there is a universal expectation of zero errors. Web browsers are tolerant and this sets an expectation that if a reader requests a PDF, then the Wikibook server will deliver a PDF every single time. As far as Wikipublisher’s users are concerned, if they do not get a PDF when they ask for one, this is a software bug and the Wikipublisher project needs to fix it.

On the other hand, when we use a word processor, we generally spend a lot of time tweaking to get the presentation right. Even with LATEX, its unforgiving nature teaches authors to expect to find and fix their own errors. The Wikipublisher project set out to bring wiki-like reliability to print and thus to produce PDF documents every time. This means following acceptability-oriented computing practices [16]: if the system detects errors in the output, it must continue to function and allow authors to rectify the failures.

Publishing online first alters the rules.

Like most other project-based consultancy practices, Affinity writes regular progress reports to its clients. These used to be paper letters; now they are wiki pages which can be typeset as letters. The shift from “documents” to “pages” creates a new perspective on the content. For example: pages can contain links to related materials such as project documentation; the client can annotate them with comments; the entire history of the project is instantly accessible and searchable; and print snapshots can be taken to store in a document management system or bind for physical distribution.

Open beats closed.

Wikipublisher exists because its developers could stand on the shoulders of free software giants. Using free and open source software reduces the barriers to entry. Open standards, especially XML, enable interoperability between disparate systems like PmWiki and LATEX. Open means that when we strike a problem, search engines such as Google will find people who have seen the problem before and left trails on the Web for us to follow. The wiki markup and file format specifications are open and fully documented, so long term content curation and preservation are simply not a problem.

Wiki markup is hard to learn.

The biggest surprise has been the implacable resistance to wiki markup from so many otherwise sensible people. For many, “It’s not WYSIWYG” is a show-stopper. For busy people, having to learn something new is a significant barrier to entry. Absence of features such as automated spell-checking further increases resistance. Fortunately, there are enough people who are over this that we no longer bother with the objections. If people do not wish to use Wikipublisher because it requires them to learn something new, then they can use alternative solutions such as those we outline in Related Work. However, anyone writing wiki content ought to use their web browser’s built-in spell checker.

Most people are indifferent.

It is a minority of people who share our passion for what Wikipublisher can do. People who are familiar with HTML often do not see the point of transforming web content into LATEX to create a high quality print document. They see print as a disposable afterthought and consider that generating a printable page view using CSS is good enough. We find it surprising that people who pay careful attention to accessibility and readability principles for web sites are happy to ignore these for printed material. Most authors are comfortable with their word processor and see no reason to change their practice. If what they are writing will be published on a web site, converting the document to HTML is someone else’s problem. When their document is finished, they toss the dead sheep over the fence into the next paddock and forget about it.

Related Work

We are aware of the following different approaches for printing web pages: CSS support, built-in web browser support, extensions to web browsers, web specific printing solutions, and extensions to wiki engines.

Many web sites provide printing support using CSS styles. The web pages contain options where users can select the page to be rendered in HTML using a specific print CSS style or @media print rules. The design of most of these CSS styles strips out headers, borders, and menus on the web page to leave just the content of the page. However, there is usually no option for the user to modify the styles, select output options such as fonts or file format, or bundle collections of pages, let alone generate navigation aids such as a table of contents.

Some modern browsers such as Apple Safari and Mozilla Firefox provide better built-in options than most web sites for printing web pages. Again, most of these browsers strip out headers, borders, and menus to provide a PDF or PostScript page. However, as with CSS print support, the browsers give limited options for a user to customise the output and none allows dynamic selection of collections of pages.

Specific extensions to web browsers to support customisation and printing have been developed. Aardvark [3] is an extension to Mozilla Firefox that allows a user to configure the styles on a web page, such as removing headers or borders, widening tables, increasing font sizes, or changing the colour of text or backgrounds. While there is no option for generating high quality print documents, it gives the user control over what content to print if using a built-in web browser option for printing. PrintMonkey [1] is a prototype solution that is an extension to Firefox implemented in JavaScript. It uses the Greasemonkey engine to allow user scripts to get data from any URL and has templates to make it easier to print web content. However, none of these solutions uses a typesetting engine, nor is any of them designed for printing long, complex documents.

There are many web-specific solutions, two examples of which we will now discuss. Prince XML is a high quality print system that converts XHTML, XML, and SVG documents into PDF documents using CSS. The system works from the command line and URLs can be used as input. The XHTML 2 PDF service is a web-based equivalent. However, neither of these systems uses LATEX as its typesetting engine, neither is integrated with any wiki engine, and both are commercial services.

A number of print extensions exist for various wiki engines, including MediaWiki and TWiki. PediaPress [12] (the most similar to Wikipublisher) has written an extension to MediaWiki which parses the wiki markup using an open source Python library to typeset PDF documents based on a DOM. Collections of articles can also be rendered together to form a book, with some control over their ordering. Other formats such as ODT and DocBook XML can be exported, and a printed book can be ordered from the company. Other extensions use HTMLDOC, an open source and commercial package that turns HTML pages into PDF documents [4] [6], and another extension [5] uses a similar HTML to PDF tool, DOMPDF. Finally, one extension converts MediaWiki wiki markup to LATEX using PHP [7], although this work is in its early stages. However, with all of these solutions the end-user has fewer options than with Wikipublisher for controlling the final document, modifications to the PDF generation engine are hard to add, the typeset source is not easily obtainable, and none of them is generic enough to be ported to other wiki engines.

In comparison with these other approaches, Wikipublisher has three unique features:

  • its use of an intermediate XML form decouples the print server from the web server
  • its print metadata model allows authors and readers to set their layout preferences
  • its support for citations makes it suitable for collaborative online scholarly authoring

Future Work

The Wikipublisher typesetting system is functionally complete for our own purposes, but offers several fruitful avenues for further development. The Wikipublisher project sees four areas where future work is desirable, if there is sufficient customer demand. They all focus on growing the community.

Wikibook on Microsoft Windows

To date, people are running the Wikibook PDF server on various flavours of GNU/Linux plus Mac OS X. The project regularly receives emails from people who want to know whether it will run on Windows. We reply that as far as we know, all the software they will need is available for Windows, but we do not know of any Windows installations. We offer to help them work through any issues they may encounter and to document the Windows installation process. We have yet to hear back from any of these correspondents.

Our hope is that one day soon, somebody will value the software enough to give back to the project a documented Windows installation process. As far as we know, this is not hard to do. Meanwhile, running the software on a Windows server will probably continue to be the most requested feature.

Extensions to other content management engines

There are limitations to the adoption of Wikipublisher. Since it was possible to write a plug-in for PmWiki to output Wikibook XML, it should also be feasible to add the same API capability to other wikis, blogging engines and web content management systems which support third-party plug-ins. Again, the project has received several enquiries, in particular about support for MediaWiki, but to date none has turned into a real project. Given the scale of the undertaking and in the absence of a customer willing to provide time or money, we have been reluctant to embark on this.

Such plug-ins need to add four pieces of functionality:

  • translate the markup into suitable Wikibook XML
  • provide wrappers, including configurable print metadata, for the various document types
  • assemble collections of Web pages into a single document
  • present the reader with an interface to the Wikibook PDF server
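
The four pieces of functionality listed above can be sketched in a few lines. This is an illustrative outline only, not the PmWiki plug-in: the element names (`section`, `title`, `p`, `metadata`, `option`), the server address, and the `url` query parameter are hypothetical and do not reflect the actual Wikibook DTD or PDF server interface.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def page_to_xml(title: str, markup: str) -> ET.Element:
    """Step 1: translate a tiny markup subset (blank-line paragraphs) to XML."""
    section = ET.Element("section")  # hypothetical element names throughout
    ET.SubElement(section, "title").text = title
    for block in markup.split("\n\n"):
        if block.strip():
            # Collapse line breaks and runs of spaces inside a paragraph.
            ET.SubElement(section, "p").text = " ".join(block.split())
    return section


def wrap_document(doc_type: str, metadata: dict, sections: list) -> ET.Element:
    """Steps 2 and 3: wrap a collection of pages in one typed document."""
    root = ET.Element(doc_type)
    meta = ET.SubElement(root, "metadata")
    for name, value in sorted(metadata.items()):
        ET.SubElement(meta, "option", {"name": name}).text = str(value)
    root.extend(sections)
    return root


def typeset_link(server: str, source_url: str) -> str:
    """Step 4: build the link a reader follows to request the PDF."""
    return server + "?" + urlencode({"url": source_url})


pages = [("Intro", "First paragraph.\n\nSecond one."), ("Method", "Only paragraph.")]
doc = wrap_document(
    "article", {"papersize": "a4"}, [page_to_xml(t, m) for t, m in pages]
)
print(ET.tostring(doc, encoding="unicode"))
print(typeset_link("https://pdf.example.org/typeset", "https://wiki.example.org/Main"))
```

A real plug-in would replace `page_to_xml` with the host engine's own markup parser; the wrapping, assembly, and link-building steps are largely independent of the engine.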

Wikipublisher’s restriction to PmWiki is perhaps the greatest barrier to its wider adoption. The increasing availability of “open content” APIs, such as the Guardian Open Platform, also creates opportunities for services that sample and remix content to generate “print runs of one” tailored to the interests of individual readers.

User-specified LATEX classes

In an ideal world, an author could instruct the Wikibook PDF server to typeset their content using any valid LATEX class file (as long as it is reachable with an HTTP request). The current Wikibook DTD defines four distinct document types: letter, article, report and book. The wiki plug-in makes sure the wiki produces Wikibook XML that complies with the requested DTD. To support user-defined classes, Wikipublisher would have to make sure that the document type used is compatible with the specified class. As an example of something that would in all likelihood go wrong, an author may request wiki pages printed as a set of presentation slides.
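
One possible shape for such a compatibility check is a registry that maps each document type to the classes known to suit it. The sketch below is an assumption for illustration, not a feature of Wikipublisher: the four document types come from the text, while the class lists and the `check_class` function are hypothetical.

```python
# Document types named in the text; the class lists are illustrative only.
COMPATIBLE_CLASSES = {
    "letter": {"letter", "scrlttr2"},
    "article": {"article", "scrartcl"},
    "report": {"report", "scrreprt"},
    "book": {"book", "scrbook", "memoir"},
}


def check_class(doc_type: str, latex_class: str) -> None:
    """Raise ValueError if the class is not registered for the document type."""
    if doc_type not in COMPATIBLE_CLASSES:
        raise ValueError(f"unknown document type: {doc_type}")
    if latex_class not in COMPATIBLE_CLASSES[doc_type]:
        allowed = ", ".join(sorted(COMPATIBLE_CLASSES[doc_type]))
        raise ValueError(
            f"class '{latex_class}' is not registered for '{doc_type}' "
            f"(allowed: {allowed})"
        )


check_class("book", "memoir")  # accepted silently
try:
    check_class("article", "beamer")  # the presentation slides case
except ValueError as problem:
    print(problem)
```

A whitelist of this kind rejects the slides request before any typesetting is attempted, at the cost of requiring someone to vet each new class by hand.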

Much careful design work is needed to ensure a robust solution. Fortunately, the current system is sufficiently flexible that few people have asked for additional document classes. On the other hand, it would have been really useful to load the correct ACM template for this paper! As it was, the authors exported the raw LATEX as an article and manually converted this to use a different class.

Use of Wikipublisher

To inform further development of a system to publish documents using a wiki and provide high quality print output, we would like to conduct an empirical study of how people are using Wikipublisher. We would like to explore the following research question with the current user base: “What has been your experience using Wikipublisher?” We envisage setting up an online survey form (on Wikipublisher) and gathering data from a self-selecting sample of users. The survey would explore the kind of content, motivations for adopting Wikipublisher, benefits they have gained, issues they have encountered, and their plans for the future.

Conclusions

In producing print documents, most people are accustomed to making a trade-off between the convenience of a word processor and the quality of a desk-top publishing system. Most choose convenience, with the unfortunate result that typographic mediocrity has become entrenched in our culture. One of the big reasons for the popularity of wikis is their convenience. Wikipublisher lets us combine the convenience of a wiki with the typesetting quality of the finest desk-top publishing software. Because the system embeds good typesetting practices in the software, the quality comes free. The Wikipublisher system allows authors to deliver fast, cheap, and good.

In this paper, we have shown that you no longer need a word processor to author and publish letters, articles, reports, and books. Publishing online first, with support for offline reading, is affordable and practical. We have shown how Wikipublisher adds value on top of LATEX by ensuring consistent and correct command usage and simplifying layout changes. We have described:

  1. an architecture for a typesetting system to print Web content expressed using wiki markup
  2. an implementation of the architecture for one wiki engine, PmWiki, to convert wiki markup into XML
  3. extensions to the tbook system to support richer data structures, print preferences, and typesetting as a Web-based service
  4. our experiences developing and using the Wikipublisher system

Over the last few years, we and our users have learnt many things about document layout practice and built our knowledge into the Wikipublisher software. By making typesetting a shared web-based service, this knowledge automatically becomes available to everyone who uses the Wikipublisher system. This cannot happen when typesetting is a personal computer-based service. Even if, for example, we improve a document template, we have to distribute it to everyone and it does nothing for our existing document collections. On the other hand, because Wikipublisher forces a separation between content in wiki markup and its presentation either for the Web or print, improving the print presentation automatically upgrades the layout of all content, the next time any page is printed.

Acknowledgements

Donald Gordon designed and wrote the Perl script that drives the PDF server and taught the XSL script many new LATEX tricks. He did the back-end integration that means we can give the PDF server any URL that returns a stream of Wikibook XML and get back a typeset PDF suitable for printing. He also wrote the PmWiki handlers to translate wiki tables and wiki styles into useful XML. Darren Willis wrote the server-side components that teach LATEX how to process the bibliography XML created from wiki citation markup.

Removed Content

The expectations from 2.3 of D7

The Wikipublisher project set a number of design goals (in the form of constraints) we wished to achieve:

  1. the quality must be at least as good as that produced from a desk-top publishing package
  2. an author must be able to use regular wiki markup and let the publishing engine interpret it for the Web or print with no user intervention
  3. a reader must be able to generate a print version of any page collection, as well as any individual page 

Removed from Wiki vs WYSIWYG of D7

Longer term, we expect that wiki front-end editors will emerge, in the same way that tools like LyX are available for LATEX. In the meantime, anyone writing wiki content ought to use a web browser with spell checking built in.

References for Wikipublisher: A Print-on-Demand Wiki

[1]   J. Baldwin, J.A. Rowson, and Y. Coady, 2008. PrintMonkey: giving users a grip on printing the Web, In Proceedings of Document Engineering (DocEng) 2008. ACM, 230–239.

[2]   T. Bronger, 2003. The tbook system for XML Authoring, tbookdtd.sourceforge.net, accessed on 27 March 2009.

[3]   R. Brown, 2009. Aardvark Mozilla Firefox Extension, karmatics.com/aardvark, accessed on 27 March 2009.

[4]   A. Dunkley, 2008. MediaWiki Extension: PDF Book, www.mediawiki.org/wiki/Extension:Pdf_Book, accessed on 27 March 2009.

[5]   A. Hagmann, 2009. MediaWiki Extension: PDF Export Dompdf, www.mediawiki.org/wiki/Extension:Pdf_Export_Dompdf, accessed on 27 March 2009.

[6]   T. Hempel, 2008. MediaWiki Extension: PDF Export, www.mediawiki.org/wiki/Extension:Pdf_Export, accessed on 27 March 2009.

[7]   H.-G. Kluge, 2008. MediaWiki Extension: Wiki2LaTeX, www.mediawiki.org/wiki/Extension:Wiki2LaTeX, accessed on 27 March 2009.

[8]   L. Lamport, 1994. LATEX: A Document Preparation System (Second Edition). Boston: Addison-Wesley.

[9]   P. Michaud, 2002. PmWiki: a wiki-based system for collaborative creation and maintenance of websites, www.pmwiki.org, accessed on 27 March 2009.

[10]   F. Mittelbach and M. Goossens, 2004. The LATEX Companion, Second Edition. Boston: Addison-Wesley.

[11]   D. Nicholas, P. Huntington, H.R. Jamali, I. Rowlands, T. Dobrowolski, and C. Tenopir, 2008. Viewing and reading behaviour in a virtual environment: The full-text download and what can be read into it, Aslib Proc. 60(3): 185–198.

[12]   PediaPress, 2009. MediaWiki Extension: Collection, www.mediawiki.org/wiki/Extension:Collection, accessed on 27 March 2009.

[13]   S. Rahtz and H. Oberdiek, 2008. Hypertext marks in LATEX: a manual for hyperref, www.tug.org/applications/hyperref/manual.html, accessed on 27 March 2009.

[14]   J. Rankin, 2008. Wikipublisher: A Web-based System to Make Online and Print Versions of the Same Content, TUGboat 29(2): 264–269.

[15]   J. Rankin, 2009. PmWiki Plug-in: PublishPDF, www.pmwiki.org/wiki/Cookbook/PublishPDF, accessed on 27 March 2009.

[16]   M. Rinard, 2003. Acceptability-oriented computing, In Companion of OOPSLA 2003. ACM, 221–239.

[17]   S. Rosenberg, 2007. Dreaming in code: two dozen programmers, three years, 4732 bugs, and one quest for transcendent software. New York: Crown Publishers.

[18]   Smashing Magazine, 2007. Printing the Web: Solutions and Techniques, www.smashingmagazine.com/2007/02/21/printing-the-web-solutions-and-techniques, accessed on 27 March 2009.

Creative Commons License
Page last modified on 07 July 2009 at 12:30 PM