WikiSym.WikipublisherProjectPoster2 History

14 June 2009 at 08:43 PM by John Rankin - draft poster
Changed line 10 from:
(:bib fmt=num page=WikipublisherProject:)
to:
(:bib fmt=num page=WikipublisherProject title=References :)
14 June 2009 at 06:45 PM by John Rankin - draft poster
Changed line 105 from:
The better Wikipublisher does its job, the less people notice it; good typography is invisible, letting the reader focus on reading. In producing print documents, most people are accustomed to making a trade-off between the convenience of a word processor and the quality of a desk-top publishing system. Most choose convenience, with the unfortunate result that typographic mediocrity has become entrenched in our culture. One of the big reasons for the popularity of wikis is their convenience. Wikipublisher lets us combine the convenience of a wiki with the typesetting quality of the finest desk-top publishing software. Because the system embeds good typesetting practices in the software, the quality comes free.
to:
The better Wikipublisher does its job, the less people notice it; good typography is invisible, letting the reader focus on reading. In producing print documents, most people are accustomed to making a trade-off between the convenience of a word processor and the quality of a desk-top publishing system. Most choose convenience, with the unfortunate result that typographic mediocrity has become entrenched in our culture. A big reason for the popularity of wikis is their convenience. Wikipublisher lets us combine the convenience of a wiki with the typesetting quality of the finest desk-top publishing software. Because the system embeds good typesetting practices in the software, the quality comes free.
14 June 2009 at 06:42 PM by John Rankin - draft poster
Changed line 73 from:
Style information is only partially supported. There are major differences between the `CSS-based model used in `HTML and the structure-based approach used in Latex. The approach we have taken is to translate those style options with recognisable equivalents (inline styles such as text colour blue), and ignore the rest (block styles such as place a red dotted border around this paragraph with 5px padding).
to:
Style information is only partially supported. There are major differences between the `CSS-based model used in `HTML and the structure-based approach used in Latex. The approach taken is to translate those style options with recognisable equivalents (inline styles such as text colour blue), and ignore the rest (block styles such as put a red dotted border around this paragraph with 5px padding).
14 June 2009 at 06:38 PM by John Rankin - draft poster
Changed lines 60-62 from:
When converting web pages to print, people expect the typesetting engine to apply standard conventions for printed material. For a given input, it should optimise the quality of the printed output and apply the rules of typesetting consistently to every page. Authors focus on content, rather than presentation. This means authors do not need to be typesetting experts to produce professional-looking printed documents from their web page collections.

Readers of print documents expect that these will follow the layout conventions developed and refined over the centuries since the invention of the printing press. Examples include the treatment of quote marks, hyphens and separators:
to:
When converting web pages to print, people expect the typesetting engine to apply standard layout conventions for printed material. For a given input, it should optimise the quality of the printed output and apply the rules of typesetting consistently to every page. Authors focus on content, rather than presentation. This means authors do not need to be typesetting experts to produce professional-looking print documents from their web page collections.

Examples of print layout conventions include the treatment of quote marks, hyphens and separators:
14 June 2009 at 06:32 PM by John Rankin - draft poster
Deleted line 62:
Deleted line 63:
Deleted line 64:
14 June 2009 at 06:31 PM by John Rankin - draft poster
Changed line 82 from:
The web site includes an issues register for people to log bugs and change requests, a tip of the week section where we publish short "how to" stories, software release notes, and a cookbook for user-contributed local customisations (plug-ins) that extend Wikipublisher's capabilities. There is also a discussion group, which is often the first point of contact for people encountering problems or seeking new features. Currently, the discussion group has 40 members.
to:
The web site has an issues register for people to log bugs or change requests, a tip of the week where we publish short "how to" stories, software release notes, and a cookbook for user-contributed local customisations (plug-ins) to extend Wikipublisher's capabilities. There is also a discussion group, which is often the first point of contact for people encountering problems or seeking new features.
14 June 2009 at 06:27 PM by John Rankin - draft poster
Changed lines 7-9 from:
(:typeset-page title="{$Title}" subtitle="John Rankin(nl)Affinity Limited(nl)Wellington, New Zealand(nl)Email: john.rankin@affinity.co.nz(and)Craig Anslow, James Noble,(nl)Brenda Chawner, Donald Gordon(nl)Victoria University of Wellington(nl)Wellington, New Zealand" surtitle="WikiSym 2009" urlstyle=on colorlinks=on autonumber=2 fontsize=2col ucsection=on :)
to:
(:typeset-page title="{$Title}" subtitle="John Rankin(nl)Affinity Limited(nl)Wellington, New Zealand(nl)Email: john.rankin@affinity.co.nz(and)Craig Anslow, James Noble,(nl)Brenda Chawner, Donald Gordon(nl)Victoria University of Wellington(nl)Wellington, New Zealand" urlstyle=on colorlinks=on autonumber=2 fontsize=2col ucsection=on :)
Changed lines 60-63 from:
When converting web pages to print, people expect the typesetting engine to apply standard conventions for printed material. For a given input, it should optimise the quality of the printed output and apply the rules of typesetting consistently to every page. This means authors can focus on content, rather than presentation. It also means authors do not need to be typesetting experts to produce professional-looking printed documents from their web page collections.

Readers of print documents expect that these will follow the layout conventions developed and refined over
the centuries since the invention of the printing press. It is not enough just to take the content and wrap it in a print-oriented layout template; there are other, more subtle conventions that we need to apply to the content itself. Examples of printing press conventions are the treatment of quote marks, hyphens and separators:
to:
When converting web pages to print, people expect the typesetting engine to apply standard conventions for printed material. For a given input, it should optimise the quality of the printed output and apply the rules of typesetting consistently to every page. Authors focus on content, rather than presentation. This means authors do not need to be typesetting experts to produce professional-looking printed documents from their web page collections.

Readers of print documents expect that these will follow the layout conventions developed and refined over the centuries since
the invention of the printing press. Examples include the treatment of quote marks, hyphens and separators:
Deleted lines 75-76:
Printed layout has to be adaptable to the needs of the user and the constraints of the medium. The Wikibook `DTD describes the structure of the page's content, not its presentation. Presentation is left to the typesetting engine and is controlled, to a degree, through print <meta> tags and their attributes. These allow authors and readers to set printing preferences such as the paper size, the layout of a book cover, or turn off duplex.
Changed lines 80-84 from:
We run a free public Wikibook `PDF server for those wishing to try out the software. In the past 5 years we have had 340 wiki sites registered to use the Wikipublisher system via the public server. This has been a fruitful source of feedback for the system's evolution, in response to others' experiences. The web site includes an issues register for people to log bugs and change requests, a tip of the week section where we publish short "how to" stories, software release notes, and a cookbook for user-contributed local customisations (plug-ins) that extend Wikipublisher's capabilities. There is also a discussion group, which is often the first point of contact for people encountering problems or seeking new features. Currently, the discussion group has 40 members.

We have had a total of 680 downloads of the open source `PDF Server, on average about 15 per month, for the past 3`1/2 years. Most users have found the Wikipublisher system from the web page cite(Rankin:2009) that describes the PmWiki plug-in.
to:
We run a free public Wikibook `PDF server for those wishing to try out the software. In the past 5 years we have had 340 wiki sites registered to use the Wikipublisher system via the public server. This has been a fruitful source of feedback for the system's evolution, in response to others' experiences.

The web site includes an issues register for people to log bugs and change requests, a tip of the week section where we publish short "how to" stories, software release notes, and a cookbook for user-contributed local customisations (plug-ins) that extend Wikipublisher's capabilities. There is also a discussion group, which is often the first point of contact for people encountering problems or seeking new features. Currently, the discussion group has 40 members.
Changed lines 92-93 from:
!!!Extensions to other content management engines
to:
!!!Extension to other content engines
Deleted lines 108-109:

Over the last few years, we have learnt a lot about document layout practice and built our knowledge into the Wikipublisher software. By making typesetting a shared web-based service, this knowledge automatically becomes available to everyone who uses the Wikipublisher system. This cannot happen when typesetting is a personal computer-based service. Even if, for example, we improve a document template, we have to distribute it to everyone and it does nothing for our existing document collections. On the other hand, because Wikipublisher forces a separation between content in wiki markup and its presentation either for the Web or print, improving the print presentation automatically upgrades the layout of all content, the next time any page is printed.
14 June 2009 at 06:07 PM by John Rankin - draft poster
Changed lines 56-57 from:
We have made `XML generation and Wikibook transformation as robust as possible. What this means is that consistent presentation of printed outputs is completely automatic -- not just within a document type (all reports have the same look), but different document types are all recognisably part of the same family. So businesses such as professional services firms, which typically produce a large number of documents of a small number of document types, can get a consistent look (a house style) at minimal cost and in particular with less quality control effort. There is a huge quality advantage when we shift typesetting from the desk-top to the server, because we eliminate local stylistic variations. Of course, limiting local customisation can also be a disadvantage in many situations.
to:
We made `XML generation and Wikibook transformation as robust as possible. Consistent presentation of printed outputs is completely automatic -- not just within a document type (all reports have the same look), but different document types are all recognisably part of the same family. Businesses which typically produce a large number of documents of a small number of document types can get a consistent look (a house style) at minimal cost and in particular with less quality control effort. There is a huge quality advantage when we shift typesetting from the desk-top to the server, because we eliminate local stylistic variations. Of course, limiting local customisation can also be a disadvantage in many situations.
Changed lines 64-65 from:
* smart quotes -- treat straight quote marks as markup characters and "smarten" them into the equivalent `HTML entities (including exceptions such as `'phone and prime marks such as 6'' 30"")
to:
* smart quotes -- treat straight quotes as markup and "smarten" them into the equivalent `HTML entities, including exceptions such as `'phone and prime marks such as 6'' 30""
Changed line 78 from:
Style information is only partially supported. There are major differences between the `CSS-based model used in `HTML (and implicit in PmWiki) and the structure-based approach used in Latex. Currently, there is no print equivalent to the rich style options found in `HTML. The approach we have taken is to translate those style options with recognisable equivalents (inline styles such as text colour blue), and ignore the rest (block styles such as place a red dotted border around this paragraph with 5px padding).
to:
Style information is only partially supported. There are major differences between the `CSS-based model used in `HTML and the structure-based approach used in Latex. The approach we have taken is to translate those style options with recognisable equivalents (inline styles such as text colour blue), and ignore the rest (block styles such as place a red dotted border around this paragraph with 5px padding).
14 June 2009 at 06:00 PM by John Rankin - draft poster
Changed lines 42-43 from:
Authors interact with the wiki server as usual with a web browser (1 and 2). To create a print document, a reader submits a form (3) to the print server which says, "If you issue this http request (4), you will receive a stream of Wikibook `XML (5); convert it to Latex and `PDF, then give me back the result (8)." The wiki administrator has configured the wiki server so that (4 and 5), "If you receive an http request (4) in this format, forget all you know about wiki to `HTML and do wiki to `XML instead (5)." The wiki server thus needs to give the reader a form (2) in order to (3), "Tell the print server (at this address) to issue this http request." Finally, the print server also needs to retrieve supplementary materials, such as image files, referenced in the `XML (6 and 7), and return a print document (8).
to:
Authors interact with the wiki server with a web browser (1 and 2). To create a print document, a reader submits a form (3) to the print server which says, "If you issue this http request (4), you will receive a stream of Wikibook `XML (5); convert it to Latex and `PDF, then give me back the result (8)." The wiki administrator has configured the wiki server so that (4 and 5), "If you receive an http request (4) in this format, convert wiki to `XML (5) instead of `HTML." The wiki server thus needs to give the reader a form (2) in order to (3), "Tell the print server (at this address) to issue this http request." Finally, the print server needs to retrieve supplementary materials, such as image files, referenced in the `XML (6 and 7), and return a print document (8).
Changed lines 78-83 from:
>>frame<<
Style information is only partially supported. There are major differences between the `CSS-based model used in `XHTML (and implicit in PmWiki) and the structure-based approach used in Latex. Currently, there is no print equivalent to the rich style options found in `XHTML. The approach we have taken is to translate those style options with recognisable equivalents (inline styles such as text colour blue), and ignore the rest (block styles such as place a red dotted border around this paragraph with 5px padding).
>><<

[^#^]
to:
Style information is only partially supported. There are major differences between the `CSS-based model used in `HTML (and implicit in PmWiki) and the structure-based approach used in Latex. Currently, there is no print equivalent to the rich style options found in `HTML. The approach we have taken is to translate those style options with recognisable equivalents (inline styles such as text colour blue), and ignore the rest (block styles such as place a red dotted border around this paragraph with 5px padding).
Changed lines 89-90 from:
The Wikipublisher typesetting system is functionally complete for our own purposes, but offers several fruitful avenues for further development. The Wikipublisher project sees four areas where future work is desirable, if there is sufficient customer demand.
to:
The Wikipublisher system is functionally complete for our own purposes, but offers several avenues for further development. The project sees four areas where future work is desirable, if there is sufficient user demand.
Changed line 113 from:
Over the last few years, we and our users have learnt a lot about document layout practice and built our knowledge into the Wikipublisher software. By making typesetting a shared web-based service, this knowledge automatically becomes available to everyone who uses the Wikipublisher system. This cannot happen when typesetting is a personal computer-based service. Even if, for example, we improve a document template, we have to distribute it to everyone and it does nothing for our existing document collections. On the other hand, because Wikipublisher forces a separation between content in wiki markup and its presentation either for the Web or print, improving the print presentation automatically upgrades the layout of all content, the next time any page is printed.
to:
Over the last few years, we have learnt a lot about document layout practice and built our knowledge into the Wikipublisher software. By making typesetting a shared web-based service, this knowledge automatically becomes available to everyone who uses the Wikipublisher system. This cannot happen when typesetting is a personal computer-based service. Even if, for example, we improve a document template, we have to distribute it to everyone and it does nothing for our existing document collections. On the other hand, because Wikipublisher forces a separation between content in wiki markup and its presentation either for the Web or print, improving the print presentation automatically upgrades the layout of all content, the next time any page is printed.
14 June 2009 at 05:51 PM by John Rankin - draft poster
Changed lines 3-4 from:
(:description Web and print exist as two solitudes: printed web pages often disappoint, while converting print documents into good web pages is hard work. A wiki makes it easy for authors to create rich web content, but is little help if readers wish to print the results. Wikipublisher lets readers turn wiki pages or page collections into print with a couple of clicks, with a quality better than most word processing documents. This dramatically lowers the time and cost of creating online and print versions of the same content, with no loss of quality in either medium. :)
to:
(:description Web and print exist as two solitudes: printed web pages often disappoint and converting print documents into good web pages is hard. A wiki makes it easy for authors to create rich web content, but is little help if readers wish to print the results. Wikipublisher lets readers turn wiki pages or page collections into print, with a quality better than most word processing documents. This lowers the time and cost of creating online and print versions of the same content, with no loss of quality in either medium. :)
Changed lines 16-17 from:
Using a wiki is a great convenience for collaborative authoring, making it easy for authors to work together wherever they may be, as long as they have access to a web browser. This is fine if the result you want is web pages, but what if readers wish to print these? Most people reading more than a page of text will print it; a study of scholarly reading behaviour reports that 80% of researchers read scholarly articles on paper and only 20% read them online cite(Nicholas:etal:2008).
to:
Using a wiki makes it easy for authors to work together wherever they may be, as long as they have access to a web browser. This is fine if the result you want is web pages, but what if readers wish to print these? Most people reading more than a page of text will print it; a study of scholarly reading behaviour reports that 80% of researchers read scholarly articles on paper and only 20% read them online cite(Nicholas:etal:2008).
Changed lines 20-21 from:
Wikipublisher changes this, by giving wiki administrators a way to turn individual pages or page collections into a document suitable for printing. Wikipublisher passes transformed wiki content to the Latex document preparation system cite(Lamport:1994), to produce printed output of the highest quality -- superior to anything that can be achieved using {`CSS|cascading style sheets} or indeed with most word processors.
to:
Wikipublisher changes this, by giving wiki administrators a way to turn individual pages or page collections into a document suitable for printing. Wikipublisher passes transformed wiki content to the Latex document preparation system cite(Lamport:1994), to produce printed output of the highest quality -- superior to anything that can be achieved using {`CSS|cascading style sheets} or with most word processors.
Changed lines 30-35 from:
!Online First! The Web enhances our ability to communicate, so most of our work ought to appear online first. Creating content online first makes it instantly and widely accessible, and encourages linking to other online resources. Yet most of our authoring tools are "print first" and turning print documents into `HTML for publishing on the Web is hard to do well.

!Print Still Matters! If a web page is worth reading, it is worth printing. The longer and richer the content, the more likely the reader is to print it. We may skim read a 50 page report online, but to study it, we print it. Yet few web site designs appear to care what the printed form of the site looks like. Experience has taught us to have low expectations of printed web pages.

!One Authoritative Source! The most up-to-date content version is what appears on a web page; the typeset `PDF is a snapshot taken at a point in time. In contrast, most publishing systems require three or more versions: word processing or other source; a `PDF snapshot of the word processing source, and a collection of static web pages generated from the source.
to:
!Online First! The Web enhances our ability to communicate, so most of our work ought to appear online first. Creating content online first makes it instantly and widely accessible, and encourages linking to other online resources. Yet most of our authoring tools are "print first" and turning print documents into `HTML for the Web is hard to do well.

!Print Still Matters! A web page worth reading is worth printing. The longer and richer the content, the more likely the reader is to print it. We may skim read a 50 page report online, but to study it, we print it. Few web site designs appear to care what the printed form of the site looks like. Experience has taught us to have low expectations of printed web pages.

!One Authoritative Source! The most up-to-date content version appears on a web page; the typeset `PDF is a point in time snapshot. In contrast, most publishing systems require three or more versions: word processing or other source; a `PDF snapshot of the word processing source, and a collection of static web pages generated from the source.
Changed line 38 from:
Fig(fig.architecture) shows the architecture of Wikipublisher. The core architectural decision was to treat generating web pages and generating print pages as separate services. This means one print server can potentially support many web page servers -- printing is in most cases a low volume activity compared to browsing, so it is inappropriate to burden the web page server with print duties. We define a print {`API|application programming interface} that lets a web server expose its content in a way that the print server can process. As a result, the print server can work with any web content management system able to support the print `API. This design also promotes a more rigorous separation of the underlying content from its presentation in different media, making a wiki an ideal lightweight content server.
to:
Fig(fig.architecture) shows the structure of Wikipublisher. The core architectural decision was to treat generating web pages and generating print pages as separate services. This means one print server can potentially support many web page servers -- printing is in most cases a low volume activity compared to browsing, so it is inappropriate to burden the web page server with print duties. We define a print {`API|application programming interface} that lets a web server expose its content in a way that the print server can process. As a result, the print server can work with any web content management system able to support the print `API. This design also promotes a more rigorous separation of the underlying content from its presentation in different media, making a wiki an ideal lightweight content server.
14 June 2009 at 05:31 PM by John Rankin - draft poster
Added lines 1-117:
=<{$Description}

(:description Web and print exist as two solitudes: printed web pages often disappoint, while converting print documents into good web pages is hard work. A wiki makes it easy for authors to create rich web content, but is little help if readers wish to print the results. Wikipublisher lets readers turn wiki pages or page collections into print with a couple of clicks, with a quality better than most word processing documents. This dramatically lowers the time and cost of creating online and print versions of the same content, with no loss of quality in either medium. :)

(:title The Wikipublisher Project :)

(:typeset-page title="{$Title}" subtitle="John Rankin(nl)Affinity Limited(nl)Wellington, New Zealand(nl)Email: john.rankin@affinity.co.nz(and)Craig Anslow, James Noble,(nl)Brenda Chawner, Donald Gordon(nl)Victoria University of Wellington(nl)Wellington, New Zealand" surtitle="WikiSym 2009" urlstyle=on colorlinks=on autonumber=2 fontsize=2col ucsection=on :)


(:bib fmt=num page=WikipublisherProject:)

(:bibend:)

!![[#introduction]] Introduction

Using a wiki is a great convenience for collaborative authoring, making it easy for authors to work together wherever they may be, as long as they have access to a web browser. This is fine if the result you want is web pages, but what if readers wish to print these? Most people reading more than a page of text will print it; a study of scholarly reading behaviour reports that 80% of researchers read scholarly articles on paper and only 20% read them online cite(Nicholas:etal:2008).

Reading a printed web page is usually a disappointing experience -- we have been conditioned to have low expectations of printing from the Web. Even if the printed result is "good enough" we can still only print one web page at a time, unless the site has deliberately created multi-page articles with a combined "printable view" of the content.

Wikipublisher changes this, by giving wiki administrators a way to turn individual pages or page collections into a document suitable for printing. Wikipublisher passes transformed wiki content to the Latex document preparation system cite(Lamport:1994), to produce printed output of the highest quality -- superior to anything that can be achieved using {`CSS|cascading style sheets} or indeed with most word processors.

!![[#design]] The Wikipublisher Project

The project's home page is http://www.wikipublisher.org/. The project was conceived in 2004; we released the first beta version of the Wikipublisher system in late 2005 and have released updates every 2-3 months since. All the software is free and open source.

!!! Goals

We adopted a number of design principles for the project cite(Rankin:2008):

!Online First! The Web enhances our ability to communicate, so most of our work ought to appear online first. Creating content online first makes it instantly and widely accessible, and encourages linking to other online resources. Yet most of our authoring tools are "print first" and turning print documents into `HTML for publishing on the Web is hard to do well.

!Print Still Matters! If a web page is worth reading, it is worth printing. The longer and richer the content, the more likely the reader is to print it. We may skim read a 50 page report online, but to study it, we print it. Yet few web site designs appear to care what the printed form of the site looks like. Experience has taught us to have low expectations of printed web pages.

!One Authoritative Source! The most up-to-date content version is what appears on a web page; the typeset `PDF is a snapshot taken at a point in time. In contrast, most publishing systems require three or more versions: word processing or other source; a `PDF snapshot of the word processing source, and a collection of static web pages generated from the source.

!!!Architecture

Fig(fig.architecture) shows the architecture of Wikipublisher. The core architectural decision was to treat generating web pages and generating print pages as separate services. This means one print server can potentially support many web page servers -- printing is in most cases a low volume activity compared to browsing, so it is inappropriate to burden the web page server with print duties. We define a print {`API|application programming interface} that lets a web server expose its content in a way that the print server can process. As a result, the print server can work with any web content management system able to support the print `API. This design also promotes a more rigorous separation of the underlying content from its presentation in different media, making a wiki an ideal lightweight content server.

%id=fig.architecture center%Attach:architecture.png"Architecture" | Wikipublisher Architecture

Authors interact with the wiki server as usual with a web browser (1 and 2). To create a print document, a reader submits a form (3) to the print server which says, "If you issue this http request (4), you will receive a stream of Wikibook `XML (5); convert it to Latex and `PDF, then give me back the result (8)." The wiki administrator has configured the wiki server so that (4 and 5), "If you receive an http request (4) in this format, forget all you know about wiki to `HTML and do wiki to `XML instead (5)." The wiki server thus needs to give the reader a form (2) in order to (3), "Tell the print server (at this address) to issue this http request." Finally, the print server also needs to retrieve supplementary materials, such as image files, referenced in the `XML (6 and 7), and return a print document (8).
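
The request hand-off in steps 3 and 4 can be sketched as follows. This is only an illustration, not the actual print `API: the server address, the @@action=xml@@ parameter and the @@source@@ field name are all assumptions made for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical print server address, chosen only for illustration.
PRINT_SERVER = "http://print.example.org/typeset"


def xml_request_url(wiki_page_url: str) -> str:
    """Step 4: the request that asks the wiki for XML rather than HTML.

    The 'action=xml' parameter is an assumed convention, not a documented one.
    """
    separator = "&" if "?" in wiki_page_url else "?"
    return wiki_page_url + separator + urlencode({"action": "xml"})


def print_form_url(wiki_page_url: str) -> str:
    """Step 3: what the reader's form submits to the print server."""
    return PRINT_SERVER + "?" + urlencode({"source": xml_request_url(wiki_page_url)})


def source_from_request(request_url: str) -> str:
    """Print server side: recover the address it has been told to fetch."""
    return parse_qs(urlsplit(request_url).query)["source"][0]


url = print_form_url("http://wiki.example.org/Main/HomePage")
print(url)
print(source_from_request(url))
```

The point of the sketch is that the wiki server never talks to the print server directly; it only hands the reader a form that tells the print server which address to call back.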

!!!Implementation

We chose Latex as the typesetting system after eliminating the other candidate, `XSL-FO. [[Formatting Objects -> http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html]] is a markup language for `XML document formatting which is most often used to generate `PDFs. The `FO vocabulary is part of `XSL -- a set of W3C technologies designed for the transformation and formatting of `XML data. At the time we looked (December 2004), we could not find any books on `FO that had been published using `FO. On the other hand, all the books on Latex were published using Latex. It seemed to us that choosing `FO would introduce unnecessary risk and that the risk would make its presence felt as unexpected costs or worse, insurmountable implementation problems. On the other hand, Latex appeared to do everything we could think of.

Fig(fig.implementation) shows the pipeline tool suite approach adopted for Wikipublisher. Wiki markup is translated into an intermediate print-oriented `XML form, and then transformed into Latex. The reasons were largely pragmatic -- we built on top of things that already worked. The '''t'''book system cite(Bronger:2003) is a free software project for converting `XML documents into Latex using `XSLT, so if we could convert wiki markup into `XML, we could use '''t'''book to typeset it. The `PmWiki project cite(Michaud:2002) is a ''markup agnostic'' wiki engine (almost), which lets a site administrator redefine or augment the markup translation rules.
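
As a rough sketch of the pipeline idea, the two transformations can be modelled as functions applied in sequence. The stages below are toy stand-ins: the real system uses a PmWiki plug-in and `XSLT, and the @@<para>@@ and @@<strong>@@ element names are assumptions for the example, not taken from the Wikibook `DTD.

```python
import re
from functools import reduce


def wiki_to_xml(markup: str) -> str:
    """Toy stage 1: wiki markup to an intermediate XML form."""
    # '''text''' is wiki bold; the element name is a hypothetical choice.
    body = re.sub(r"'''(.+?)'''", r"<strong>\1</strong>", markup)
    return f"<para>{body}</para>"


def xml_to_latex(xml: str) -> str:
    """Toy stage 2: intermediate XML to Latex source."""
    body = re.sub(r"<strong>(.+?)</strong>", r"\\textbf{\1}", xml)
    # Drop the paragraph wrapper and end the paragraph with a newline.
    return re.sub(r"</?para>", "", body) + "\n"


# The pipeline is just the ordered list of stages.
STAGES = [wiki_to_xml, xml_to_latex]


def pipeline(markup: str) -> str:
    """Feed the output of each stage into the next one."""
    return reduce(lambda text, stage: stage(text), STAGES, markup)


print(pipeline("A '''bold''' word"))
```

Keeping each stage separate is what lets an existing `XML to Latex tool be reused unchanged: only the first stage needs to know anything about wiki markup.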

%id=fig.implementation center%Attach:implementation.png"Implementation" | Wikipublisher Implementation Pipeline Tool Suite

We wrote a plug-in for PmWiki cite(Rankin:2009) (written in `PHP) that replaces all the wiki to `HTML translation rules with wiki to `XML rules. We found that the wiki markup had rules for which there were no equivalents in the '''t'''book {`DTD|document type definition} and hence no `XML to Latex translations. We therefore added a range of extensions to the '''t'''book `DTD, style files and `XSLT. We called the resulting `XML to Latex conversion service Wikibook and the extended document type definition the [[Wikibook `DTD -> http://www.wikipublisher.org/dtd/wikibook.dtd]]. The plug-in also provides a "print metadata manager" which lets authors and readers customise the way the print output is presented, by passing configuration parameters to the Wikibook `PDF server.

!!!Transformation

We have made `XML generation and Wikibook transformation as robust as possible. What this means is that consistent presentation of printed outputs is completely automatic -- not just within a document type (all reports have the same look) but also across document types, which are all recognisably part of the same family. So businesses such as professional services firms, which typically produce a large number of documents of a small number of document types, can get a consistent look (a house style) at minimal cost and in particular with less quality control effort. There is a huge quality advantage when we shift typesetting from the desk-top to the server, because we eliminate local stylistic variations. Of course, limiting local customisation can also be a disadvantage in many situations.

!!!Bring Print Culture to Web Content

When converting web pages to print, people expect the typesetting engine to apply standard conventions for printed material. For a given input, it should optimise the quality of the printed output and apply the rules of typesetting consistently to every page. This means authors can focus on content, rather than presentation. It also means authors do not need to be typesetting experts to produce professional-looking printed documents from their web page collections.

Readers of print documents expect that these will follow the layout conventions developed and refined over the centuries since the invention of the printing press. It is not enough just to take the content and wrap it in a print-oriented layout template; there are other, more subtle conventions that we need to apply to the content itself. Examples of printing press conventions are the treatment of quote marks, hyphens and separators:

* smart quotes -- treat straight quote marks as markup characters and "smarten" them into the equivalent `HTML entities (including exceptions such as `'phone and prime marks such as 6'' 30"")

* smart hyphens -- recognise em dash and en dash markup -- such as 1-10 and Wellington`-Picton -- and translate these into the equivalent entities

* smart separator -- a horizontal rule offers an opportunity for a small typographic flourish, as the (configurable) separator below shows

----
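
The first two conventions can be illustrated with a few substitution rules. The sketch below is a simplified assumption about how such rules might look, not the rules Wikipublisher applies: it ignores the exceptions mentioned above (such as a leading apostrophe or prime marks) and treats any hyphen between two digits as a range.

```python
import re

def smarten(text):
    """Replace straight quotes and numeric-range hyphens with HTML entities.

    Simplified sketch: no handling of leading apostrophes or prime marks.
    """
    # A hyphen between two digits is treated as a range: 1-10
    text = re.sub(r"(?<=\d)-(?=\d)", "&ndash;", text)
    # A double quote at the start, or after a space or opening bracket, opens
    text = re.sub(r'(^|[\s(\[])"', r"\1&ldquo;", text)
    # Any double quote left over closes
    text = text.replace('"', "&rdquo;")
    # A single quote before a word, after a space or opening bracket, opens
    text = re.sub(r"(^|[\s(\[])'(?=\w)", r"\1&lsquo;", text)
    # Any single quote left over is an apostrophe or a closing quote
    text = text.replace("'", "&rsquo;")
    return text
```

For example, @@smarten('She said "pages 1-10" aren\'t enough.')@@ gives @@She said &ldquo;pages 1&ndash;10&rdquo; aren&rsquo;t enough.@@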

Links to external Web pages often have long `URLs, typically resulting in spurious white space in justified text. Wikipublisher automatically makes all punctuation characters discretionary (i.e., we treat punctuation in `URLs as wiki markup), thereby allowing a long `URL to split over two or more lines if necessary. To avoid possible ambiguity, we put the punctuation character ''after'' the line break. We print the link text in the body and the `URL in a footnote or citation.
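
One way to get this behaviour in Latex is to place a permitted break point immediately before each punctuation character, so that the character moves to the start of the next line when a break is taken. The sketch below assumes this approach and uses the standard @@\allowbreak@@ command; how Wikibook actually marks the break points is not described here, and the function name and the short table of escaped characters are illustrative only.

```python
# Latex-special characters commonly found in URLs (deliberately incomplete)
LATEX_SPECIALS = {
    "%": r"\%",
    "#": r"\#",
    "_": r"\_",
    "&": r"\&",
    "~": r"\textasciitilde{}",
}

def breakable_url(url):
    """Return Latex text for url with a break allowed before each punctuation mark."""
    out = []
    for position, char in enumerate(url):
        is_punctuation = not char.isalnum()
        # Put the break opportunity BEFORE the punctuation, so the
        # punctuation character appears after any line break.
        if is_punctuation and position > 0:
            out.append(r"\allowbreak{}")
        out.append(LATEX_SPECIALS.get(char, char))
    return "".join(out)
```

The result would then be placed in a footnote or citation, with the link text left in the body.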

Writers also no longer need to know that the convention for use of footnotes is that the footnote reference character appears ''after'' a punctuation mark such as a comma or a full stop. Wikipublisher just makes sure it happens. If a reader chooses to publish a page using vertical spaces between paragraphs (rather than indents), Wikipublisher will automatically change the formatting of footnote text to hang the number into the left margin and suppress the first line indent.
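
The footnote placement rule amounts to a single rewrite over the text. As a sketch, assuming footnotes are written inline as @@[^text^]@@ and that the footnote text contains no closing square bracket, a marker that sits before a punctuation mark can be moved after it like this (the function name is hypothetical):

```python
import re

# Optional spaces, one footnote marker, then one punctuation mark
FOOTNOTE_BEFORE_PUNCTUATION = re.compile(r"\s*(\[\^[^\]]*\^\])([.,;:!?])")

def footnote_after_punctuation(text):
    """Swap each footnote marker with the punctuation mark that follows it."""
    return FOOTNOTE_BEFORE_PUNCTUATION.sub(r"\2\1", text)
```

A footnote that is not followed by punctuation is left where the writer put it.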

Printed layout has to be adaptable to the needs of the user and the constraints of the medium. The Wikibook `DTD describes the structure of the page's content, not its presentation. Presentation is left to the typesetting engine and is controlled, to a degree, through print <meta> tags and their attributes. These allow authors and readers to set printing preferences such as the paper size, the layout of a book cover, or whether to print duplex.

>>frame<<
Style information is only partially supported. There are major differences between the `CSS-based model used in `XHTML (and implicit in PmWiki) and the structure-based approach used in Latex. Currently, there is no print equivalent to the rich style options found in `XHTML. The approach we have taken is to translate those style options with recognisable equivalents (inline styles such as text colour blue), and ignore the rest (block styles such as place a red dotted border around this paragraph with 5px padding).
>><<
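
The "translate what has an equivalent, ignore the rest" policy can be sketched as a lookup table of style handlers. The table below is an assumption made for illustration: it covers three inline properties, maps text colour to the @@\textcolor@@ command from the Latex xcolor package, and simply reports everything else as ignored. It does not reflect the actual set of styles Wikibook supports.

```python
# Hypothetical table: CSS property -> function(value, body) returning Latex
SUPPORTED_STYLES = {
    "color": lambda value, body: rf"\textcolor{{{value}}}{{{body}}}",
    "font-weight": lambda value, body: rf"\textbf{{{body}}}" if value == "bold" else body,
    "font-style": lambda value, body: rf"\emph{{{body}}}" if value == "italic" else body,
}

def translate_style(style, body):
    """Apply the style declarations that have a Latex equivalent.

    Returns the wrapped body and the list of properties that were ignored.
    """
    ignored = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skip empty or malformed declarations
        prop, value = (part.strip().lower() for part in declaration.split(":", 1))
        handler = SUPPORTED_STYLES.get(prop)
        if handler is None:
            ignored.append(prop)  # e.g. block styles such as border or padding
        else:
            body = handler(value, body)
    return body, ignored
```

Returning the ignored properties alongside the result makes it possible to tell an author which parts of their styling did not carry over to print.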

[^#^]

!!!Public Wikibook PDF Service

We run a free public Wikibook `PDF server for those wishing to try out the software. In the past 5 years we have had 340 wiki sites registered to use the Wikipublisher system via the public server. This has been a fruitful source of feedback for the system's evolution, in response to others' experiences. The web site includes an issues register for people to log bugs and change requests, a tip of the week section where we publish short "how to" stories, software release notes, and a cookbook for user-contributed local customisations (plug-ins) that extend Wikipublisher's capabilities. There is also a discussion group, which is often the first point of contact for people encountering problems or seeking new features. Currently, the discussion group has 40 members.

We have had a total of 680 downloads of the open source `PDF Server, on average about 15 per month, for the past 3`1/2 years. Most users have found the Wikipublisher system from the web page cite(Rankin:2009) that describes the PmWiki plug-in.


!![[#futurework]] Future Work

The Wikipublisher typesetting system is functionally complete for our own purposes, but offers several fruitful avenues for further development. The Wikipublisher project sees four areas where future work is desirable, if there is sufficient customer demand.

!!!Wikibook on Microsoft Windows

To date, people are running the Wikibook `PDF server on various flavours of GNU/Linux plus Mac OS X. The project regularly receives e-mails from people who want to know whether it will run on Windows. We reply that as far as we know, all the software they will need is available for Windows, but we do not know of any Windows installations. We offer to help them work through any issues they may encounter and to document the Windows installation process. We have yet to hear back from any of these correspondents.

!!!Extensions to other content management engines

Adoption of Wikipublisher is currently limited to sites running PmWiki. Since it was possible to write a plug-in for PmWiki to output Wikibook `XML, it should also be feasible to add the same `API capability to other wikis, blogging engines and web content management systems that support third-party plug-ins. Again, the project has received several enquiries, in particular about support for [[MediaWiki -> http://www.mediawiki.org/]], but to date none has turned into a real project. Given the scale of the undertaking and in the absence of a customer willing to provide time or money, we have been reluctant to embark on this.

!!!User-specified Latex classes

In an ideal world, an author could instruct the Wikibook `PDF server to typeset their content using any valid Latex class file (as long as it is reachable with an http request). The current Wikibook `DTD defines four distinct document types: letter, article, report and book. The wiki plug-in makes sure the wiki produces Wikibook `XML that complies with the requested `DTD. To support user-defined classes, Wikipublisher would have to make sure that the document type used is compatible with the specified class.

It would have been really useful to load the correct `ACM template for this paper! As it was, the authors exported the raw Latex as an article and manually converted this to use a different class.

!!!Use of Wikipublisher

To inform further development of the system, we would like to conduct an empirical study of how people are using Wikipublisher. We would like to explore the following research question with the current user base: "What has been your experience using Wikipublisher?" We envisage setting up an online survey form (on Wikipublisher) and gathering qualitative data from a self-selecting sample of users. The survey would explore the kind of content, motivations for adopting Wikipublisher, benefits they have gained, issues they have encountered, and their plans for the future.

!![[#conclusions]] Conclusions

The better Wikipublisher does its job, the less people notice it; good typography is invisible, letting the reader focus on reading. In producing print documents, most people are accustomed to making a trade-off between the convenience of a word processor and the quality of a desk-top publishing system. Most choose convenience, with the unfortunate result that typographic mediocrity has become entrenched in our culture. One of the big reasons for the popularity of wikis is their convenience. Wikipublisher lets us combine the convenience of a wiki with the typesetting quality of the finest desk-top publishing software. Because the system embeds good typesetting practices in the software, the quality comes free.

Over the last few years, we and our users have learnt a lot about document layout practice and built our knowledge into the Wikipublisher software. By making typesetting a shared web-based service, this knowledge automatically becomes available to everyone who uses the Wikipublisher system. This cannot happen when typesetting is a personal computer-based service. Even if, for example, we improve a document template, we have to distribute it to everyone and it does nothing for our existing document collections. On the other hand, because Wikipublisher forces a separation between content in wiki markup and its presentation either for the Web or print, improving the print presentation automatically upgrades the layout of all content, the next time any page is printed.
Page last modified on 14 June 2009 at 08:43 PM