Architecture: How Wikipublisher works
What happens under the hood when Wikipublisher creates a PDF? Walk me through the process.
The Wikibook PDF server is implemented as a Web service: given a URL that returns content in Wikibook XML, it composes that content into a PDF file suitable for printing. There are three players in the drama: Web Site, who hosts the content; Wikibook Server, who offers a typesetting service; and Content Reader, who wishes to turn a Web page or pages into print.
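As a rough illustration of "given a URL", here is a minimal sketch of how Web Site might build the link that Content Reader clicks, and how Wikibook Server might read the question back out of it. The server address, the `url` query parameter, and the `format=wikibook` page option are all assumptions made up for this example; the real service may name things differently.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical address of the typesetting service (not the real endpoint).
WIKIBOOK_SERVER = "https://wikibook.example.org/typeset"


def make_print_link(source_url: str) -> str:
    """Web Site side: build a link that carries the question to the server."""
    # The parameter name "url" is an assumption for this sketch.
    return f"{WIKIBOOK_SERVER}?{urlencode({'url': source_url})}"


def read_question(link: str) -> str:
    """Wikibook Server side: recover the source URL carried by the link."""
    query = parse_qs(urlsplit(link).query)
    return query["url"][0]


# Hypothetical page address that would answer in Wikibook XML.
PAGE = "https://site.example.com/wiki/Main/HomePage?format=wikibook"
link = make_print_link(PAGE)
```

The source URL is percent-encoded so that its own query string survives the trip inside the link; `read_question(link)` gives back the original page address.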
Let’s follow the content as it moves through the process. Some of what follows is “merely corroborative detail, intended to give artistic verisimilitude to an otherwise bald and unconvincing narrative.”1
- Web Site contains a link to Wikibook Server, with an instruction that says, “If you ask me this question, I will answer you in Wikibook.”
- Content Reader clicks on this link and jumps to visit Wikibook Server, carrying the question to which the answer will be in Wikibook.
- Wikibook Server checks to see if Web Site is an approved URL and if so, issues a request to Web Site that says, “Content Reader has asked me to ask you this question; I expect you to answer in Wikibook.”
- Web Site responds to Wikibook Server’s request with a stream of Wikibook, if authorized to do so.
- Being of a suspicious nature, Wikibook Server checks to see whether Web Site has, in fact, answered in Wikibook; if not, he gives Content Reader a telling-off for trying to trick him.
- Wikibook Server now sniffs the answer to see if it really is Wikibook; the content must be well-formed and valid against the DTD.
- Wikibook Server peruses the XML for references to images and, on finding any, retrieves the images from whoever is holding them; in many cases, this will be Web Site.
- So far, so good; Wikibook Server is cooking with gas:
  - transform the Wikibook into LaTeX typesetting instructions
  - translate Unicode characters and HTML entities into LaTeX equivalents
  - process the LaTeX into a PDF file
  - clean up all the debris left behind
  - deliver the PDF to Content Reader
- Or not. LaTeX is a harsh and unforgiving god, who sometimes rejects our offerings as unworthy. In this case, Wikibook Server reports the problem to Content Reader, in the hope that she can fix it and try again.
- Content Reader sees the PDF in her chosen viewer, from which she can read, print, save, or just admire it.
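A few of the steps above can be sketched in code: sniffing the answer for well-formedness, perusing it for image references, and translating troublesome characters into LaTeX. This is an illustration under stated assumptions, not the server's actual implementation. The element names (`book`, `para`, `image`) and the `src` attribute are hypothetical stand-ins for whatever the Wikibook DTD really defines, and the character table covers only a handful of cases. Validation against the DTD is left out, because Python's standard library does not do it; a library such as lxml would be needed for that step.

```python
import xml.etree.ElementTree as ET

# Small, deliberately incomplete translation table (an assumption for this
# sketch): LaTeX special characters plus two sample Unicode characters.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
    "\u00e9": r"\'e",   # e with acute accent
    "\u2014": "---",    # em dash
}


def parse_wikibook(xml_text: str) -> ET.Element:
    """Reject answers that are not even well-formed XML."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        raise ValueError(f"not well-formed XML: {err}") from err


def image_sources(root: ET.Element) -> list:
    """Collect image references (hypothetical <image src="..."> elements)."""
    return [el.get("src") for el in root.iter("image") if el.get("src")]


def to_latex(text: str) -> str:
    """Translate one character at a time, so nothing is escaped twice."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


SAMPLE = (
    "<book><para>Caf&#233; &amp; 100% cotton</para>"
    '<image src="https://site.example.com/press.png"/></book>'
)
root = parse_wikibook(SAMPLE)
images = image_sources(root)
latex_text = to_latex(root.find("para").text)
```

The XML parser resolves entities and character references before the text reaches `to_latex`, which is why the translation table only has to deal with plain characters.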
The architecture minimizes the information that Web Site and Wikibook Server need to know about one another:
- Web Site needs to know Wikibook Server’s address and how to answer questions in Wikibook XML.
- Wikibook Server only needs to know whether he’s allowed to ask Web Site a question; he relies on Web Site to tell him, via Content Reader, what question to ask.
- Web Site doesn’t have to know how to answer in Wikibook; he could outsource this to a hypothetical Wikibook Helper service that translates HTML into Wikibook.
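The one thing Wikibook Server must know, whether he may ask Web Site a question, could be as simple as the check sketched below. The host names are invented, and matching on host name alone is an assumption; the real server may approve sites by URL prefix or some other rule.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of sites the server has agreed to typeset for.
APPROVED_HOSTS = {"site.example.com", "docs.example.org"}


def is_approved(source_url: str) -> bool:
    """Return True if the question is addressed to an approved Web Site."""
    parts = urlsplit(source_url)
    # Only plain Web addresses qualify; hostname is lower-cased by urlsplit
    # and is None when the URL has no host at all.
    return parts.scheme in {"http", "https"} and parts.hostname in APPROVED_HOSTS
```

Everything else about the request, including which page to fetch, arrives with Content Reader, so the server's configuration stays this small.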
Credit goes to Donald Gordon for putting the pieces of Wikibook Server together, so that “it just works”.