> This is interesting. I thought that PDF had a more efficient way
> of storing data than PostScript and as a result allowed for
> faster reading and writing, although I've never looked into the
> details. I haven't yet switched over from using grops to gropdf,
> but I was beginning to think it was an inevitable path to better
> document processing. Can you explain your preference for
> PostScript a little further?

It's complicated. PostScript is a programming language, and as such there's no real limit to what you can do with it. One issue with respect to documents is page independence (or rather, the lack thereof): something you do on page 3 of the document can influence what happens on page 27. This is no big deal for stuff printed sequentially on a printer, but it wreaks havoc in documents which a user can browse in random page order on a screen.

Of course, nothing compels later pages to depend on earlier ones -- indeed, I don't think there is a good reason nowadays not to format the document so that all pages are independent of each other -- but page independence is not a requirement, only an option. Adobe has issued a set of guidelines (the Document Structuring Conventions) which may be used to inform a document processor whether pages must be processed sequentially or whether they can be accessed in random order. This is also only a convention, and again strictly optional -- a printer (which processes pages in sequence) does not require it.

As a result, many PostScript generators have opted for the easy way out: just dump everything into the stream in the order it is needed for sequential printing, and screw page independence. This approach of course makes document (re)processing (such as extracting individual pages) just so much harder.

My guess is that PDF was an attempt to master the resulting chaos. PDF requires all pages to be independent, and also requires the document producer to explicitly state which resources are used by each page. These are noble goals, but the file format chosen to achieve this is complex, with byte pointers to different objects all over the place. Even leaving aside the built-in compression, these byte offsets mean you can't simply edit the file in a text editor (as you can with PostScript) if something in the document is not to your liking -- if you add or remove characters, many of the pointers will be off, rendering the document unreadable.
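To make the byte-pointer problem concrete, here is a minimal Python sketch. It assembles a tiny uncompressed one-page PDF with a classic cross-reference table, then makes the kind of change you would make in a text editor and checks which table entries still point at the start of their object. The function names `build_pdf` and `stale_objects` are made up for this illustration, and the file is far simpler than anything a real producer would write.

```python
import re

def build_pdf(text):
    """Assemble a tiny one-page PDF; return (file bytes, object byte offsets)."""
    content = b"BT /F1 24 Tf 72 720 Td (" + text.encode("ascii") + b") Tj ET"
    bodies = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))          # byte position where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"       # entry for the reserved object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # one fixed-width 20-byte entry per object
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets

def stale_objects(pdf):
    """Return object numbers whose xref offset no longer lands on 'N 0 obj'."""
    entries = re.findall(rb"(\d{10}) \d{5} n", pdf)
    stale = []
    for num, raw in enumerate(entries, start=1):
        if not pdf.startswith(b"%d 0 obj" % num, int(raw)):
            stale.append(num)
    return stale

pdf, offsets = build_pdf("Hello")
print(stale_objects(pdf))        # freshly written file: nothing is stale

# The "text editor" change: make the string on the page seven bytes longer.
edited = pdf.replace(b"(Hello)", b"(Hello, world)")
print(stale_objects(edited))     # every object after the edit has moved
```

In this toy file only the font object sits after the edited stream, so only its entry goes stale, but the stream's `/Length` and the `startxref` value are now wrong as well. An edit near the top of a real file would invalidate nearly every entry.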
(Then again, Adobe is in the business of selling you software to edit PDF files, so simplicity in the file format is not necessarily in their interests.)

Some of the complexity in the file format can be seen as catering toward making life easier for document creator programs (and consequently a little harder for the document viewer), instead of making the document creator do just a little more work in order to allow a simpler file format. (I'm referring to the practice of making stream lengths an indirect object. This lets the document creator calculate the stream length while the stream is being output, then put the computed length at the *end* of the stream with its own object number and entry in the object table. The alternative would be to have the document creator compute the length beforehand and put it at the beginning of the stream, so that the viewer knows the expected stream length before reading the stream and without having to do an object lookup.)

Furthermore, PDF does away with the single greatest feature of PostScript: programmability. (Just imagine groff without macros.) Sure, allowing loops and conditionals and whatnot in a document can cause unpredictability, but when done right it can make document structure so much simpler, for example with subroutines that accept arguments to draw repeated graphic objects with slight variations. (Datapoints in a scatterplot, for example, with different shapes, colors, and sizes. Gnuplot, for instance, makes nice use of this capability, and you can easily tweak the appearance by editing the PostScript code.)

PostScript's integration of programming constructs and graphics functions is extremely elegant. The idea of treating all graphic objects (including the letters of text to be printed) as paths to be filled and stroked really shines when combined with the ability to manipulate these paths and the coordinate transformation matrices on-the-fly through algorithms.
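As a rough illustration of the subroutine idea, the Python sketch below writes out a one-page PostScript program in which a single procedure draws every datapoint, and each point is just a line of arguments followed by the procedure name. The names `scatter_ps` and `marker` are invented for this example; this is not what gnuplot actually emits, only the same general pattern with size and gray level as the varying parameters.

```python
def scatter_ps(points):
    """Return a one-page PostScript program for (x, y, radius, gray) tuples."""
    lines = [
        "%!PS-Adobe-3.0",
        "%%Pages: 1",
        "%%EndComments",
        "%%BeginProlog",
        "% x y radius gray  marker  -   (hypothetical helper procedure)",
        # setgray consumes the gray level; arc then finds x y radius on the
        # stack and only needs the two angles pushed in front of it.
        "/marker { setgray newpath 0 360 arc fill } def",
        "%%EndProlog",
        "%%Page: 1 1",
    ]
    for x, y, r, g in points:
        lines.append(f"{x} {y} {r} {g} marker")   # one call per datapoint
    lines += ["showpage", "%%EOF"]
    return "\n".join(lines) + "\n"

print(scatter_ps([(100, 200, 5, 0.3), (150, 250, 8, 0.6), (220, 180, 3, 0.0)]))
```

Changing the look of every point (say, stroking an outline instead of filling) means editing the one `/marker` definition in the output, which is the kind of tweak that is easy in a text editor.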
The stack-based postfix approach also works well when you have to pass around, load and store, and otherwise compute on data that is ultimately used to draw stuff on the page.

PDF uses the same postfix syntax in the actual page streams. But without the ability to manipulate objects, the stack-based approach loses its utility -- putting stuff on the stack to retrieve later doesn't make much sense if you can't do anything with it. (Of course, PDF is derived from PostScript; if you already have a PostScript interpreter, it means you can reuse large parts of it.)

In a nonprogrammable language, passing data to a graphics function only when and where it is needed seems much more straightforward. A simple syntax consisting of a function name followed by its arguments (e.g., as in HP's graphics language HPGL, or as in groff) would be much easier for the document viewer to parse, in particular because you can use an optimized syntax for each function. (Some functions only accept numbers, some only text, etc., so debugging will also be easier.)

Add to this recent developments like cross-reference streams, "compatible" PDFs which include both a cross-reference table *and* a cross-reference stream (wtf?), embedded XML for "semantic" purposes, and PDF comes across as a terrible hodgepodge of different syntaxes. Of course all of this is understandable from its history and evolution, but PDF is far from being an elegant file format.

PS: I used to think that PostScript drivers would output ugly code, but the atrocities committed under the name PDF are worse. Many PDF creator programs ignore all the operators Adobe has provided to make text printing simpler, and output only the most braindead code imaginable.

PPS: I forgot to mention JavaScript: yet another different language grafted on. I predict that one of these days Adobe will see the usefulness of programmability within the PDF page streams, and add still another language to provide this. :-)