On Mon 02 Jun 2014 11:55:32 Ralph Corderoy wrote:
> Hi Tadziu,
>
> > However, it does not embed the font (it contains an encoding vector
> > and only a reference to "Times-Roman") so it is up to the viewer to
> > provide the requested font.
>
> How do you recommend `disassembling' a PDF to inspect its contents?  I'm
> happy grokking PostScript but want to see the PDF's structure so
> pdf2ps's output, for example, isn't suitable.
>
> Cheers, Ralph.

Hi Ralph,

The most useful tool for working with PDFs is pdftk. One of its uses is
to decompress the PDF, so you can at least "read" it in an editor.
However, this does not help with the actual structure of the PDF, so I
wrote a utility (PL-show.pl and its module ParsePDF.pm) to display the
structure as a mindmap.

Run PL-show.pl with the filename of the PDF file and save stdout to a
file, then use a program called "freemind" to view this file. In the
mindmap, click on nodes to view dependent nodes.

Bernd-mm.pdf is an example of what the mindmap looks like (with all
nodes open), and you can see that the first kid of the pages entry has a
"Contents" entry which points to object 4. Looking at object 4 in the
uncompressed PDF shows:

    stream
    q 1 0 0 1 0 0 cm 1 J 1 j 0 G 0 g q
    BT
    1 0 0 1 72 780 Tm
    /F5 10 Tf
    0 Tc
    0 Tw
    (<8c>le<8c>le) Tj
    1 0 0 1 97.56 0 Tm
    0 Tc
    ET
    Q Q
    endstream

From this you can see that a character <8C> is used, which corresponds
to the glyph "fi" in the "/Differences" table in the encoding object
(6 0 R).

You need to use an editor to look at the uncompressed PDF as well as the
mindmap, since the mindmap does not display the contents of streams and
truncates big arrays; it is only meant as a sort of map of the
structure. I wrote it while writing the gropdf driver, to see how others
had created PDFs!

Cheers

Deri

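
As a rough illustration of the "/Differences" lookup described above, here is a minimal Python sketch. It assumes you have copied the body of the array out of the uncompressed PDF as text. The function name parse_differences and the sample array are made up for this example; only the pairing of <8C> with "fi" comes from the file discussed above. In a "/Differences" array a number sets the next character code, and each glyph name that follows takes the next consecutive code.

```python
import re


def parse_differences(array_text):
    """Map character codes to glyph names from the text of a /Differences array.

    A number sets the next character code; every /Name after it is
    assigned to consecutive codes until another number appears.
    """
    mapping = {}
    code = None
    # Tokens are either glyph names (/fi) or integers (140).
    for token in re.findall(r"/[^\s/\[\]]+|\d+", array_text):
        if token.startswith("/"):
            if code is None:
                raise ValueError("glyph name found before any starting code")
            mapping[code] = token[1:]
            code += 1
        else:
            code = int(token)
    return mapping


# Hypothetical array contents; only 140 (0x8C) -> fi is taken from the mail.
sample = "[ 140 /fi /fl 173 /hyphen ]"
table = parse_differences(sample)
print(table[0x8C])
```

With a table like this you can look up any byte that an editor shows as an escape such as <8c> in a content stream. It is a sketch only: it does not follow indirect references or read the PDF file itself.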
Bernd.mm.pdf
Description: Adobe PDF document
PL-show.tar.gz
Description: application/compressed-tar