Hi Larry,

On Tuesday 12 October 2010 19:12:43 Larry Kollar wrote:
> I've been successfully using pdfmark to generate PDFs with bookmarks
> for some time now, no problem. Now I'm trying to make my
> cross-references into actual PDF links, and that's where I'm running
> into trouble. Part of the problem, I suppose, is the part of the
> pdfmark documentation I need has not been written. :-)

Yeah, it's a rather unsatisfactory situation. Unfortunately, with my
present day-job work load, it's likely that this will become a project
for my retirement -- at least another year away, and maybe as many as
six more years. :-(

> The other part
> is that I'm trying to graft this feature into an existing process, so
> I can't just rip it out and start over with pdfroff.

Okay, so it's in understanding the mechanics of pdfroff's processing
that you need help, so that you can reproduce the effect with your
existing work flow?

> My current cross-reference generation consists of a macro XRT to
> define a target based on the text and page of a heading immediately
> preceding:
>
> .de XRT
> .if \\n[TocGen] \{\
> . tm XREF: xref:\\$1:txt \\*[xref:HDtxt]
> . tm XREF: xref:\\$1:pg \\*[xref:HDpg]
> .\}
> .ie '\\*[.T]'html' .TAG \\$1
> .el .pdfhref M -N \\$1
> ..
>
> The strings "xref:HDtxt" and "xref:HDpg" are defined by the heading
> macro. The argument to XRT was essentially a named destination tag to
> begin with.

This will place a PDF marker at the destination point for any number of
pdfhref links; it doesn't deal with the creation of any such link, which
would require use of `.pdfhref L ...', rather than the use of
`.pdfhref M ...', as we see here.

> Here's my big problem: the aux-file is full of entries like:
>
> grohtml-info:page 170 353411 517000 355611 530398 540000 1 1
> ./somefile.ms

This is the standard format of the stderr output generated by groff's
`\O2' escape.

> And I don't see anywhere in pdfmark.tmac where it outputs this
> particular string. I'm guessing this is a hotspot definition, but the
> string "grohtml-info" doesn't appear anywhere in pdfmark.tmac.

It comes from pdfmark.tmac's pdf*href.mark.end macro, (intended for
internal use only), which is invoked twice, (first time indirectly by
pdf*href.mark.begin), for each invocation of `.pdfhref L ...', (or of
`.pdfhref W ...'), to capture the output page co-ordinates for the
beginning and end of each link hot-spot region.

> I've looked at the pdfroff script but it's not helping much.

The relevant fragment (folded for e-mail display) looks like:

  # We now extend the local copy of the reference dictionary file,
  # to create a full 'pdfmark' reference map for the document ...
  #
    $AWK '/^grohtml-info/ {print ".pdfhref Z", $2, $3, $4}' \
      $WRKFILE >> $REFCOPY

Also relevant, and performed earlier, (during the initial multi-pass
processing phase, which is required to identify the placement of any
reference marks set by `.pdfhref M ...'), may be:

  # Run 'groff' and 'awk', to identify reference marks in the document
  # source, filtering them into the reference dictionary; discard
  # incomplete 'groff' output at this stage.
  #
    eval $STREAM $GROFF_STYLE -Z 1>$NULLDEV 2>$WRKFILE \
      $REFCOPY $INPUT_FILES
    $AWK '/^gropdf-info:href/ {$1 = ".pdfhref D -N"; print}' \
      $WRKFILE > $REFFILE

where $STREAM represents a hack to reproduce any stdin input piped to
pdfroff itself as stdin input to groff, in each and every processing
pass, while $GROFF_STYLE represents `groff -Tps' followed by any of
groff's own options which are specified on the pdfroff command line.

> What I need is a more generalized setup and some output I understand,
> then I could perhaps pipe it back in and go from there.

$GROFF_SOURCES/contrib/pdfmark/pdfmark.ms, (the source for the existing
incomplete documentation), is an example of document source suitable for
processing by pdfroff with ms macros, (wrapped by spdf.tmac).

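To make the first of the awk filters above concrete, here is a small
sketch which runs it over the aux-file entry you quoted, (unfolded on to
a single line). It assumes a plain `awk' on the PATH, where pdfroff
itself uses $AWK, and the variable names `sample' and `hotspot' are
purely illustrative; they do not appear in pdfroff:

```shell
# One stderr record, as quoted from the aux-file (unfolded to one line).
sample='grohtml-info:page 170 353411 517000 355611 530398 540000 1 1 ./somefile.ms'

# Apply the same awk program which pdfroff runs over $WRKFILE:
# for each record beginning "grohtml-info", keep fields 2, 3 and 4,
# and prefix them with the `.pdfhref Z' request.
hotspot=$(printf '%s\n' "$sample" |
  awk '/^grohtml-info/ {print ".pdfhref Z", $2, $3, $4}')

# Show the resulting reference map record.
printf '%s\n' "$hotspot"
```

This should print `.pdfhref Z 170 353411 517000'; only the second to
fourth fields of each record survive, and pdfroff appends one such line
to $REFCOPY for every matching record in $WRKFILE.
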
The salient aspects of the processing mechanics are:

1) Multiple pre-processing passes are required, using the second
   command sequence indicated above, to locate any PDF reference marks;
   $WRKFILE captures groff's stderr output in each pass.

2) At the outset, $REFCOPY represents an empty file.

3) At the end of each pre-processing pass, PDF reference data is
   filtered out of $WRKFILE, and transformed into `.pdfhref D ...'
   requests in $REFFILE.

4) Each time $REFFILE is regenerated, its content is compared with that
   of $REFCOPY, as it is at the start of the current cycle; if the two
   are identical, the pre-processing cycle terminates, otherwise...

5) $REFCOPY is replaced by the content of $REFFILE, and the cycle is
   repeated, to regenerate $REFFILE once again.

6) After the pre-processing cycle terminates, $REFFILE and $REFCOPY
   should represent identical files; (if not, then references have not
   been satisfactorily resolved to stable locations, within the maximum
   cycle count limit imposed by pdfroff). At this point, $REFCOPY is
   augmented by filtering the link hot-spot reference data from the
   last generated $WRKFILE, transforming it using the first command
   noted above, and appending the resultant mapping data as
   `.pdfhref Z ...' records, (two per hot-spot), to the final content
   of $REFCOPY.

7) The document sources, with this final generation of $REFCOPY
   included as the first input file, are processed through groff to
   produce PostScript intermediate output, which is filtered through
   GhostScript, to create the final PDF output.

Hopefully, the above will give you enough to get you going; just one
word of warning: don't add `.pdfhref Z ...' records (manually) to your
document sources -- the presence of just one such record will disable
the use of `\O2' in `.pdfhref L ...' and `.pdfhref W ...' requests,
making it virtually impossible to generate a hot-spot map.

--
Regards,
Keith.