Re: [Groff] Need help with pdfmark

Keith Marshall Wed, 13 Oct 2010 11:44:24 -0700

Hi Larry,

On Tuesday 12 October 2010 19:12:43 Larry Kollar wrote:
> I've been successfully using pdfmark to generate PDFs with bookmarks
> for some time now, no problem. Now I'm trying to make my
> cross-references into actual PDF links, and that's where I'm running
> into trouble. Part of the problem, I suppose, is the part of the
> pdfmark documentation I need has not been written. :-)


Yeah, it's a rather unsatisfactory situation.  Unfortunately, with my 
present day-job workload, this is likely to become a project for my 
retirement -- at least another year away, and maybe as many as six 
more years. :-(

> The other part 
> is that I'm trying to graft this feature into an existing process, so
> I can't just rip it out and start over with pdfroff.

Okay, so it's in understanding the mechanics of pdfroff's processing 
that you need help, so that you can reproduce the effect within your 
existing workflow?

> My current cross-reference generation consists of a macro XRT to
> define a target based on the text and page of a heading immediately
> preceding:
>
> .de XRT
> .if \\n[TocGen] \{\
> . tm XREF: xref:\\$1:txt \\*[xref:HDtxt]
> . tm XREF: xref:\\$1:pg \\*[xref:HDpg]
> .\}
> .ie '\\*[.T]'html' .TAG \\$1
> .el .pdfhref M -N \\$1
> ..
>
> The strings "xref:HDtxt" and "xref:HDpg" are defined by the heading
> macro. The argument to XRT was essentially a named destination tag to
> begin with.

This places a PDF reference mark at the destination point, which any 
number of pdfhref links may then target; it doesn't create any such 
link.  Creating a link requires `.pdfhref L ...', rather than the 
`.pdfhref M ...' used here.

> Here's my big problem: the aux-file is full of entries like:
>
> grohtml-info:page 170  353411  517000  355611  530398  540000    1  1  ./somefile.ms

This is the standard format of the stderr output generated by groff's 
`\O2' escape.

> And I don't see anywhere in pdfmark.tmac where it outputs this
> particular string. I'm guessing this is a hotspot definition, but the
> string "grohtml-info" doesn't appear anywhere in pdfmark.tmac.

It comes from pdfmark.tmac's pdf*href.mark.end macro (intended for 
internal use only).  That macro is invoked twice for each `.pdfhref L 
...' (or `.pdfhref W ...') request, the first time indirectly via 
pdf*href.mark.begin, to capture the output page coordinates of the 
beginning and end of each link hot-spot region.  It does so with 
`\O2', and it is groff itself that writes the `grohtml-info' record to 
stderr; that is why the string never appears in pdfmark.tmac.

> I've looked at the pdfroff script but it's not helping much.

The relevant fragment (folded for e-mail display) looks like:

  # We now extend the local copy of the reference dictionary file,
  # to create a full 'pdfmark' reference map for the document ...
  #
    $AWK '/^grohtml-info/ {print ".pdfhref Z", $2, $3, $4}' \
       $WRKFILE >> $REFCOPY
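To make the transformation concrete, here is a small sketch that runs 
the same `awk' filter over a sample of captured stderr.  The first 
record copies the `grohtml-info' line you quoted; the second is an 
invented, unrelated diagnostic, and plain `awk' stands in for pdfroff's 
$AWK.  Only fields 2 to 4 survive into the `.pdfhref Z' record:

```shell
# Sketch: pdfroff's hot-spot filter applied to sample stderr output.
# The second input line is an invented, unrelated diagnostic.
WRKFILE=$(mktemp)
REFCOPY=$(mktemp)
trap 'rm -f "$WRKFILE" "$REFCOPY"' EXIT

printf '%s\n' \
  'grohtml-info:page 170  353411  517000  355611  530398  540000    1  1  ./somefile.ms' \
  'some unrelated diagnostic' > "$WRKFILE"

# Keep fields 2..4 (the first three numbers after the tag) as a
# `.pdfhref Z' record, appended to the reference dictionary.
awk '/^grohtml-info/ {print ".pdfhref Z", $2, $3, $4}' \
  "$WRKFILE" >> "$REFCOPY"

cat "$REFCOPY"    # prints: .pdfhref Z 170 353411 517000
```

Note the `>>': the hot-spot records are appended to the existing 
dictionary rather than replacing it.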

Also possibly relevant is this fragment, performed earlier during the 
initial multi-pass processing phase, which is required to identify the 
placement of any reference marks set by `.pdfhref M ...':

  # Run 'groff' and 'awk', to identify reference marks in the document
  # source, filtering them into the reference dictionary; discard
  # incomplete 'groff' output at this stage.
  #
    eval $STREAM $GROFF_STYLE -Z 1>$NULLDEV 2>$WRKFILE \
      $REFCOPY $INPUT_FILES
    $AWK '/^gropdf-info:href/ {$1 = ".pdfhref D -N"; print}' \
      $WRKFILE > $REFFILE

where $STREAM represents a hack to feed any stdin input piped to 
pdfroff itself to groff as stdin, in every processing pass, while 
$GROFF_STYLE represents `groff -Tps' followed by any of groff's own 
options specified on the pdfroff command line.
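Here is a rough sketch of one such pre-processing pass, showing how an 
`eval'-expanded $STREAM prefix can replay spooled stdin on every pass, 
and how the `gropdf-info:href' filter rewrites each record.  Everything 
apart from the final `awk' command is an assumption for illustration: 
the spool file, the $STREAM value and the fake_groff stand-in are 
hypothetical, as is the record layout fake_groff emits; this is not 
pdfroff's actual code.

```shell
# Hypothetical sketch of one pre-processing pass; not pdfroff's code.
WRKFILE=$(mktemp); REFFILE=$(mktemp); SPOOL=$(mktemp)
trap 'rm -f "$WRKFILE" "$REFFILE" "$SPOOL"' EXIT

# stdin can be read only once, so spool it; a real script would use
# `cat > "$SPOOL"' here.  Two sample source lines stand in for it.
printf '.pdfhref M -N intro\n.pdfhref M -N usage\n' > "$SPOOL"

# A pipeline prefix; `eval' re-parses it, so the `|' becomes a pipe
# and every pass re-reads the same spooled input.
STREAM="cat $SPOOL |"

# Hypothetical stand-in for $GROFF_STYLE: writes one fake stderr record
# per `.pdfhref M' line (made-up layout: tag, name, page 1).
fake_groff() {
  awk '$1 == ".pdfhref" && $2 == "M" {
         printf "gropdf-info:href %s  1\n", $4 > "/dev/stderr" }'
}

# The redirections apply to the whole eval'd pipeline.
eval $STREAM fake_groff 1>/dev/null 2>"$WRKFILE"

# pdfroff's own filter: swap the tag for the request name.
awk '/^gropdf-info:href/ {$1 = ".pdfhref D -N"; print}' \
  "$WRKFILE" > "$REFFILE"

cat "$REFFILE"
# prints:
# .pdfhref D -N intro 1
# .pdfhref D -N usage 1
```

Because the filter assigns to $1, `awk' rebuilds each record with 
single spaces, so the doubled separators in the fake records collapse.  
Note also the `>' here, in contrast to the `>>' in the hot-spot filter: 
$REFFILE is regenerated from scratch on every pass.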

> What I need is a more generalized setup and some output I understand,
> then I could perhaps pipe it back in and go from there.

$GROFF_SOURCES/contrib/pdfmark/pdfmark.ms (the source for the existing, 
incomplete documentation) is an example of document source suitable 
for processing by pdfroff with the ms macros (wrapped by spdf.tmac).  
The salient aspects of the processing mechanics are:

  1) Multiple pre-processing passes are required, using the second
     command sequence indicated above, to locate any PDF reference
     marks; $WRKFILE captures groff's stderr output in each pass.

  2) At the outset, $REFCOPY represents an empty file.

  3) At the end of each pre-processing pass, PDF reference data is
     filtered out of $WRKFILE, and transformed into `.pdfhref D ...'
     requests in $REFFILE.

  4) Each time $REFFILE is regenerated, its content is compared with
     that of $REFCOPY, as it is at the start of the current cycle;
     if the two are identical, the pre-processing cycle terminates,
     otherwise...

  5) $REFCOPY is replaced by the content of $REFFILE, and the cycle
     is repeated, to regenerate $REFFILE once again.

  6) After the pre-processing cycle terminates, $REFFILE and $REFCOPY
     should be identical files; if they are not, references have not
     been resolved to stable locations within the maximum cycle count
     imposed by pdfroff.  At this point, $REFCOPY is augmented: the
     link hot-spot reference data is filtered from the last generated
     $WRKFILE, transformed using the first command noted above, and
     appended to $REFCOPY as `.pdfhref Z ...' records (two per
     hot-spot).

  7) The document sources, with this final generation of $REFCOPY
     included as the first input file, are processed through groff
     to produce intermediate PostScript output, which is then filtered
     through Ghostscript to create the final PDF output.
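Steps 1 to 6 can be sketched as a shell loop.  This is a hypothetical 
illustration, not pdfroff's code: run_pass is a made-up stand-in for 
the groff command, faking a mark that moves from page 3 to page 4 once 
the dictionary is non-empty (as if resolved references had shifted the 
text), and MAXPASS is an arbitrary cycle limit.  Step 7 is omitted, 
since it needs groff and Ghostscript.

```shell
# Hypothetical sketch of pdfroff's pre-processing cycle (steps 1-6).
WRKFILE=$(mktemp); REFFILE=$(mktemp); REFCOPY=$(mktemp)
trap 'rm -f "$WRKFILE" "$REFFILE" "$REFCOPY"' EXIT
MAXPASS=5                            # arbitrary cycle limit

# Made-up stand-in for `eval $STREAM $GROFF_STYLE -Z ...': writes fake
# stderr records, with the mark on page 3 while the dictionary is
# empty and on page 4 once it is not.  A real link yields two
# grohtml-info records per hot-spot; this fake emits just one.
run_pass() {
  if [ -s "$REFCOPY" ]; then page=4; else page=3; fi
  echo "gropdf-info:href intro $page Introduction" >&2
  echo "grohtml-info:page $page  100  200  300  400  0  1  1  ./doc.ms" >&2
}

: > "$REFCOPY"                       # step 2: empty dictionary
pass=0
while :; do
  pass=$((pass + 1))
  run_pass > /dev/null 2> "$WRKFILE"                      # step 1
  awk '/^gropdf-info:href/ {$1 = ".pdfhref D -N"; print}' \
    "$WRKFILE" > "$REFFILE"                               # step 3
  cmp -s "$REFFILE" "$REFCOPY" && break                   # step 4
  if [ "$pass" -ge "$MAXPASS" ]; then                     # unstable
    echo "references did not stabilize" >&2; break
  fi
  cp "$REFFILE" "$REFCOPY"                                # step 5
done

# Step 6: append the hot-spot map from the last $WRKFILE.
awk '/^grohtml-info/ {print ".pdfhref Z", $2, $3, $4}' \
  "$WRKFILE" >> "$REFCOPY"

echo "stopped after $pass passes"
cat "$REFCOPY"
```

With this fake, the cycle stops on the third pass, when $REFFILE first 
matches $REFCOPY, and the final $REFCOPY holds one `.pdfhref D' record 
followed by one `.pdfhref Z' record.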

Hopefully, the above gives you enough to get going.  Just one word of 
warning: don't manually add `.pdfhref Z ...' records to your document 
sources.  The presence of even one such record disables the use of 
`\O2' in `.pdfhref L ...' and `.pdfhref W ...' requests, making it 
virtually impossible to generate a hot-spot map.

-- 
Regards,
Keith.
