Ken Sharp <ken.sh...@artifex.com> writes:

> On 17/03/2025 13:16, David Kastrup wrote:
>
>> The interaction with GhostScript involves "please render now page xxx
>> into a PNG file".  This interaction is done in a pipe, the order is not
>> known in advance but depends on the viewing behavior of the user.
>
> OK but....
>
>> pdf2dsc solves the task of splitting the PDF processing into one
>> per-document block of resources that are loaded in advance and into
>> per-page rendering instructions.
>>
>> The bulk of this is essentially guaranteed to be stable by how DSC is
>> defined as an interoperability standard.
>
> So if I understand this correctly, you start up Ghostscript and leave
> it running. You then split up the 'dsc' file into 'setup', 'trailer'
> and 'per page' sections. You then send the 'setup' to Ghostscript via
> stdin, to initialise the PDF interpreter with the specific PDF file.
>
> You then run each 'page' section, as required, in the same way.
>
> Finally you run the 'trailer' section. Presumably you also send a
> 'quit' to Ghostscript to close it.
>
>
>> Last time I looked (admittedly quite some time ago), I have not found
>> anything in GhostScript that provides similar (and documented!)
>> stability for PDF processing of individual pages efficiently and in
>> dynamically determined order.
>
> The scripting of the PDF interpreter from PostScript is documented:
>
> https://ghostscript.readthedocs.io/en/latest/Language.html#postscript-operators-interfacing-to-the-pdf-interpreter

And how long is that going to be working in the same manner?  In
contrast to interfaces decreed by Adobe, there has been considerable
fluidity in the operators defined by Ghostscript.

> The old code was neither documented nor stable.

The code was not.  The DSC format, in effect, very much was.  We had a
number of headaches over the years as Ghostscript changed file access
safety, various operators and so on, all of that related to PostScript
interpreter operation.  We never had any additional problem with PDF as
far as I remember: DSC stayed workable as the PDF-specific access
method.

> I see several approaches:
>
> 1) Simplest of all; bundle the old pdf2dsc.ps program with AUCTeX
> instead of with Ghostscript and continue to use it.
>
> 2) Someone at AUCTeX modifies the preview program so that it runs a
> new Ghostscript instance to produce a PNG file every time the user
> demands a page.

That would be gruesomely slow.  preview-latex is a rather efficient
utility that is used interactively.

> 3) Somebody at AUCTeX learns enough about scripting the Ghostscript
> PDF interpreter, and the preview application of AUCTeX, to modify the
> preview program to generate the required PostScript to control the PDF
> interpreter.

How often would that have to be amended in future?

> 2) is likely to be somewhat slower than the current implementation. On
> the other hand it is robust and very stable.

"Somewhat slower" meaning a factor of more than tenfold.  You have to
keep in mind that the "pages" in question here are usually merely a few
characters in size, and a 30-page document will easily contain
thousands of them.  We have seen serious speedups just from reducing the
canvas size to match, because merely initializing that bitmap took a
significant amount of time.

> 3) Doesn't seem likely, no volunteers to take on the rewrite.

The problem is that judging from previous experience with Ghostscript
development, this would not be a one-time task.

> 4) If I do this then there will have to be some changes.
>
>   The entire pdf2dsc program will need to be rewritten because what's
>   there right now isn't sustainable (the reliance on the old PDF
>   interpreter isn't likely to keep working). In order to do that I'm
>   going to need to know what parts of the 'dsc' output AUCTeX actually
>   needs; comments as well as functional PostScript, because I see some
>   slightly odd comments in there which look important (the media
>   names). I presume the %%Pages comment is used, what about the
>   %%PageBoundingBox and %%Orientation comments ? What about the use of
>   DELAYSAFER ? I'm not keen on keeping that unless there's an awfully
>   good reason for it.

This is really very basic, so I will just quote from the code:


(defun preview-dsc-parse (file)
  "Parse DSC comments of FILE.
Return a vector with offset/length pairs corresponding to
the pages.  Page 0 corresponds to the initialization section."
[...]
      (while (search-forward-regexp "\
%%\\(?:\\(BeginDocument:\\)\\|\
\\(EndDocument[\n\r]\\)\\|\
\\(Page:\\)\\|\
\\(Trailer[\n\r]\\)\\)" nil t)
[...]

Those structure comments are all that is specifically parsed.

In addition, the %%BoundingBox comment is parsed (in the current
version, not even the exactboundingbox).
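As a rough illustration of what that amounts to, here is a sketch in Python rather than the actual Emacs Lisp (whose body is elided above).  The function name, the line-start anchoring and the handling of nested %%BeginDocument blocks are assumptions of the sketch, not a description of preview-latex internals.  It records one offset/length pair per section, with entry 0 being the initialization section:

```python
import re

# Structure comments that delimit sections; mirrors the alternatives in the
# regexp quoted above. Anchoring at line start is an assumption of this sketch.
DSC_RE = re.compile(
    rb"^%%(?:(BeginDocument:)|(EndDocument)[\r\n]|(Page:)|(Trailer)[\r\n])",
    re.MULTILINE,
)


def dsc_sections(data: bytes):
    """Return [(offset, length), ...] with 0-based byte offsets.

    Entry 0 is everything before the first top-level %%Page: comment;
    each later entry is one page. The trailer itself is not included.
    """
    starts = [0]
    trailer = len(data)
    depth = 0  # nesting level of embedded documents
    for m in DSC_RE.finditer(data):
        if m.group(1):
            depth += 1
        elif m.group(2):
            depth = max(depth - 1, 0)
        elif depth == 0:
            # Only comments outside embedded documents count.
            if m.group(3):
                starts.append(m.start())
            else:
                trailer = m.start()
                break
    ends = starts[1:] + [trailer]
    return [(s, e - s) for s, e in zip(starts, ends)]
```

Nothing else in the file needs to be understood for this: any %%Page: inside an embedded document is skipped, and everything from %%Trailer onward is left alone.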

This is then employed in the following manner:

(defun preview-gs-dsc-cvx (page dsc)
  "Generate PostScript code accessing PAGE in the DSC object.
The returned PostScript code will need the file on
top of the stack, and will replace it with an executable
object corresponding to the wanted page."
  (let ((curpage (aref dsc page)))
    (format "dup %d setfileposition %d()/SubFileDecode filter cvx"
            (1- (car curpage)) (nth 1 curpage))))

This works the same whether or not the "PostScript" file is generated
with dvips or with pdf2dsc.

>   The 'preview' program is also going to need to call Ghostscript with
>   the relevant permissions flags for each program (the 'dsc'
>   file-producing program and the relevant 'dsc' program portion; the
>   'setup') because these will both need to be able to open the
>   relevant files. But I assume this must be done already, somehow, or
>   it is using -dNOSAFER which is really not a great idea.

The safety management has been a moving target for a number of years but
has been comparatively stable for decades now.  Essentially everything
is set up and opened in advance, then safety is switched on.

> A lot of point 4 would probably be covered by Ikumi's offer to tell me
> how the preview program currently communicates with Ghostscript, but
> perhaps David could short-circuit it if he happens to already be aware
> of how this works and what the preview program requires to be present
> in the 'dsc' file ?

-- 
David Kastrup
