Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

Ken Sharp Thu, 21 Sep 2017 07:03:37 -0700

At 14:43 21/09/2017 +0200, Knut Petersen wrote:

The fonts in the pdfs are identical fonts constructed by ghostscript on the fly, I think it was Ken Sharp who explained to me some years ago that the term "subset" is wrong ;-)

Well, sort of. They aren't identical though, they are all different, but yes, constructed fonts. But if you set SubsetFonts=false, then I'd expect the full font to be embedded, regardless of which glyphs are used.

That's not quite the same as constructing a fully populated new font, but it may well explain why SubsetFonts=false isn't having the result I'd expect.


Except.....

I thought that when you set bigpdfs, you used 'show' instead of 'glyphshow', and that takes you down a totally different code path, where Ghostscript/pdfwrite *doesn't* construct a font. It only does that if the PostScript uses glyphshow, because there is no glyphshow in PDF. Instead it just uses the font it has. If you don't subset the font then it also doesn't re-encode it (which is important for your workflow).

So, if you aren't using glyphshow, then the logic is different, and the fonts really are subsets. Except that they shouldn't be, because -dSubsetFonts=false says to embed the entire font.

I haven't had the time to check what's actually going on yet, I've had to go back to working. I'm reasonably certain you aren't using glyphshow, because if you were, pdfwrite would create fonts with different encodings, and this hack wouldn't work, you'd get the wrong output. So, in this case, it is correct to call the fonts subsets. The problem is, they shouldn't be subset.

One emmentaler font + three encodings + one character (scaled to invisibility) of each encoding used prior to anything else in the ps leads ghostscript to produce three different subsets ;-)) of the emmentaler font in every pdf. But the set of 3 "subsets" is identical in any pdf that is produced this way, and so gs is (was) able to remove the duplicates. That's the --bigpdf trick.

That's not what I see, nor what I would expect. Unless you are using glyphshow, but if you were doing that then I believe the encodings would differ significantly and you would get collisions in the encodings, which would mean the bigpdf trick would produce garbled output.

The PDF files you supplied each contain 1 x Emmentaler-20 font, and each one has a FontFile (the actual data) of a different size. So the fonts in each case are, actually, different. Again I haven't checked (and it's probably not worth it) but the subsets certainly don't contain the full set of glyphs and probably only contain the glyph descriptions of the glyphs that were used.

I don't disagree with the expectation, but what you expect isn't what's in the files.

That doesn't prevent the trick you are using from working, because all the fonts have the same name, so if you don't consider the filenames and font object numbers, then Ghostscript (falsely) considers them to be the same font. Provided the Encoding is the same (or at least compatible, and pdfwrite checks that) for each of the fonts, they can safely be treated as the same font.

We only gather the glyph descriptions as they are used because, in PostScript, it's possible to incrementally download a font, so the glyph description might not be available until it's used. So we can happily copy the used glyphs from instance 'A' of the font and instance 'B' of the font (at this point we think they are the same font, possibly with some glyphs added since we last looked at it), and combine them into one final destination font.

Now as long as there are no character encodings in the 2 fonts which have different glyphs at the same character code, everything is fine. The problem arises if you have two fonts with the same name, but *different* glyphs at the same code point. Because we think they are the same font, when we see the second use of the code point, we *don't* copy the glyph. We see that we already have a glyph at that location, and it must be the same one, because this is the same font, right? So we use the existing one.

You get away with this because, in your workflow, there are no collisions in encoding with the various fonts. If you were using glyphshow I'm fairly certain this would not be the case.
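As a rough illustration of the merging behaviour described above, here is a toy model in Python. It is not pdfwrite's code or data structures: the function names are hypothetical, each font instance is reduced to a map from character code to glyph name, and the glyph names are only illustrative.

```python
# Toy model only: not pdfwrite's real logic or data structures.

def merge_instance(dest, instance):
    """Copy glyphs from one font instance into the destination font.

    Both arguments map character code -> glyph name. A code that is
    already populated in dest is assumed to hold the same glyph, so it
    is never overwritten.
    """
    for code, glyph in instance.items():
        if code not in dest:
            dest[code] = glyph
    return dest


def merge_by_name(instances):
    """Merge (font_name, encoding) pairs, keyed only by font name."""
    merged = {}
    for name, encoding in instances:
        merge_instance(merged.setdefault(name, {}), encoding)
    return merged


# Compatible instances: no code maps to two different glyphs.
ok = merge_by_name([
    ("Emmentaler-20", {1: "clefs.G", 2: "noteheads.s2"}),
    ("Emmentaler-20", {2: "noteheads.s2", 3: "rests.0"}),
])

# Colliding instances: code 2 means a different glyph in each.
bad = merge_by_name([
    ("Emmentaler-20", {2: "noteheads.s2"}),
    ("Emmentaler-20", {2: "flags.u3"}),
])
```

In the compatible case the destination font ends up with the union of the glyphs. In the colliding case the second glyph at code 2 is silently dropped, which is the wrong-glyph outcome described above.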

However, what if you used the same font in TeX? I don't necessarily mean the Emmentaler font, I note that there's a font called something like TeXGyreSchola-Regular in the Lilypond files too, and that will be getting the same treatment as Emmentaler-20. If someone used that font in TeX itself, then potentially there's a problem. You could end up with the encodings colliding and get the wrong glyph when the PDF file is rendered.

Obviously I'm not sure this is a valid concern, I presume for your special case of creating documentation it isn't, but in the general case I would think it would be.

I agree that mutool clean can be a good starting point. If I read the documentation correctly, it does "clean" (remove) unused objects, but it is unable to subset fonts if not all glyphs of the fonts are used?

Not exactly (caveat: I am not a MuPDF developer, so I could be wrong). It will never subset the fonts, it just removes unused and duplicate objects. The 'problem' is that it only considers objects to be identical, and therefore candidates for removal, if they are, well... identical.

The PDF files you have created from the Lilypond EPS files contain fonts which are not identical. They are, at least in some sense, subsets. As I said, I'm not entirely sure why at the moment. I'll have to walk through the code in a debugger to see what's going on there, and it's complicated, so it will take some time.

But that's why you get no benefit at all from running the final file through Mutool: each of the FontFile streams is different, so Mutool correctly decides they are not identical. Ghostscript really ought to do so as well, and indeed it now does so by default.
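To make the "identical means byte-for-byte identical" point concrete, here is a minimal Python sketch of exact-duplicate removal. It is an assumption about the general technique, not MuPDF's implementation: `dedupe_identical` is a hypothetical name, and objects are reduced to plain byte strings keyed by object number.

```python
import hashlib


def dedupe_identical(objects):
    """Drop objects whose bytes exactly match an earlier object.

    objects maps object number -> bytes. Returns (kept, remap), where
    remap sends every original object number to the one that survives.
    """
    first_seen = {}  # content digest -> surviving object number
    kept = {}
    remap = {}
    for num in sorted(objects):
        digest = hashlib.sha256(objects[num]).digest()
        if digest in first_seen:
            remap[num] = first_seen[digest]
        else:
            first_seen[digest] = num
            kept[num] = objects[num]
            remap[num] = num
    return kept, remap


# Two byte-identical streams collapse into one...
kept, remap = dedupe_identical({10: b"subset-A", 20: b"subset-B", 30: b"subset-A"})

# ...but three different subsets of the same font all survive.
kept_all, _ = dedupe_identical({10: b"subset-A", 20: b"subset-B", 30: b"subset-C"})
```

The second call is the situation with the Lilypond files: same font name, different stream contents, so nothing is removed.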

lilypond spawns ghostscript. If our --bigpdf option is used the command is e.g.:
   gs -q -dSAFER -dEPSCrop -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -r1200 -dSubsetFonts=false -sDEVICE=pdfwrite -dAutoRotatePages=/None -sOutputFile=testa.pdf -c.setpdfwrite -ftesta.eps

Yep, Masamichi-san already mailed me that, good to have it confirmed though. Broadly, it's the same as the command I used. -r1200 is possibly larger than I would use; it's only needed if there are gradients or transparency though, and there won't be transparency, because this is PostScript.
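For anyone wanting to repeat the test with a different resolution, the quoted command can be built as an argument list. This is only a sketch: `pdfwrite_args` is a hypothetical helper, the switches are copied verbatim from the command above, and nothing here is LilyPond's actual code.

```python
def pdfwrite_args(eps_path, pdf_path, resolution=1200):
    """Build the gs argument list quoted above for one EPS file."""
    return [
        "gs", "-q", "-dSAFER", "-dEPSCrop", "-dCompatibilityLevel=1.4",
        "-dNOPAUSE", "-dBATCH", f"-r{resolution}", "-dSubsetFonts=false",
        "-sDEVICE=pdfwrite", "-dAutoRotatePages=/None",
        f"-sOutputFile={pdf_path}", "-c.setpdfwrite", f"-f{eps_path}",
    ]


args = pdfwrite_args("testa.eps", "testa.pdf")
lower_res = pdfwrite_args("testa.eps", "testa.pdf", resolution=600)
```

The list can be handed to `subprocess.run(args, check=True)` on a machine where gs is installed; the 600 in the second call is an arbitrary example value, not a recommendation.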

I did mention it at the end of my email (I see you commented on it later as well): if I run the 9.21 release, then I do get a single font out, but it's still not an entire font. If I run the 9.22 release, then I get *3* Emmentaler fonts out, each of which is larger than the one in the 9.21 output, none of which is a complete font.

So as I said, this is an area which has changed; it may be that even if I put back the PDFDontUseObjectNums hack you won't get the improvement you did before. Even if you do, it's some evidence to add to my warnings about this. Things change frequently in this area, it's inherently fragile because it's loaded with heuristics, and it probably isn't something I can realistically hope to preserve in the long term.

Hmm, actually, going back to the 9.21 release does produce at least similar behaviour, whereas the 9.22 release does not. In 9.22 I get three fonts output instead of 1. I've no idea why currently, and right at the moment I don't have time to look.
I'll try and remember to look at it when I am not drowning under support, but it looks like there have been changes in this area unrelated to the PDFDontUseObjectNum bug, and that in itself may mean that your process doesn't work any more, or works less well.
Thanks for your patience!

I'm afraid it's going to be at least next week now, and that's likely to disappear in testing the next release candidate. Fixing a couple of the problems that turned up in RC1 caused differences in about 1/3 of our test suite. That means manually examining hundreds of pages of bitmaps :-(

Looking at the RC1 bitmaps took 2 of us three or so days to complete, so by the time we finish fixing the regressions, build a new RC2, run the tests and gather the output, then examine the bitmaps, that's probably all of next week gone.

If I'm 'lucky' the final couple of problems won't get fixed for a few days, and I should get a little time to look at this before the testing starts again. Depends what happens with customer support and that's been just crazy this week.

Right at the moment, you're probably going to have to leave this with me. It might be useful for one of you to get hold of the RC1, patch pdf_font.ps and rebuild GS (or point it to a modified ghostpdl/Resource/Init directory with the -I switch) and test to see whether this behaviour even still works for you at all with the new release.

I tried it here myself and it did appear to work, but I'm not entirely sure I trust that. Also I was (obviously) using the very reduced set of files Knut sent me, so that may not be a sufficient test.


The commit with the change is here:

http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=ca1ec9b486ddba3f921355fd1d775f27f4871356

Just remove 1 line (dup //null eq) and restore the 6 lines that were deleted. You can drop the comment lines but it's probably easier just to copy the lot. I *think* that will work.

By the way, you could always do this yourself to your own Ghostscript installation anyway.

Ken

PS according to the font Knut sent, it's 160 KB, but the font stream in the EPS files is only 65 KB. Don't know if that means anything, possibly the TrueType portion is the other 100 KB. I'd have to decipher the OTF font and it doesn't seem worth the effort, since that's not really the problem. 100 KB seems like a lot though.




_______________________________________________
lilypond-devel mailing list
lilypond-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/lilypond-devel
