At 14:43 21/09/2017 +0200, Knut Petersen wrote:
The fonts in the pdfs are identical fonts constructed by ghostscript on
the fly, I think it was Ken Sharp who explained to me some years ago that
the term "subset" is wrong ;-)
Well, sort of, they aren't identical though, they are all different, but
yes constructed fonts. But if you set SubsetFonts=false, then I'd expect
the full font to be embedded, regardless of which glyphs are used.
That's not quite the same as constructing a fully populated new font, but
it may well explain why SubsetFonts=false isn't having the result I'd expect.
Except.....
I thought that when you set bigpdfs, you used 'show' instead of
'glyphshow', and that takes you down a totally different code path, where
Ghostscript/pdfwrite *doesn't* construct a font. It only does that if the
PostScript uses glyphshow, because there is no glyphshow in PDF. Instead it
just uses the font it has. If you don't subset the font then it also
doesn't re-encode it (which is important for your workflow).
So, if you aren't using glyphshow, then the logic is different, and the
fonts really are subsets. Except that they shouldn't be, because
-dSubsetFonts=false says to embed the entire font.
I haven't had the time to check what's actually going on yet, I've had to
go back to working. I'm reasonably certain you aren't using glyphshow,
because if you were pdfwrite would create fonts with different encodings,
and this hack wouldn't work, you'd get the wrong output. So, in this case,
it is correct to call the fonts subsets. The problem is, they shouldn't be
subset.
One emmentaler font + three encodings + one character (scaled to
invisibility) of each encoding used prior to anything else in the ps
leads ghostscript to produce three different subsets ;-)) of the
emmentaler font in every pdf. But the set of 3 "subsets" is identical in
any pdf that is produced this way, and so gs is (was) able to remove the
duplicates. That's the --bigpdf trick.
That's not what I see, nor what I would expect. Unless you are using
glyphshow, but if you were doing that then I believe the encodings would
differ significantly and you would get collisions in the encodings, which
would mean the bigpdf trick would produce garbled output.
The PDF files you supplied each contain 1xEmmentaler-20 font, and each one
has a FontFile (the actual data) of a different size. So the fonts in each
case are, actually, different. Again I haven't checked (and it's probably
not worth it) but the subsets certainly don't contain the full set of
glyphs and probably only contain the glyph descriptions of the glyphs that
were used.
I don't disagree with the expectation, but what you expect isn't what's in
the files.
That doesn't prevent the trick you are using from working, because all the
fonts have the same name, so if you don't consider the filenames and font
object numbers, then Ghostscript (falsely) considers them to be the same
font. Provided the Encoding is the same (or at least compatible, and
pdfwrite checks that) for each of the fonts, they can safely be treated as
the same font.
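
To make "compatible" concrete, here is a minimal Python sketch. It is not
pdfwrite's actual check, just an illustration under the assumption that an
encoding can be modelled as a mapping from character code to glyph name.
The function name and the glyph names are hypothetical.

```python
def encodings_compatible(enc_a, enc_b):
    """Return True if no shared character code maps to different glyph names."""
    shared_codes = enc_a.keys() & enc_b.keys()
    return all(enc_a[code] == enc_b[code] for code in shared_codes)


# Hypothetical encodings: character code -> glyph name.
first = {65: "noteheads.s2", 66: "clefs.G"}
second = {66: "clefs.G", 67: "rests.2"}
clashing = {65: "flags.u3"}

print(encodings_compatible(first, second))    # agree on code 66
print(encodings_compatible(first, clashing))  # disagree on code 65
```

Two encodings that use entirely different codes are trivially compatible
in this model, which is the situation the bigpdf trick relies on.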
We only gather the glyph descriptions as they are used because, in
PostScript, it's possible to incrementally download a font, so the glyph
description might not be available until it's used. So we can happily copy
the used glyphs from instance 'A' of the font and instance 'B' of the font
(at this point we think they are the same font, possibly with some glyphs
added since we last looked at it), and combine them into one final
destination font.
Now as long as there are no character encodings in the 2 fonts which have
different glyphs at the same character code, everything is fine. The
problem arises if you have two fonts with the same name, but *different*
glyphs at the same code point. Because we think they are the same font,
when we see the second use of the code point, we *don't* copy the glyph. We
see that we already have a glyph at that location, and it must be the same
one, because this is the same font, right? So we use the existing one.
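
The copy-if-absent behaviour described above can be sketched in a few
lines of Python. This is only an illustration, not the pdfwrite code: the
fonts are modelled as dicts from character code to a glyph description,
and all names are hypothetical. The returned list of clashing codes is
there purely to make the problem visible; as described above, the real
logic does not notice the clash, it just keeps the existing glyph.

```python
def merge_used_glyphs(dest, src):
    """Copy glyphs from src into dest, skipping codes dest already has.

    Returns the codes where src held a *different* glyph that was dropped.
    """
    dropped = []
    for code, glyph in src.items():
        if code not in dest:
            dest[code] = glyph          # new code: copy the glyph across
        elif dest[code] != glyph:
            dropped.append(code)        # clash: the existing glyph wins
    return dropped


# Two instances of a font with the same name (hypothetical contents).
instance_a = {1: "glyph-A1", 2: "glyph-A2"}
instance_b = {2: "glyph-B2", 3: "glyph-B3"}

combined = dict(instance_a)
dropped = merge_used_glyphs(combined, instance_b)
print(combined)
print(dropped)
```

Here code 2 keeps the glyph from instance A, so any text that expected the
instance B glyph at code 2 would render with the wrong shape.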
You get away with this because, in your workflow, there are no collisions
in encoding with the various fonts. If you were using glyphshow I'm fairly
certain this would not be the case.
However, what if you used the same font in TeX? I don't necessarily mean
the Emmentaler font, I note that there's a font called something like
TeXGyreSchola-Regular in the Lilypond files too, and that will be getting
the same treatment as Emmentaler-20. If someone used that font in TeX
itself, then potentially there's a problem. You could end up with the
encodings colliding and get the wrong glyph when the PDF file is rendered.
Obviously I'm not sure this is a valid concern, I presume for your special
case of creating documentation it isn't, but in the general case I would
think it would be.
I agree that mutool clean can be a good starting point. If I read the
documentation correctly, it does "clean" (remove) unused objects, but it
is unable to subset fonts if not all glyphs of the fonts are used?
Not exactly (caveat; I am not a MuPDF developer, so I could be wrong). It
will never subset the fonts, it just removes unused and duplicate objects.
The 'problem' is that it only considers objects to be identical, and
therefore candidates for removal, if they are, well... identical.
The PDF files you have created from the Lilypond EPS files contain fonts
which are not identical. They are, at least in some sense, subsets. As I
said, I'm not entirely sure why at the moment. I'll have to walk through
the code in a debugger to see what's going on there, and it's complicated,
so it will take some time.
But that's why you get no benefit at all from running the final file
through mutool: each of the FontFile streams is different, so mutool
correctly decides they are not identical. Ghostscript really ought to do so
as well and indeed, it now does so by default.
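
As a rough illustration of why byte-level comparison finds nothing to
remove, here is a small Python sketch. It is an assumption about the
general idea (treat objects as duplicates only when their content is
byte-identical), not a description of how mutool is implemented; the
object numbers and stream contents are made up.

```python
import hashlib


def deduplicate(objects):
    """Map each object number to the first object with identical bytes."""
    first_seen = {}   # content digest -> first object number seen
    remap = {}        # object number -> object number to use instead
    for num, data in objects.items():
        digest = hashlib.sha256(data).hexdigest()
        remap[num] = first_seen.setdefault(digest, num)
    return remap


# Hypothetical font streams: 10 and 12 are identical, 11 is another subset.
objects = {
    10: b"subset with glyphs a, b",
    11: b"subset with glyphs a, c",
    12: b"subset with glyphs a, b",
}
remap = deduplicate(objects)
print(remap)
```

Only object 12 is folded into object 10. Object 11 survives because its
bytes differ, even though all three streams might carry the same font
name.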
lilypond spawns ghostscript. If our --bigpdf option is used the command is
e.g.:
   gs -q -dSAFER -dEPSCrop -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH
-r1200 -dSubsetFonts=false -sDEVICE=pdfwrite -dAutoRotatePages=/None
-sOutputFile=testa.pdf -c.setpdfwrite -ftesta.eps
Yep Masamichi-san already mailed me that, good to have it confirmed though.
Broadly, it's the same as the command I used. -r1200 is possibly larger
than I would use; it's only needed if there are gradients or transparency
though, and there won't be transparency, because this is PostScript.
I did mention it at the end of my email (I see you commented on it later as
well): if I run the 9.21 release, then I do get a single font out, though
it's still not an entire font. If I run the 9.22 release, then I get *3*
Emmentaler fonts out, each of which is larger than the one in the 9.21
output, none of which is a complete font.
So as I said, this is an area which has changed, it may be that even if I
put back the PDFDontUseObjectNums hack then you won't get the improvement
you did before. Even if you do, it's some evidence to add to my warnings
about this. Things change frequently in this area, it's inherently fragile
because it's loaded with heuristics, and it probably isn't something I can
realistically hope to preserve in the long term.
Hmm, actually, going back to the 9.21 release does produce at least
similar behaviour, whereas the 9.22 release does not. In 9.22 I get three
fonts output instead of 1. I've no idea why currently, and right at the
moment I don't have time to look.
I'll try and remember to look at it when I am not drowning under support,
but it looks like there have been changes in this area unrelated to the
PDFDontUseObjectNum bug, and that in itself may mean that your process
doesn't work any more, or works less well.
Thanks for your patience!
I'm afraid it's going to be at least next week now, and that's likely to
disappear in testing the next release candidate. Fixing a couple of the
problems that turned up in RC1 caused differences in about 1/3 of our test
suite. That means manually examining hundreds of pages of bitmaps :-(
Looking at the RC1 bitmaps took 2 of us three or so days to complete, so by
the time we finish fixing the regressions, build a new RC2, run the tests
and gather the output, then examine the bitmaps that's probably all of next
week gone.
If I'm 'lucky' the final couple of problems won't get fixed for a few days,
and I should get a little time to look at this before the testing starts
again. Depends what happens with customer support and that's been just
crazy this week.
Right at the moment, you're probably going to have to leave this with me.
It might be useful for one of you to get hold of the RC1, patch pdf_font.ps
and rebuild GS (or point it to a modified ghostpdl/Resource/Init directory
with the -I switch) and test to see whether this behaviour even still works
for you at all with the new release.
I tried it here myself and it did appear to work, but I'm not entirely sure
I trust that. Also I was (obviously) using the very reduced set of files
Knut sent me, so that may not be a sufficient test.
The commit with the change is here:
http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=ca1ec9b486ddba3f921355fd1d775f27f4871356
Just remove 1 line (dup //null eq) and replace the 6 lines that were
deleted. You can drop the comment lines but it's probably easier just to
copy the lot. I *think* that will work.
By the way, you could always do this yourself to your own Ghostscript
installation anyway.
Ken
PS according to the font Knut sent, it's 160 KB, but the font stream in the
EPS files is only 65 KB. Don't know if that means anything, possibly the
TrueType portion is the other 100KB. I'd have to decipher the OTF font and
it doesn't seem worth the effort, since that's not really the problem.
100KB seems like a lot though.