All,
I'm trying to convert a PDF to an image with PDFBox and I'm running into
font rendering problems on some Linux systems.  If anyone could provide any
ideas on how to fix this I'd appreciate it.

The PDF is too large to attach, so it's available at this link:
https://drive.google.com/file/d/1dNXgHsfn0cy2Gx9HxhSTQdeWAAjaDplk/view?usp=sharing

So far as I can tell, the linked file comes from some sort of mail
merge-style application that is injecting text into a template.  The
injected text uses a different font than the rest of the document.

On Windows systems, this works fine, but on Linux systems, PDFBox renders
the text as gibberish glyphs in a way that I've never seen before.

When I reproduce the issue with logging increased to trace, I get the
following line in the log.

15:55:15.622 [main] WARN org.apache.pdfbox.pdmodel.font.PDCIDFontType2 -
Using non-embedded GIDs in font Calibri

When I list the fonts in the PDF, Calibri is listed as both an embedded
*and* an Identity-H font.  Given that we have to substitute Carlito for
Calibri, this may be relevant.
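
Since the substitution may matter, here is a rough sketch for checking
which Calibri or Carlito font files are actually present on a given
machine, so the Windows and Linux environments can be compared.  It uses
only the Java standard library and does not call PDFBox.  The class name
FontFileCheck and the directory list are assumptions for illustration;
the directories PDFBox actually scans may differ by version and
distribution.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical helper, not part of PDFBox.
public class FontFileCheck {

    // Returns regular files under root whose file name contains any of the
    // given needles, compared case-insensitively. A missing directory
    // yields an empty list.
    public static List<Path> findFontFiles(Path root, String... needles) throws IOException {
        List<Path> hits = new ArrayList<>();
        if (!Files.isDirectory(root)) {
            return hits;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                for (String needle : needles) {
                    if (name.contains(needle.toLowerCase(Locale.ROOT))) {
                        hits.add(p);
                        break;
                    }
                }
            });
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed font locations on a typical Linux system; adjust as needed.
        String[] dirs = {
            "/usr/share/fonts",
            "/usr/local/share/fonts",
            System.getProperty("user.home") + "/.fonts"
        };
        for (String dir : dirs) {
            for (Path hit : findFontFiles(Paths.get(dir), "calibri", "carlito")) {
                System.out.println(hit);
            }
        }
    }
}
```

Running this on a system that renders correctly and on one that does not
should show whether the two differ in which of these fonts are installed.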

In the source code
<https://github.com/apache/pdfbox/blob/d6ebddf07f99bcc04f5b106c84623048b697bee7/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/PDCIDFontType2.java#L241>,
a comment line suggests there's a mismatch that involves GIDs, CIDs, and
embedded vs non-embedded fonts.

Has anyone here ever seen behavior like this before?  Is this a bug?  If it
is a bug, what is the procedure to report it?

If it's not a bug, does anyone have any suggestions on what I might need to
fix in my environment?

Any input that anyone might have would be helpful.

Thank you,
Daniel
