Constantine Dokolas created PDFBOX-6004:
-------------------------------------------

             Summary: Support "SymbolSetEncoding" for fonts
                 Key: PDFBOX-6004
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6004
             Project: PDFBox
          Issue Type: Improvement
          Components: PDModel
    Affects Versions: 2.0.33
            Reporter: Constantine Dokolas


I've encountered a PDF with a font named "SymbolMT" which defines its encoding 
as `SymbolSetEncoding`. Opening it in the debugger app produces a warning 
(`Warning [PDSimpleFont] Unknown encoding: SymbolSetEncoding`) followed by 
multiple `No Unicode mapping ...` warnings.

I couldn't find official documentation for this encoding, but the `pdf.js` 
project implements it 
[here|https://github.com/mozilla/pdf.js/blob/6f052312d625224173db36d3e661657a89cf1865/src/core/encodings.js#L207], 
and that table looks correct at first glance.

Would it be possible to add support for this encoding to PDFBox?
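
To illustrate the kind of lookup involved, here is a minimal, self-contained sketch. It is not PDFBox or `pdf.js` code: the class and method names are hypothetical, and the handful of entries are taken from the Symbol font's built-in encoding in the PDF specification, on the unverified assumption that `SymbolSetEncoding` uses the same codes. A real implementation would need the complete table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the names here do not exist in PDFBox or pdf.js.
class SymbolSetEncodingSketch {
    // Character code -> glyph name (partial table, for illustration only).
    private static final Map<Integer, String> CODE_TO_NAME = new HashMap<>();
    // Glyph name -> Unicode string (values follow the Adobe Glyph List).
    private static final Map<String, String> NAME_TO_UNICODE = new HashMap<>();

    static {
        // Assumption: these codes match the Symbol font's built-in encoding.
        add(0x20, "space", " ");
        add(0x22, "universal", "\u2200");
        add(0x24, "existential", "\u2203");
        add(0x41, "Alpha", "\u0391");
        add(0x61, "alpha", "\u03B1");
        add(0x62, "beta", "\u03B2");
    }

    private static void add(int code, String glyphName, String unicode) {
        CODE_TO_NAME.put(code, glyphName);
        NAME_TO_UNICODE.put(glyphName, unicode);
    }

    // Returns the glyph name for a code, or ".notdef" if the code is unmapped.
    static String glyphName(int code) {
        return CODE_TO_NAME.getOrDefault(code, ".notdef");
    }

    // Returns the Unicode string for a code, or null if there is no mapping.
    static String toUnicode(int code) {
        return NAME_TO_UNICODE.get(glyphName(code));
    }

    public static void main(String[] args) {
        int[] codes = {0x61, 0x62, 0x22, 0x7F};
        for (int code : codes) {
            System.out.printf("0x%02X -> %s -> %s%n",
                    code, glyphName(code), toUnicode(code));
        }
    }
}
```

With a table like this registered under the name `SymbolSetEncoding`, a code with no entry would still resolve to `.notdef` and have no Unicode mapping, which is the case the `No Unicode mapping ...` warnings currently report for every code.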

Note: I haven't tested text extraction with PDFBox to see which code points it 
produces, but Adobe Acrobat converts those codes to the box character 
(presumably its "unknown character" glyph).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

