Constantine Dokolas created PDFBOX-6004:
-------------------------------------------

             Summary: Support "SymbolSetEncoding" for fonts
                 Key: PDFBOX-6004
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6004
             Project: PDFBox
          Issue Type: Improvement
          Components: PDModel
    Affects Versions: 2.0.33
            Reporter: Constantine Dokolas


I've encountered a PDF with a font named "SymbolMT" which declares its encoding as `SymbolSetEncoding`. When I open it in the debugger app, I get one warning about the encoding (`Warning [PDSimpleFont] Unknown encoding: SymbolSetEncoding`) followed by multiple `No Unicode mapping ...` warnings.

I couldn't find official documentation for this encoding, but the `pdf.js` project implements it [here|https://github.com/mozilla/pdf.js/blob/6f052312d625224173db36d3e661657a89cf1865/src/core/encodings.js#L207], and that table looks correct at first glance.

Would it be possible to support this encoding?

Note: I've not tested text extraction with PDFBox to see which codepoints are generated, but Adobe Acrobat converts those codes to the box character (the "unknown" char?).
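
To illustrate what supporting a named encoding such as `SymbolSetEncoding` amounts to, here is a minimal, self-contained sketch of a code-to-glyph-name table with a Unicode lookup. It does not use or reproduce any PDFBox API: the class `SymbolSetEncodingSketch` and its methods are hypothetical names. The handful of entries shown are taken from the standard Symbol font encoding; that `SymbolSetEncoding` uses the same assignments is an assumption based on the `pdf.js` table linked in the description, not on official documentation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of PDFBox. */
final class SymbolSetEncodingSketch {
    // Character code -> glyph name (small excerpt, assumed to match the Symbol encoding).
    private static final Map<Integer, String> CODE_TO_NAME = new HashMap<>();
    // Glyph name -> Unicode string, as listed in the Adobe Glyph List.
    private static final Map<String, String> NAME_TO_UNICODE = new HashMap<>();

    static {
        add(0x20, "space", " ");
        add(0x22, "universal", "\u2200");
        add(0x24, "existential", "\u2203");
        add(0x41, "Alpha", "\u0391");
        add(0x42, "Beta", "\u0392");
        add(0x61, "alpha", "\u03B1");
        add(0x62, "beta", "\u03B2");
    }

    private static void add(int code, String name, String unicode) {
        CODE_TO_NAME.put(code, name);
        NAME_TO_UNICODE.put(name, unicode);
    }

    /** Returns the glyph name for a code, or ".notdef" if the code is unmapped. */
    static String glyphName(int code) {
        return CODE_TO_NAME.getOrDefault(code, ".notdef");
    }

    /** Returns the Unicode string for a code, or null if there is no mapping. */
    static String toUnicode(int code) {
        String name = CODE_TO_NAME.get(code);
        return name == null ? null : NAME_TO_UNICODE.get(name);
    }

    public static void main(String[] args) {
        // Look up the code for lowercase alpha and print its name and Unicode value.
        System.out.println(glyphName(0x61) + " -> " + toUnicode(0x61));
    }
}
```

An unmapped code returns `null` from `toUnicode`, which corresponds to the situation in which a `No Unicode mapping ...` warning would be expected.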