[ https://issues.apache.org/jira/browse/PDFBOX-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950428#comment-17950428 ]
Tilman Hausherr commented on PDFBOX-6004:
-----------------------------------------

Please try this change in PDSimpleFont.readEncoding(), at {{if (this.encoding == null)}} change the code to this:
{code:java}
if (this.encoding == null)
{
    if ("SymbolMT".equals(getName()))
    {
        // PDFBOX-6004
        encoding = SymbolEncoding.INSTANCE;
    }
    else
    {
        LOG.warn("Unknown encoding: {}", encodingName.getName());
        this.encoding = readEncodingFromFont(); // fallback
    }
}
{code}
and tell me if it works. It might be more complex, see PDFBOX-4017.

> Support "SymbolSetEncoding" for fonts
> -------------------------------------
>
>                 Key: PDFBOX-6004
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6004
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 2.0.33
>            Reporter: Constantine Dokolas
>            Priority: Minor
>         Attachments: image-2025-05-08-18-55-09-198.png
>
>
> I've encountered a PDF with a font named "SymbolMT" which defines its
> encoding as {{SymbolSetEncoding}}. Using the debugger app, I get a
> warning ({{Warning [PDSimpleFont] Unknown encoding: SymbolSetEncoding}})
> and multiple {{No Unicode mapping ...}} warnings.
> I couldn't find official documentation for this encoding, but the {{pdf.js}}
> project has support for this encoding implemented
> [here|https://github.com/mozilla/pdf.js/blob/6f052312d625224173db36d3e661657a89cf1865/src/core/encodings.js#L207]
> and it looks correct at first glance.
> Perhaps it's possible to support this encoding?
> Notes
> * I've not tested text extraction with PDFBox to see what codepoints are
> generated, but Adobe Acrobat converts those codes to the box char (the
> "unknown" char?)
> * The pdfdebugger app font viewer says: "Encoding: BuiltInEncoding / built in (TTF)"
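
For anyone who wants to reason about the suggested fallback without building PDFBox, the decision it makes can be modelled as a small standalone sketch. Everything here is a hypothetical stand-in: {{EncodingFallbackSketch}}, {{Resolved}} and {{resolve}} are not PDFBox classes or methods, and the lookup table only mirrors the predefined encoding names from the PDF specification. The real change would live in PDSimpleFont.readEncoding() and, as noted in the comment, may turn out to be more complex (see PDFBOX-4017).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the proposed fallback. None of these types exist in
 * PDFBox; they only illustrate the order of the checks.
 */
public class EncodingFallbackSketch {

    /** Stand-in for the encoding object that a font ends up with. */
    public enum Resolved { STANDARD, MAC_ROMAN, WIN_ANSI, MAC_EXPERT, SYMBOL, FROM_FONT }

    /** Predefined encoding names from the PDF specification. */
    private static final Map<String, Resolved> KNOWN = new HashMap<>();
    static {
        KNOWN.put("StandardEncoding", Resolved.STANDARD);
        KNOWN.put("MacRomanEncoding", Resolved.MAC_ROMAN);
        KNOWN.put("WinAnsiEncoding", Resolved.WIN_ANSI);
        KNOWN.put("MacExpertEncoding", Resolved.MAC_EXPERT);
    }

    /**
     * Resolves an encoding name for a font. An unknown name falls back to the
     * Symbol encoding for "SymbolMT" (the PDFBOX-6004 case) and to the font's
     * built-in encoding otherwise.
     */
    public static Resolved resolve(String fontName, String encodingName) {
        Resolved encoding = KNOWN.get(encodingName);
        if (encoding == null) {
            if ("SymbolMT".equals(fontName)) {
                // Assumption: SymbolSetEncoding on SymbolMT behaves like Symbol encoding
                encoding = Resolved.SYMBOL;
            } else {
                System.err.println("Unknown encoding: " + encodingName);
                encoding = Resolved.FROM_FONT; // fallback
            }
        }
        return encoding;
    }

    public static void main(String[] args) {
        System.out.println(resolve("SymbolMT", "SymbolSetEncoding")); // SYMBOL
        System.out.println(resolve("Arial", "SymbolSetEncoding"));    // FROM_FONT
        System.out.println(resolve("Arial", "WinAnsiEncoding"));      // WIN_ANSI
    }
}
```

Note that the special case keys on the font name rather than on the encoding name, so any other font declaring {{SymbolSetEncoding}} would still take the generic fallback and log the warning. Whether that is the desired behaviour is an open question in this issue.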