[ https://issues.apache.org/jira/browse/PDFBOX-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Constantine Dokolas updated PDFBOX-6004:
----------------------------------------
    Description:
I've encountered a PDF with a font named "SymbolMT" which defines its encoding as {{SymbolSetEncoding}}. Using the debugger app, I get a warning ({{Warning [PDSimpleFont] Unknown encoding: SymbolSetEncoding}}) and multiple {{No Unicode mapping ...}} warnings.

I couldn't find official documentation for this encoding, but the {{pdf.js}} project has support for this encoding implemented [here |https://github.com/mozilla/pdf.js/blob/6f052312d625224173db36d3e661657a89cf1865/src/core/encodings.js#L207] and it looks correct at first glance.

Perhaps it's possible to support this encoding?

Notes
 * I've not tested text extraction with PDFBox to see what codepoints are generated, but Adobe Acrobat converts those codes to the box char (the "unknown" char?)
 * The pdfdebugger app font viewer says: "Encoding: BuiltInEncoding / built in (TTF)"

  was:
I've encountered a PDF with a font named "SymbolMT" which defines its encoding as `SymbolSetEncoding`. Using the debugger app, I get a warning (`Warning [PDSimpleFont] Unknown encoding: SymbolSetEncoding`) and multiple `No Unicode mapping ...` warnings.

I couldn't find official documentation for this encoding, but the `pdf.js` project has support for this encoding implemented [here |https://github.com/mozilla/pdf.js/blob/6f052312d625224173db36d3e661657a89cf1865/src/core/encodings.js#L207] and it looks correct at first glance.

Perhaps it's possible to support this encoding?

Note: I've not tested text extraction with PDFBox to see what codepoints are generated, but Adobe Acrobat converts those codes to the box char (the "unknown" char?)
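
For anyone following along, the warnings quoted in the description come down to a two-step lookup: character code to glyph name (the encoding), then glyph name to Unicode. Below is a minimal Java sketch of that lookup. It does not use any PDFBox classes; {{SymbolSetSketch}} and both of its maps are hypothetical. The few table entries are taken from the standard Symbol encoding and the Adobe Glyph List, on the assumption (not checked against the pdf.js table) that SymbolSetEncoding agrees with them at these codes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not PDFBox code and not the full encoding table.
final class SymbolSetSketch {
    // Step 1: character code -> glyph name (a tiny, assumed excerpt of the encoding).
    private static final Map<Integer, String> CODE_TO_NAME = new HashMap<>();
    // Step 2: glyph name -> Unicode text (values as listed in the Adobe Glyph List).
    private static final Map<String, String> NAME_TO_UNICODE = new HashMap<>();

    static {
        CODE_TO_NAME.put(0x20, "space");
        CODE_TO_NAME.put(0x41, "Alpha");
        CODE_TO_NAME.put(0x42, "Beta");
        CODE_TO_NAME.put(0x61, "alpha");
        CODE_TO_NAME.put(0x62, "beta");

        NAME_TO_UNICODE.put("space", " ");
        NAME_TO_UNICODE.put("Alpha", "\u0391");
        NAME_TO_UNICODE.put("Beta", "\u0392");
        NAME_TO_UNICODE.put("alpha", "\u03B1");
        NAME_TO_UNICODE.put("beta", "\u03B2");
    }

    // Returns the Unicode text for a code, or empty when either lookup step fails.
    // An empty result corresponds to the "No Unicode mapping" situation.
    static Optional<String> toUnicode(int code) {
        String name = CODE_TO_NAME.get(code);
        if (name == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(NAME_TO_UNICODE.get(name));
    }

    public static void main(String[] args) {
        int[] codes = {0x41, 0x62, 0x7F};
        for (int code : codes) {
            String text = toUnicode(code).orElse("(no Unicode mapping)");
            System.out.println(String.format("0x%02X -> %s", code, text));
        }
    }
}
```

When the encoding name itself is not recognised, step 1 has no table to consult, so every code ends up in the "no mapping" branch. That would be consistent with the warnings above, though I have not traced the actual PDFBox code path.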
> Support "SymbolSetEncoding" for fonts
> -------------------------------------
>
>                 Key: PDFBOX-6004
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6004
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 2.0.33
>            Reporter: Constantine Dokolas
>            Priority: Minor