[jira] [Updated] (PDFBOX-6004) Support "SymbolSetEncoding" for fonts

Constantine Dokolas (Jira) Thu, 08 May 2025 05:03:23 -0700


     [ https://issues.apache.org/jira/browse/PDFBOX-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Constantine Dokolas updated PDFBOX-6004:
----------------------------------------
    Description: 
I've encountered a PDF with a font named "SymbolMT" which defines its encoding 
as {{SymbolSetEncoding}}. Using the debugger app, I get a warning 
({{Warning [PDSimpleFont] Unknown encoding: SymbolSetEncoding}}) and 
multiple {{No Unicode mapping ...}} warnings.

I couldn't find official documentation for this encoding, but the {{pdf.js}} 
project implements it 
[here|https://github.com/mozilla/pdf.js/blob/6f052312d625224173db36d3e661657a89cf1865/src/core/encodings.js#L207] 
and that implementation looks correct at first glance.

Perhaps it's possible to support this encoding?

Notes
 * I've not tested text extraction with PDFBox to see which code points are 
generated, but Adobe Acrobat converts those codes to the box character (the 
"unknown character" glyph?).
 * The pdfdebugger app font viewer reports: "Encoding: BuiltInEncoding / built 
in (TTF)"
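
To illustrate what supporting such an encoding might involve, here is a minimal sketch of a code-to-glyph-name lookup. It is not PDFBox code: the class {{SymbolSetEncodingSketch}} and its methods are hypothetical, and the few entries shown assume that {{SymbolSetEncoding}} matches the standard Symbol font encoding, which this report has not confirmed. A real implementation would need the full table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: a partial code-to-glyph-name table.
final class SymbolSetEncodingSketch {
    // Encoding name as it appears in the font dictionary of the reported PDF.
    static final String NAME = "SymbolSetEncoding";

    // Assumption: these entries follow the standard Symbol font encoding.
    // Only a handful of codes are listed; the real table has many more.
    private static final Map<Integer, String> CODE_TO_NAME = new HashMap<>();
    static {
        CODE_TO_NAME.put(0x20, "space");
        CODE_TO_NAME.put(0x41, "Alpha");
        CODE_TO_NAME.put(0x42, "Beta");
        CODE_TO_NAME.put(0x61, "alpha");
        CODE_TO_NAME.put(0x62, "beta");
    }

    private SymbolSetEncodingSketch() {
    }

    // True if the given encoding name is the one this sketch covers.
    static boolean handles(String encodingName) {
        return NAME.equals(encodingName);
    }

    // Glyph name for a character code, or ".notdef" if the code is not listed.
    static String glyphName(int code) {
        return CODE_TO_NAME.getOrDefault(code, ".notdef");
    }
}
```

With glyph names available, a Unicode value could then be looked up by name (for example through a glyph list), which is presumably what the {{No Unicode mapping ...}} warnings are missing.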

  was:
I've encountered a PDF with a font named "SymbolMT" which defines its encoding 
as `SymbolSetEncoding`. Using the debugger app, I get a warning (`Warning 
[PDSimpleFont] Unknown encoding: SymbolSetEncoding`) and multiple `No Unicode 
mapping ...` warnings.

I couldn't find official documentation for this encoding, but the `pdf.js` 
project has support for this encoding implemented [here 
|https://github.com/mozilla/pdf.js/blob/6f052312d625224173db36d3e661657a89cf1865/src/core/encodings.js#L207]
 and it looks correct at first glance.

Perhaps it's possible to support this encoding?

Note: I've not tested text extraction with PDFBox to see what codepoints are 
generated, but Adobe Acrobat converts those codes to the box char (the 
"unknown" char?)


> Support "SymbolSetEncoding" for fonts
> -------------------------------------
>
>                 Key: PDFBOX-6004
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6004
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 2.0.33
>            Reporter: Constantine Dokolas
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
