Hello there,

I'm encountering an error in how certain characters are encoded using
PDFBox. The issue exists in all versions of PDFBox, but I'm currently using
3.0.1.

contentStream.showText("äöüß");


The string "äöüß" is used as a test for Unicode characters that PDFBox
needs to render.

// Load OpenSans from the classpath; embedSubset = false embeds the full font.
var resource = Processor.class.getResource("/OpenSans-Regular.ttf");
var file = Paths.get(resource.toURI()).toFile();
var targetStream = new FileInputStream(file);
var font = PDType0Font.load(PageAssembler.getDocument(), targetStream, false);
contentStream.setFont(font, 20);


To do so, I'm importing a font that I know has the glyphs for all four
special characters (OpenSans downloaded from Google Fonts).
However, this issue can be reproduced using any other Unicode-supported
font.

When I execute the code, PDFBox renders the following character
sequence: Ã¤Ã¶Ã¼ÃŸ.
Clearly an encoding issue.

The PDF Debugger shows the text in the content stream as:

/F1 20 Tf
BT
  (\000\205\000f\000\205\000x\000\205\000~\000\205\0019) Tj
ET

Now, as far as I understand from what I've learned while debugging this
issue, each character is written as a two-byte code, and \000\205 is the
octal notation for code 133 (0x0085), which selects the glyph at position
133 of the font with the id F1.
Again, looking at the F1 section in the PDF Debugger, the character listed
under the code / CID / GID 133 is indeed Ã, the first "incorrect" character
of the sequence, which is supposed to be "ä".
"ä", however, would be 166, not 133. How does PDFBox get this wrong?

As an aside, if I call showText with the result of toUnicode(166), PDFBox
correctly renders "ä" in the desired font!
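
One hypothesis worth ruling out (an assumption, not something the PDF
itself confirms): the string literal may already be wrong before it reaches
PDFBox, for example if the source file is saved as UTF-8 but compiled or
read as windows-1252. The sketch below uses only the JDK. The class name
MojibakeCheck and the helper names are made up for illustration. It shows
what the test string turns into after that kind of mix-up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeCheck {
    // Encode text as UTF-8, then decode the same bytes with a different charset.
    static String misread(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    // Render each UTF-16 unit as U+XXXX so the result is readable on any console.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The four test characters, written as escapes so this literal
        // does not depend on the encoding of the source file.
        String expected = "\u00E4\u00F6\u00FC\u00DF";
        String garbled = misread(expected, Charset.forName("windows-1252"));
        // Prints: U+00C3 U+00A4 U+00C3 U+00B6 U+00C3 U+00BC U+00C3 U+0178
        System.out.println(describe(garbled));
    }
}
```

The first garbled character is U+00C3 (Ã), which is the glyph found at code
133. If that match is not a coincidence, the mismatch is in how the source
is read rather than in the font. Two ways to check would be writing the
literal with Unicode escapes in showText, or compiling with javac's
-encoding UTF-8 option.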

Looking at the "ToUnicode" part of the F1 font, the CMap reproduced at the
end of this message is displayed.

Could someone please help me figure out what is going on? And hopefully
even help me fix this issue? For more help, I have attached the PDF
document.

Best,
Gino

ToUnicode:

/CIDInit /ProcSet findresource begin
12 dict begin

begincmap
/CIDSystemInfo
<< /Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def

/CMapName /Adobe-Identity-UCS def
/CMapType 2 def

1 begincodespacerange
<0000> <FFFF>
endcodespacerange

100 beginbfrange
<0001> <0001> <0000>
<0002> <0002> <000D>
<0003> <0061> <0020>
<0062> <00C1> <00A0>
<00C2> <00F2> <0100>
<00F3> <00FF> <0132>
<0100> <0122> <013F>
<0123> <0124> <021A>
<0125> <0140> <0164>
<0141> <0141> <0192>
<0142> <0147> <01FA>
<0148> <0149> <0218>
<014A> <014B> <02C6>
<014C> <014C> <02C9>
<014D> <0152> <02D8>
<0153> <0159> <0384>
<015A> <015A> <038C>
<015B> <016E> <038E>
<016F> <019A> <03A3>
<019B> <01A6> <0401>
<01A7> <01E8> <040E>
<01E9> <01F4> <0451>
<01F5> <01F6> <045E>
<01F7> <01F8> <0490>
<01F9> <01FE> <1E80>
<01FF> <01FF> <1EF2>
<0200> <0200> <1EF3>
<0201> <0203> <2013>
<0204> <020B> <2017>
<020C> <020E> <2020>
<020F> <020F> <2026>
<0210> <0210> <2030>
<0211> <0212> <2032>
<0213> <0214> <2039>
<0215> <0215> <203C>
<0216> <0216> <2044>
<0217> <0217> <207F>
<0218> <0219> <20A3>
<021A> <021A> <20A7>
<021B> <021B> <20AC>
<021C> <021C> <2105>
<021D> <021D> <2113>
<021E> <021E> <2116>
<021F> <021F> <2122>
<0220> <0220> <2126>
<0221> <0221> <212E>
<0222> <0225> <215B>
<0226> <0226> <2202>
<0227> <0227> <2206>
<0228> <0228> <220F>
<0229> <022A> <2211>
<022B> <022B> <221A>
<022C> <022C> <221E>
<022D> <022D> <222B>
<022E> <022E> <2248>
<022F> <022F> <2260>
<0230> <0231> <2264>
<0232> <0232> <25CA>
<0235> <0235> <0326>
<0237> <0238> <2074>
<0239> <023A> <2077>
<023B> <0246> <2000>
<0247> <0247> <FEFF>
<0248> <0249> <FFFC>
<024A> <024A> <01F0>
<024B> <024B> <02BC>
<024C> <024D> <03D1>
<024E> <024E> <03D6>
<024F> <0250> <1E3E>
<0251> <0252> <1E00>
<0253> <0253> <02F3>
<0254> <0255> <01A0>
<0256> <0257> <01AF>
<0259> <0259> <0400>
<025A> <025A> <040D>
<025B> <025B> <0450>
<025C> <025C> <045D>
<025D> <027F> <0460>
<0280> <0287> <0488>
<0288> <02F5> <0492>
<02F6> <02FF> <0500>
<0300> <0309> <050A>
<030A> <035B> <1EA0>
<035C> <0361> <1EF4>
<0362> <0362> <20AB>
<036D> <036E> <0162>
<036F> <0372> <01EA>
<0373> <0373> <0259>
<0374> <0374> <0309>
<0375> <0375> <1F4D>
<0376> <0376> <1FDE>
<0377> <0377> <2070>
<0378> <0378> <2076>
<0379> <0379> <2079>
<038A> <038E> <FB00>
<038F> <038F> <1E9E>
<0390> <0391> <A7B3>
<03AF> <03AF> <0131>
<03B0> <03B0> <0237>
<03B1> <03B1> <A7B5>
endbfrange

35 beginbfrange
<03B2> <03B2> <AB53>
<03C1> <03C8> <2095>
<03C9> <03E3> <05D0>
<03E4> <03F0> <FB2A>
<03F1> <03F5> <FB38>
<03F6> <03F6> <FB3E>
<03F7> <03F8> <FB40>
<03F9> <03FA> <FB43>
<03FB> <03FF> <FB46>
<0400> <0400> <FB4B>
<0401> <0405> <0300>
<0406> <0408> <0306>
<0409> <040B> <030A>
<040C> <040C> <030F>
<040D> <040D> <0312>
<040E> <040E> <0323>
<040F> <0410> <0327>
<0411> <0412> <0485>
<0413> <0414> <0483>
<0415> <0422> <05B0>
<0423> <0424> <05C1>
<0425> <0425> <05C7>
<0459> <0462> <2080>
<0463> <0463> <05BE>
<0464> <0464> <207D>
<0465> <0465> <208D>
<0466> <0466> <207E>
<0467> <0467> <208E>
<0468> <0468> <207A>
<0469> <0469> <207C>
<046A> <046A> <208A>
<046B> <046B> <208C>
<046C> <046C> <2215>
<046D> <046D> <20AA>
<046E> <046E> <2120>
endbfrange

endcmap
CMapName currentdict /CMap defineresource pop
end
end
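
The bfrange rows above can be applied by hand to the codes in the Tj
operand. The sketch below is plain Java with no PDFBox dependency. The class
name ToUnicodeCheck and its helpers are hypothetical, and only the three
rows needed for this string are copied in. It assumes each bfrange row maps
code c to (first Unicode value) + (c - first code).

```java
public class ToUnicodeCheck {
    // Rows copied from the CMap above: {first code, last code, first Unicode value}.
    static final int[][] BF_RANGES = {
        {0x0003, 0x0061, 0x0020},
        {0x0062, 0x00C1, 0x00A0},
        {0x0125, 0x0140, 0x0164},
    };

    // Map one 2-byte character code to a code point, or -1 if no row covers it.
    static int toUnicode(int code) {
        for (int[] r : BF_RANGES) {
            if (code >= r[0] && code <= r[1]) {
                return r[2] + (code - r[0]);
            }
        }
        return -1;
    }

    // Decode a sequence of codes; unmapped codes become U+FFFD.
    static String decode(int[] codes) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            int cp = toUnicode(code);
            sb.appendCodePoint(cp < 0 ? 0xFFFD : cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Byte pairs from the Tj string: \000\205 \000f \000\205 \000x
        // \000\205 \000~ \000\205 \0019 (the last pair is bytes 0x01 0x39).
        int[] codes = {0x0085, 0x0066, 0x0085, 0x0078, 0x0085, 0x007E, 0x0085, 0x0139};
        for (char c : decode(codes).toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
        System.out.printf("code 166 -> U+%04X%n", toUnicode(166));
    }
}
```

Under that assumption the eight codes decode to U+00C3 U+00A4 U+00C3 U+00B6
U+00C3 U+00BC U+00C3 U+0178, that is Ã¤Ã¶Ã¼ÃŸ, and code 166 maps to U+00E4
(ä). So the ToUnicode table looks consistent with the glyphs shown, and the
stream really does contain eight codes for the four characters.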

-- 
*Gino*