Hello there, I'm encountering an error in how certain characters are encoded using PDFBox. The issue exists in all versions of PDFBox, but I'm currently using 3.0.1.
The string "äöüß" is used as a test for Unicode characters that PDFBox needs to render:

    contentStream.showText("äöüß");

To do so, I'm importing a font that I know has glyphs for all four special characters (OpenSans, downloaded from Google Fonts). However, the issue can be reproduced with any other font that supports Unicode:

    var resource = Processor.class.getResource("/OpenSans-Regular.ttf");
    var file = Paths.get(resource.toURI()).toFile();
    var targetStream = new FileInputStream(file);
    // third argument false: embed the full font rather than a subset
    var out = PDType0Font.load(PageAssembler.getDocument(), targetStream, false);
    contentStream.setFont(out, 20);

When I execute the code, PDFBox renders the following character sequence instead: äöüß. This is clearly an encoding issue.

The PDF Debugger shows the text being rendered as:

    /F1 20 Tf
    BT
    (\000\205\000f\000\205\000x\000\205\000~\000\205\0019) Tj
    ET

As far as I understand from what I've learned while debugging this issue, \205 is an octal escape whose value, 133 in decimal, selects the glyph at position 133 of the font with the id F1. Looking at the F1 section in the PDF Debugger, the character listed under the code / CID / GID 133 is indeed Ã, the first "incorrect" character of the sequence, which is supposed to be "ä". However, "ä" would be 166, not 133. How does PDFBox get this wrong?

As an aside, if I pass the result of toUnicode(166) to showText, PDFBox correctly renders "ä" in the desired font!

The "ToUnicode" part of the F1 font, as displayed by the PDF Debugger, is reproduced at the end of this message.

Could someone please help me figure out what is going on, and hopefully even help me fix this issue? For more help, I have attached the PDF document.
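The reading of the content stream above can be checked with a small sketch. It decodes the octal escapes of the string operand and pairs the resulting bytes into two-byte codes, on the assumption that the font uses a two-byte encoding (which is what a PDType0Font embedded this way normally does). The class and method names are hypothetical, and only octal escapes are handled.

```java
import java.util.ArrayList;
import java.util.List;

final class ContentStreamCodes {

    static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }

    /** Decodes the body of a PDF literal string (without the parentheses) into byte values. */
    static List<Integer> decodeLiteral(String body) {
        List<Integer> bytes = new ArrayList<>();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length() && isOctal(body.charAt(i + 1))) {
                // An octal escape is a backslash followed by one to three octal digits.
                int value = 0;
                int digits = 0;
                i++;
                while (i < body.length() && digits < 3 && isOctal(body.charAt(i))) {
                    value = value * 8 + (body.charAt(i) - '0');
                    i++;
                    digits++;
                }
                bytes.add(value & 0xFF);
            } else {
                // Any other character stands for its own byte value
                // (escapes such as \n or \( are not handled in this sketch).
                bytes.add((int) c);
                i++;
            }
        }
        return bytes;
    }

    /** Pairs consecutive bytes into big-endian two-byte codes (assumed code width). */
    static List<Integer> toCodes(List<Integer> bytes) {
        List<Integer> codes = new ArrayList<>();
        for (int i = 0; i + 1 < bytes.size(); i += 2) {
            codes.add((bytes.get(i) << 8) | bytes.get(i + 1));
        }
        return codes;
    }

    public static void main(String[] args) {
        // The operand of Tj from the PDF Debugger output, with backslashes escaped for Java.
        String body = "\\000\\205\\000f\\000\\205\\000x\\000\\205\\000~\\000\\205\\0019";
        System.out.println(toCodes(decodeLiteral(body)));
    }
}
```

For the string shown by the PDF Debugger this yields eight codes, 133, 102, 133, 120, 133, 126, 133 and 313, so code 133 occurs four times while the input string has only four characters in total.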
Best,
Gino

ToUnicode:

    /CIDInit /ProcSet findresource begin
    12 dict begin
    begincmap
    /CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def
    /CMapName /Adobe-Identity-UCS def
    /CMapType 2 def
    1 begincodespacerange
    <0000> <FFFF>
    endcodespacerange
    100 beginbfrange
    <0001> <0001> <0000>
    <0002> <0002> <000D>
    <0003> <0061> <0020>
    <0062> <00C1> <00A0>
    <00C2> <00F2> <0100>
    <00F3> <00FF> <0132>
    <0100> <0122> <013F>
    <0123> <0124> <021A>
    <0125> <0140> <0164>
    <0141> <0141> <0192>
    <0142> <0147> <01FA>
    <0148> <0149> <0218>
    <014A> <014B> <02C6>
    <014C> <014C> <02C9>
    <014D> <0152> <02D8>
    <0153> <0159> <0384>
    <015A> <015A> <038C>
    <015B> <016E> <038E>
    <016F> <019A> <03A3>
    <019B> <01A6> <0401>
    <01A7> <01E8> <040E>
    <01E9> <01F4> <0451>
    <01F5> <01F6> <045E>
    <01F7> <01F8> <0490>
    <01F9> <01FE> <1E80>
    <01FF> <01FF> <1EF2>
    <0200> <0200> <1EF3>
    <0201> <0203> <2013>
    <0204> <020B> <2017>
    <020C> <020E> <2020>
    <020F> <020F> <2026>
    <0210> <0210> <2030>
    <0211> <0212> <2032>
    <0213> <0214> <2039>
    <0215> <0215> <203C>
    <0216> <0216> <2044>
    <0217> <0217> <207F>
    <0218> <0219> <20A3>
    <021A> <021A> <20A7>
    <021B> <021B> <20AC>
    <021C> <021C> <2105>
    <021D> <021D> <2113>
    <021E> <021E> <2116>
    <021F> <021F> <2122>
    <0220> <0220> <2126>
    <0221> <0221> <212E>
    <0222> <0225> <215B>
    <0226> <0226> <2202>
    <0227> <0227> <2206>
    <0228> <0228> <220F>
    <0229> <022A> <2211>
    <022B> <022B> <221A>
    <022C> <022C> <221E>
    <022D> <022D> <222B>
    <022E> <022E> <2248>
    <022F> <022F> <2260>
    <0230> <0231> <2264>
    <0232> <0232> <25CA>
    <0235> <0235> <0326>
    <0237> <0238> <2074>
    <0239> <023A> <2077>
    <023B> <0246> <2000>
    <0247> <0247> <FEFF>
    <0248> <0249> <FFFC>
    <024A> <024A> <01F0>
    <024B> <024B> <02BC>
    <024C> <024D> <03D1>
    <024E> <024E> <03D6>
    <024F> <0250> <1E3E>
    <0251> <0252> <1E00>
    <0253> <0253> <02F3>
    <0254> <0255> <01A0>
    <0256> <0257> <01AF>
    <0259> <0259> <0400>
    <025A> <025A> <040D>
    <025B> <025B> <0450>
    <025C> <025C> <045D>
    <025D> <027F> <0460>
    <0280> <0287> <0488>
    <0288> <02F5> <0492>
    <02F6> <02FF> <0500>
    <0300> <0309> <050A>
    <030A> <035B> <1EA0>
    <035C> <0361> <1EF4>
    <0362> <0362> <20AB>
    <036D> <036E> <0162>
    <036F> <0372> <01EA>
    <0373> <0373> <0259>
    <0374> <0374> <0309>
    <0375> <0375> <1F4D>
    <0376> <0376> <1FDE>
    <0377> <0377> <2070>
    <0378> <0378> <2076>
    <0379> <0379> <2079>
    <038A> <038E> <FB00>
    <038F> <038F> <1E9E>
    <0390> <0391> <A7B3>
    <03AF> <03AF> <0131>
    <03B0> <03B0> <0237>
    <03B1> <03B1> <A7B5>
    endbfrange
    35 beginbfrange
    <03B2> <03B2> <AB53>
    <03C1> <03C8> <2095>
    <03C9> <03E3> <05D0>
    <03E4> <03F0> <FB2A>
    <03F1> <03F5> <FB38>
    <03F6> <03F6> <FB3E>
    <03F7> <03F8> <FB40>
    <03F9> <03FA> <FB43>
    <03FB> <03FF> <FB46>
    <0400> <0400> <FB4B>
    <0401> <0405> <0300>
    <0406> <0408> <0306>
    <0409> <040B> <030A>
    <040C> <040C> <030F>
    <040D> <040D> <0312>
    <040E> <040E> <0323>
    <040F> <0410> <0327>
    <0411> <0412> <0485>
    <0413> <0414> <0483>
    <0415> <0422> <05B0>
    <0423> <0424> <05C1>
    <0425> <0425> <05C7>
    <0459> <0462> <2080>
    <0463> <0463> <05BE>
    <0464> <0464> <207D>
    <0465> <0465> <208D>
    <0466> <0466> <207E>
    <0467> <0467> <208E>
    <0468> <0468> <207A>
    <0469> <0469> <207C>
    <046A> <046A> <208A>
    <046B> <046B> <208C>
    <046C> <046C> <2215>
    <046D> <046D> <20AA>
    <046E> <046E> <2120>
    endbfrange
    endcmap
    CMapName currentdict /CMap defineresource pop
    end
    end

-- *Gino*
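The ToUnicode table above can be read mechanically: each bfrange entry `<lo> <hi> <dst>` maps the codes lo..hi to consecutive Unicode values starting at dst. The following is a minimal sketch of that lookup, using only three entries copied from the table; the class and method names are hypothetical and this is not PDFBox's own implementation.

```java
final class BfRangeLookup {

    // Three {lo, hi, dst} entries copied from the ToUnicode CMap quoted above.
    // The full table has 135 entries; these cover the codes discussed in this message.
    static final int[][] RANGES = {
        {0x0003, 0x0061, 0x0020},
        {0x0062, 0x00C1, 0x00A0},
        {0x0100, 0x0122, 0x013F},
    };

    /** Returns the Unicode code point for a code, or -1 if none of the listed ranges covers it. */
    static int lookup(int code) {
        for (int[] r : RANGES) {
            if (code >= r[0] && code <= r[1]) {
                // Codes inside a range map to consecutive code points starting at dst.
                return r[2] + (code - r[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The eight codes from the content stream, followed by 166 for comparison.
        int[] codes = {133, 102, 133, 120, 133, 126, 133, 313, 166};
        for (int code : codes) {
            int cp = lookup(code);
            System.out.printf("code %d -> U+%04X (%s)%n", code, cp, new String(Character.toChars(cp)));
        }
    }
}
```

With these entries, code 133 maps to U+00C3 (Ã) and code 166 maps to U+00E4 (ä), which agrees with what the PDF Debugger lists for F1.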