https://bz.apache.org/bugzilla/show_bug.cgi?id=69667
Dominik Stadler <dominik.stad...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #3 from Dominik Stadler <dominik.stad...@gmx.at> ---
Thanks for the investigation and details!

The relevant format specification is at
https://msopenspecs.microsoft.com/files/MS-XLS/%5bMS-XLS%5d.pdf in the
following sections:

* 2.4.349 WriteAccess
* 2.5.294 XLUnicodeString

The relevant parts are:

* cch (2 bytes): An unsigned integer that specifies the count of characters in
  the string.
* fHighByte (1 bit): A bit that specifies whether the characters in rgb are
  double-byte characters.
* rgb (variable): An array of bytes that specifies the characters. If fHighByte
  is 0x0, the size of the array MUST be equal to cch. If fHighByte is 0x1, the
  size of the array MUST be equal to cch*2.

When running your test, I see the following:

* cch is 81
* fHighByte is 1
* rgb is the string encoded in UTF-16

Following the spec, the code tries to read 81 "characters". Because fHighByte
is set (UTF-16), this maps to 2*81 = 162 "bytes"!

So the byte "51" in your byte sequence seems wrong. It should be 28, i.e. "1C",
to properly state the count of "characters", not "bytes".

If I replace "51" with "1C" in your test, it works without error.

So it seems the tool creating the file is producing an incorrect format.
Unfortunately, Excel and other applications sometimes "gracefully" handle such
invalid input, which actually makes it harder to get third parties to produce
valid documents.

Can you get this adjusted in the tool that is used to produce the file? Or
switch to the newer XSSF/.xlsx format, which is much less prone to such small
format differences?
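
To illustrate the cch / fHighByte / rgb rule quoted from the spec, here is a minimal standalone sketch in Java. It is not the POI implementation: the class XlUnicodeStringSketch and the helpers decode and encodeWithCch are hypothetical names made up for this example. It assumes fHighByte is the lowest bit of the byte that follows cch, and that the single-byte form can be decoded as ISO-8859-1. The 28-character sample string is also made up, it only matches the length mentioned above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not POI code: reads cch, fHighByte and rgb as described in the spec excerpt.
class XlUnicodeStringSketch {

    /** Decodes cch (2 bytes, little-endian), the flags byte, then rgb. */
    static String decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int cch = buf.getShort() & 0xFFFF;            // unsigned count of characters
        boolean fHighByte = (buf.get() & 0x01) != 0;  // assumption: lowest bit of the flags byte
        int rgbSize = fHighByte ? cch * 2 : cch;      // size of rgb in bytes, per the spec
        if (buf.remaining() < rgbSize) {
            throw new IllegalArgumentException("cch=" + cch + " needs " + rgbSize
                    + " bytes, but only " + buf.remaining() + " are available");
        }
        byte[] rgb = new byte[rgbSize];
        buf.get(rgb);
        // Assumption: single-byte characters are decoded as ISO-8859-1.
        return new String(rgb, fHighByte ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
    }

    /** Builds a UTF-16 encoded string with an arbitrary (possibly wrong) cch value. */
    static byte[] encodeWithCch(String text, int cch) {
        byte[] rgb = text.getBytes(StandardCharsets.UTF_16LE);
        ByteBuffer buf = ByteBuffer.allocate(3 + rgb.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) cch);  // cch
        buf.put((byte) 0x01);       // fHighByte = 1
        buf.put(rgb);               // rgb
        return buf.array();
    }

    public static void main(String[] args) {
        String text = "ABCDEFGHIJKLMNOPQRSTUVWXYZ12"; // 28 characters, made-up sample
        // cch = 28 (0x1C): 56 bytes are needed and 56 are present.
        System.out.println(decode(encodeWithCch(text, 28)));
        try {
            // cch = 81 (0x51): 162 bytes are needed, so decoding fails.
            decode(encodeWithCch(text, 81));
        } catch (IllegalArgumentException e) {
            System.out.println("Failed as expected: " + e.getMessage());
        }
    }
}
```

With cch = 28 the 56 bytes of rgb are consumed exactly. With cch = 81 the reader asks for 162 bytes and runs past the end of the data, which is the same kind of mismatch as described in the comment above.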