Package: evince
Version: 3.4.0-3.1
Severity: minor

Open the file <URL: http://www.madore.org/~david/.misc/astral.pdf > using evince. It contains a table of contents with a single entry whose name is the same as on the document itself, i.e., "The name " followed by the four characters U+10909 U+10904 U+10905 U+10904 (four Phoenician letters, which are written right to left in the document, but that's completely irrelevant here). When evince tries to display the table of contents, it fails with the following error message:

(evince:8324): Gtk-WARNING **: Failed to set text from markup due to error parsing markup: Error on line 1 char 42: Invalid UTF-8 encoded text in name - not valid 'The name \xed\xa0\x82\xed\xb4\x89\xed\xa0\x82\xed\xb4\x84\xed\xa0\x82\xed\xb4\x85\xed\xa0\x82\xed\xb4\x84'

What this means is that some program or library somewhere (either within evince itself or within libpoppler or something; I haven't been able to discover which) took the Unicode string, which in the PDF is (correctly) encoded as UTF-16 (the PDF is uncompressed, so it can easily be checked that the encoding is as I state it):

"\xfe\xff\x00\x54\x00\x68\x00\x65\x00\x20\x00\x6e\x00\x61\x00\x6d\x00\x65\x00\x20\xd8\x02\xdd\x09\xd8\x02\xdd\x04\xd8\x02\xdd\x05\xd8\x02\xdd\x04"

and, instead of converting it correctly to UTF-8:

"The name \xf0\x90\xa4\x89\xf0\x90\xa4\x84\xf0\x90\xa4\x85\xf0\x90\xa4\x84"

produced the octet stream:

"The name \xed\xa0\x82\xed\xb4\x89\xed\xa0\x82\xed\xb4\x84\xed\xa0\x82\xed\xb4\x85\xed\xa0\x82\xed\xb4\x84"

which is not valid UTF-8 and is rightfully rejected by the Gtk toolkit.

The reason for this is obviously that some idiot thought that UTF-16 can be converted to UTF-8 by simply taking each UTF-16 code unit separately and converting it to UTF-8, whereas in fact surrogate pairs (designating "astral" characters) must be handled together: each high/low surrogate pair has to be combined into a single code point above U+FFFF, which is then encoded as one four-byte UTF-8 sequence. (The encoding that results from this error is known as CESU-8.)

I've been unable to find who the culprit is (from a superficial glance, the code from both evince and libpoppler seems sane and calls iconv, which is certainly not buggy itself), so I'm reporting the bug against evince, which exhibits it.

-- 
David A. Madore
( http://www.madore.org/~david/ )
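
For illustration, here is a minimal Python sketch that reproduces both octet streams quoted above from the UTF-16 string in the PDF. It is not code from evince, libpoppler or iconv; the function names (`utf16be_code_units`, `encode_scalar`, `naive_convert`, `correct_convert`) are hypothetical and exist only for this example. It only shows how a per-code-unit conversion would yield exactly the bytes in the warning, and how pairing the surrogates yields the expected UTF-8.

```python
import struct

# Outline title bytes as quoted in the report (UTF-16BE with a leading BOM).
RAW = bytes.fromhex(
    "feff" "0054" "0068" "0065" "0020" "006e" "0061" "006d" "0065" "0020"
    "d802dd09" "d802dd04" "d802dd05" "d802dd04"
)


def utf16be_code_units(data):
    """Split UTF-16BE bytes into 16-bit code units, dropping a leading BOM."""
    units = [u for (u,) in struct.iter_unpack(">H", data)]
    return units[1:] if units and units[0] == 0xFEFF else units


def encode_scalar(cp):
    """Apply the generic UTF-8 bit layout to one value (no validity checks)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def naive_convert(data):
    """The suspected bug: encode every UTF-16 code unit on its own."""
    return b"".join(encode_scalar(u) for u in utf16be_code_units(data))


def correct_convert(data):
    """Combine each surrogate pair into one code point before encoding."""
    units = utf16be_code_units(data)
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        is_pair = (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                   and 0xDC00 <= units[i + 1] <= 0xDFFF)
        if is_pair:
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # Assumption for this sketch: unpaired surrogates are passed
            # through unchanged; a real converter should reject them.
            cp = u
            i += 1
        out += encode_scalar(cp)
    return bytes(out)


print(naive_convert(RAW))    # the bytes from the Gtk warning
print(correct_convert(RAW))  # the expected UTF-8
```

Python's built-in codecs handle surrogate pairs properly, so `RAW.decode("utf-16").encode("utf-8")` gives the same bytes as `correct_convert(RAW)`.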