Chet Ramey <chet.ra...@case.edu> writes:

> On 5/10/11 9:17 AM, Greg Wooledge wrote:
>
>> In yours, however, it is 0x65 0xcc 0x81 which is U+0065 LATIN SMALL
>> LETTER E followed by U+0301 COMBINING ACUTE ACCENT.
>
> That's not valid UTF-8, since UTF-8 requires that the shortest sequence
> be used to encode a character.

0x65 0xcc 0x81 is the correct UTF-8 encoding for the two-character sequence
U+0065 U+0301.

> The general problem with combining
> characters still exists (the one in the message I referenced in an
> earlier reply), but this case has more to do with Mac OS X and its use
> of both precomposed and decomposed UTF-8 than anything.

There is no such thing as "precomposed UTF-8" and "decomposed UTF-8".
UTF-8 is an encoding of Unicode, and both NFD (decomposed) and NFC
(precomposed) are valid normalization forms of Unicode text.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
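
The byte-level claims in this exchange can be checked with a short sketch. Python is not part of the original thread, and the helper name is_valid_utf8 is purely illustrative. The sketch shows that "e" + U+0301 (NFD) encodes to 65 cc 81, that U+00E9 (NFC) encodes to c3 a9, that normalization maps one form to the other, and that the shortest-sequence rule rejects *overlong* encodings such as c1 a5 for a plain "e". It does not reject a decomposed sequence.

```python
import unicodedata

# Hypothetical helper: True if the bytes decode as strict UTF-8.
def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# U+0065 followed by U+0301 COMBINING ACUTE ACCENT (decomposed, NFD form)
decomposed = "e\u0301"
# U+00E9 LATIN SMALL LETTER E WITH ACUTE (precomposed, NFC form)
precomposed = "\u00e9"

# Each code point gets its own shortest UTF-8 sequence.
print(decomposed.encode("utf-8").hex(" "))   # 65 cc 81
print(precomposed.encode("utf-8").hex(" "))  # c3 a9

# Both byte strings are valid UTF-8; they differ only in Unicode normalization.
print(is_valid_utf8(b"\x65\xcc\x81"))  # True
print(is_valid_utf8(b"\xc3\xa9"))      # True

# Normalization converts between the two forms.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# What the shortest-sequence rule actually forbids: an overlong
# two-byte encoding of U+0065, which a strict decoder rejects.
print(is_valid_utf8(b"\xc1\xa5"))  # False
```

Telling "é" from "e" + U+0301 is therefore a normalization question. The UTF-8 decoder accepts both.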