[Steve Holden]
>>"Wider than UTF-16" doesn't make sense.

[Ross Ridge]
> It makes perfect sense.

No it doesn't. UTF-16 is a "Unicode Transformation Format", meaning that it is a mechanism for representing all Unicode code points, even the ones with ordinals greater than 0xFFFF, using a series of 16-bit values.

http://en.wikipedia.org/wiki/UTF-16

"""
UTF-16 represents a character above hexadecimal FFFF as a surrogate pair of code values from the range D800-DFFF. For example, the character at code point hexadecimal 10000 becomes the code value sequence D800 DC00, and the character at hexadecimal 10FFFD, the upper limit of Unicode, becomes the code value sequence DBFF DFFD. Unicode and ISO/IEC 10646 do not assign characters to any of the code points in the D800-DFFF range, so an individual code value from a surrogate pair does not ever represent a character.
"""

So UTF-16 has no "width" to compare to, any more than UTF-8 does: both are variable-length encodings of the same set of code points.

I wonder what character set the OP is dealing with, if it's not representable with Unicode. Presumably it's not a modern character set?

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan
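
For anyone who wants to check the surrogate-pair arithmetic in the Wikipedia passage quoted above, here is a minimal sketch, assuming Python 3. The helper names to_surrogate_pair and utf16_code_units are made up for illustration and are not part of any library; only str.encode and int.from_bytes come from the standard language.

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into its two UTF-16 code values.

    Hypothetical helper, written only to illustrate the arithmetic.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("expected a code point in the range 10000-10FFFF")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits go in the first code value
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go in the second
    return high, low


def utf16_code_units(text):
    """Return the 16-bit code values Python's own UTF-16 codec produces."""
    data = text.encode("utf-16-be")      # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    for cp in (0x10000, 0x10FFFD):
        high, low = to_surrogate_pair(cp)
        print("%X -> %04X %04X" % (cp, high, low))
    # A character at or below 0xFFFF takes one code value, one above takes two.
    print(len(utf16_code_units("A")), len(utf16_code_units(chr(0x10000))))
```

The two code points from the quoted passage come out as D800 DC00 and DBFF DFFD, and the last line shows that the number of 16-bit values per character varies, which is why asking for something "wider" than UTF-16 is not a meaningful comparison.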