Tatsuo Ishii wrote:
I don't understand the whole discussion. Why do you think that using the Unicode code point as the chr() argument would avoid endianness issues? Are you going to represent the Unicode code point as UCS-4? Then you have to specify the endianness anyway. (see the UCS-4 standard for more details)
The code point is simply a number. The result of chr() will be a text value one char (not one byte) wide, in the relevant database encoding.
U+nnnn maps to the same Unicode char, and hence to the same UTF8 encoding pattern, regardless of endianness. For example, U+00A9 is the copyright symbol on all machines. So to get this char in a UTF8 database you could call "select chr(169)" and get back the byte pattern \xC2A9.
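To illustrate the point outside the database, here is a rough Python sketch. It is not the proposed chr() implementation; the helper name chr_utf8 is made up for this example. It only shows that the number 169 yields the same UTF8 bytes everywhere, while byte order matters only if you serialize the code point as UCS-4/UTF-32.

```python
def chr_utf8(code_point: int) -> bytes:
    """Hypothetical helper: UTF-8 bytes for a Unicode code point."""
    # chr() takes the code point as a plain number; no byte order is involved.
    return chr(code_point).encode("utf-8")


# U+00A9 (decimal 169) is the copyright symbol.
copyright_bytes = chr_utf8(169)

# Endianness only appears when the code point is written out as UCS-4/UTF-32.
ucs4_be = chr(169).encode("utf-32-be")
ucs4_le = chr(169).encode("utf-32-le")

print(copyright_bytes.hex())  # c2a9
print(ucs4_be.hex())          # 000000a9
print(ucs4_le.hex())          # a9000000
```

The two UTF-32 serializations differ, but the argument (169) and the UTF8 result (\xC2A9) do not.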
Or are you going to represent the Unicode code point as a character string such as 'U+0259'? Then representing any encoding as a string would avoid endianness issues anyway, and I don't see how a Unicode code point is any better than the others.
The argument will be a number, as now.
Also, I'd like to point out that every encoding has its own code point systems, as far as I know. For example, EUC-JP has its corresponding code point systems: ASCII, JIS X 0208 and JIS X 0212. So I don't see why we can't use a "code point" as chr()'s argument for other encodings (of course we would need an optional parameter specifying which character set is meant).
Where can I find the tables that map code points (as opposed to encodings) to characters for these others?
cheers

andrew