Re: [BUGS] Corruption of multibyte identifiers on UTF-8 locale

Tom Lane Sat, 23 Sep 2006 09:38:59 -0700

Victor Snezhko <[EMAIL PROTECTED]> writes:
> correct utf-8 byte sequence is 0xd18231, so it looks like we call
> tolower() somewhere on parts of multibyte characters, and it does the
> same as isspace() - it interprets its argument as a wide character, and
> converts it.

Indeed, and I am certainly wondering why we should not just say that
you've got a broken locale definition there.  There is absolutely no
doubt that the ctype.h functions are defined to work on char, not wchar.
They have no business mangling high-bit-set bytes in a multibyte
encoding.
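
To make the point concrete, here is a minimal sketch of case folding that
cannot damage a multibyte sequence. It is an illustration only, not the
actual PostgreSQL identifier-folding code; the function name
ascii_downcase_utf8 is hypothetical. It assumes the input is a
NUL-terminated UTF-8 string and folds only the ASCII letters A-Z, so every
byte with the high bit set passes through unchanged regardless of what the
locale's tolower() would do with it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (an assumption for illustration): downcase only the
 * ASCII letters A-Z of a NUL-terminated string, in place.
 *
 * In UTF-8, every byte of a multibyte character has its high bit set
 * (0x80-0xFF), so none of those bytes can fall in the range 'A'..'Z'.
 * They are therefore left untouched and the sequence stays valid.
 */
void
ascii_downcase_utf8(char *s)
{
    assert(s != NULL);

    for (; *s != '\0'; s++)
    {
        /* Work on unsigned char so high-bit bytes are never negative. */
        unsigned char ch = (unsigned char) *s;

        if (ch >= 'A' && ch <= 'Z')
            *s = (char) (ch + ('a' - 'A'));
        /* Otherwise leave the byte alone, including every byte >= 0x80. */
    }
}
```

The trade-off is that non-ASCII letters are not case-folded at all; doing
that correctly would need wide-character routines such as mbstowcs() and
towlower() rather than per-byte ctype.h calls.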

                        regards, tom lane
