Re: [BUGS] Corruption of multibyte identifiers on UTF-8 locale

2006-09-23 Thread Victor Snezhko
Tom Lane <[EMAIL PROTECTED]> writes:
>> Agreed, but such corruption indicates that there is non-multibyte-safe
>> (octet-wise) case conversion somewhere, at best (with fully working
>> locale) it will cause case conversion to do nothing instead of actual
>> conversion.
>
> Yours is the first insta

Re: [BUGS] Corruption of multibyte identifiers on UTF-8 locale

2006-09-23 Thread Tom Lane
Victor Snezhko <[EMAIL PROTECTED]> writes:
> Agreed, but such corruption indicates that there is non-multibyte-safe
> (octet-wise) case conversion somewhere, at best (with fully working
> locale) it will cause case conversion to do nothing instead of actual
> conversion.

Yours is the first install

Re: [BUGS] Corruption of multibyte identifiers on UTF-8 locale

2006-09-23 Thread Victor Snezhko
Tom Lane <[EMAIL PROTECTED]> writes:
>> correct utf-8 byte sequence is 0xd18231, so it looks like we call
>> tolower() somewhere on parts of multibyte characters, and it does the
>> same as isspace() - it interprets its argument as wide character, and
>> converts it.
>
> Indeed, and I am certainl

Re: [BUGS] Corruption of multibyte identifiers on UTF-8 locale

2006-09-23 Thread Tom Lane
Victor Snezhko <[EMAIL PROTECTED]> writes:
> correct utf-8 byte sequence is 0xd18231, so it looks like we call
> tolower() somewhere on parts of multibyte characters, and it does the
> same as isspace() - it interprets its argument as wide character, and
> converts it.

Indeed, and I am certainly

Re: [BUGS] Corruption of multibyte identifiers on UTF-8 locale

2006-09-23 Thread Victor Snezhko
Victor Snezhko <[EMAIL PROTECTED]> writes:
> So, we either don't support utf-8 on BSDs

Hmm, tolower'ing octets of a multibyte string is a bug not only on BSDs but on other architectures as well. But on BSDs it additionally causes corruption of utf-8 data.

-- 
WBR, Victor V. Snezhko
E-mail: [EMAI
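
For readers following the thread, here is a rough sketch of the failure mode being discussed, written in Python rather than C so that it does not depend on any platform's locale. It is only an illustration under stated assumptions: `octetwise_lower` is a hypothetical stand-in that simulates a tolower() treating each octet as its own code point (Latin-1 style), which may or may not match what the BSD libc actually does; the sample identifier (uppercase Cyrillic "Т" followed by "1") is also an assumption, chosen because its lowercase form encodes to the d1 82 31 sequence quoted above. None of these function names come from the PostgreSQL source.

```python
def octetwise_lower(data: bytes) -> bytes:
    """Simulated byte-at-a-time tolower() (assumption: Latin-1 style mapping).

    Each octet is lowercased as if it were a complete character, which is
    wrong for octets that are only part of a multibyte UTF-8 sequence.
    """
    out = bytearray()
    for b in data:
        if 0x41 <= b <= 0x5A:                      # ASCII 'A'..'Z'
            out.append(b + 0x20)
        elif 0xC0 <= b <= 0xDE and b != 0xD7:      # Latin-1 uppercase letters
            out.append(b + 0x20)
        else:
            out.append(b)
    return bytes(out)


def ascii_only_lower(data: bytes) -> bytes:
    """Lowercase only ASCII letters; leave every high-bit octet untouched."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)


def multibyte_aware_lower(data: bytes) -> bytes:
    """Decode to characters first, lowercase, then re-encode as UTF-8."""
    return data.decode("utf-8").lower().encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the octets form a well-formed UTF-8 string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical identifier: U+0422 (Cyrillic capital Te) followed by '1'.
sample = "\u04221".encode("utf-8")                 # octets d0 a2 31

broken = octetwise_lower(sample)
print(broken.hex(), is_valid_utf8(broken))         # f0a231 False

safe = ascii_only_lower(sample)
print(safe.hex(), is_valid_utf8(safe))             # d0a231 True

proper = multibyte_aware_lower(sample)
print(proper.hex(), is_valid_utf8(proper))         # d18231 True
```

The ASCII-only variant corresponds to the "case conversion does nothing" outcome mentioned earlier in the thread: the data survives but non-ASCII letters are not folded. Only the decode-first variant both preserves the encoding and actually converts case.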