Re: [HACKERS] Unicode problems on IRC

2005-04-10, Andrew - Supernews
On 2005-04-10, "John Hansen" <[EMAIL PROTECTED]> wrote:
> That's right, dunno how I missed that one, but looks correct to me, and
> is in line with the code in ConvertUTF.c from unicode.org, on which I
> based the patch, extended to support 6 byte utf8 characters.
Frankly, you should probably de-ex
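
For reference on the 6-byte extension mentioned above: RFC 3629 (2003) limits UTF-8 to sequences of at most four bytes, covering code points up to U+10FFFF, so the 5- and 6-byte forms of the original UTF-8 definition are no longer well-formed. The following is a minimal sketch, assuming only that rule, of how the expected sequence length follows from the first byte. `utf8_seq_len` is a hypothetical name, not a function from the patch, ConvertUTF.c, or PostgreSQL.

```c
#include <stddef.h>

/*
 * Expected length of a UTF-8 sequence, judged from its first byte,
 * under RFC 3629 (at most four bytes, U+10FFFF maximum).
 * Returns 0 for bytes that cannot start a sequence:
 *   0x80-0xBF  continuation bytes
 *   0xC0-0xC1  leads that could only produce overlong forms
 *   0xF5-0xFF  leads for values above U+10FFFF, the old 5- and
 *              6-byte forms, and the never-used 0xFE/0xFF
 * Hypothetical helper for illustration only.
 */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;                 /* ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;                 /* U+0080 .. U+07FF */
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;                 /* U+0800 .. U+FFFF */
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;                 /* U+10000 .. U+10FFFF */
    return 0;
}
```

Under this rule a 0xFC lead byte, which began a 6-byte sequence in the older definition, is simply rejected.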

Re: [HACKERS] Unicode problems on IRC

2005-04-10, Oliver Jowett
Tom Lane wrote:
> Yeah? Cool. Does John's proposed patch do it "correctly"?
>
> http://candle.pha.pa.us/mhonarc/patches2/msg00076.html
Some comments on that patch: Doesn't pg_utf2wchar_with_len need changes for the longer sequences? UtfToLocal also appears to need changes. If we support seq
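
Oliver's first question is about the UTF-8-to-wide-character conversion: if four-byte sequences are accepted, that conversion must also decode them into values above 0xFFFF. Below is a minimal sketch of the bit arithmetic, assuming the input has already been validated and that the result fits a 32-bit type. `utf8_decode` is hypothetical; it is not `pg_utf2wchar_with_len` and not code from John's patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence of 1 to 4 bytes into a 32-bit code point.
 * Assumes the bytes were already validated; it only shows the bit
 * arithmetic. Returns the number of bytes consumed, or 0 if the lead
 * byte does not start a 1-4 byte sequence or too few bytes remain.
 */
size_t utf8_decode(const unsigned char *s, size_t avail, uint32_t *cp)
{
    size_t len;
    uint32_t value;

    if (avail == 0)
        return 0;

    /* the lead byte gives the length and the high-order payload bits */
    if (s[0] < 0x80)      { len = 1; value = s[0]; }
    else if (s[0] < 0xC0) return 0;                 /* continuation byte */
    else if (s[0] < 0xE0) { len = 2; value = s[0] & 0x1F; }
    else if (s[0] < 0xF0) { len = 3; value = s[0] & 0x0F; }
    else if (s[0] < 0xF8) { len = 4; value = s[0] & 0x07; }
    else                  return 0;                 /* 5/6-byte forms */

    if (avail < len)
        return 0;

    /* each continuation byte contributes its low six bits */
    for (size_t i = 1; i < len; i++)
        value = (value << 6) | (uint32_t) (s[i] & 0x3F);

    *cp = value;
    return len;
}
```

For example, the bytes F0 9F 98 80 decode to U+1F600, a value that does not fit in 16 bits.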

Re: [HACKERS] Unicode problems on IRC

2005-04-10, John Hansen
>On 2005-04-10, Tom Lane wrote:
>> Andrew - Supernews writes:
>>> I think you will find that this impression is actually false. Or that at
>>> the very least, _correct_ verification of UTF-8 sequences will still
>>> catch essentially all cases of non-utf-8 input mislabelled as utf-8
>>> while all

Re: [HACKERS] Unicode problems on IRC

2005-04-10, Andrew - Supernews
On 2005-04-10, Tom Lane <[EMAIL PROTECTED]> wrote:
> Andrew - Supernews <[EMAIL PROTECTED]> writes:
>> I think you will find that this impression is actually false. Or that at
>> the very least, _correct_ verification of UTF-8 sequences will still
>> catch essentially all cases of non-utf-8 input m
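
One reason Andrew's claim about _correct_ verification can hold lies in the byte structure: in a single-byte encoding such as Latin-1, a byte at or above 0x80 is typically followed by ASCII, and ASCII can never be a UTF-8 continuation byte (0x80-0xBF). The following is a minimal sketch of a strict validator, assuming the well-formed byte-sequence ranges of RFC 3629, which also reject overlong forms, UTF-16 surrogates, and anything above U+10FFFF. `utf8_is_valid` is a hypothetical name, not PostgreSQL's verification routine.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Strict UTF-8 check using the well-formed sequence ranges of RFC 3629.
 * Rejects stray continuation bytes, overlong forms, surrogates
 * (U+D800..U+DFFF), values above U+10FFFF, and truncated sequences.
 */
bool utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        size_t need;                        /* continuation bytes required */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */

        if (b < 0x80) {                     /* ASCII */
            i++;
            continue;
        }
        if (b >= 0xC2 && b <= 0xDF)      need = 1;
        else if (b == 0xE0)            { need = 2; lo = 0xA0; } /* overlong */
        else if (b >= 0xE1 && b <= 0xEC) need = 2;
        else if (b == 0xED)            { need = 2; hi = 0x9F; } /* surrogates */
        else if (b >= 0xEE && b <= 0xEF) need = 2;
        else if (b == 0xF0)            { need = 3; lo = 0x90; } /* overlong */
        else if (b >= 0xF1 && b <= 0xF3) need = 3;
        else if (b == 0xF4)            { need = 3; hi = 0x8F; } /* > U+10FFFF */
        else
            return false;                   /* 0x80-0xC1 or 0xF5-0xFF */

        if (len - i - 1 < need)
            return false;                   /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= need; k++)
            if (s[i + k] < 0x80 || s[i + k] > 0xBF)
                return false;
        i += need + 1;
    }
    return true;
}
```

For example, "caf\xE9 au" (Latin-1) fails because 0xE9 announces a three-byte sequence but is followed by a space, while the UTF-8 spelling "caf\xC3\xA9" passes.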

Re: [HACKERS] Unicode problems on IRC

2005-04-10, Tom Lane
Andrew - Supernews <[EMAIL PROTECTED]> writes:
> On 2005-04-10, Tom Lane <[EMAIL PROTECTED]> wrote:
>> The impression I get is that most of the 'Unicode characters above
>> 0x10000' reports we've seen did not come from people who actually needed
>> more-than-16-bit Unicode codepoints, but from peop

Re: [HACKERS] Unicode problems on IRC

2005-04-10, Andrew - Supernews
On 2005-04-10, Tom Lane <[EMAIL PROTECTED]> wrote:
> The impression I get is that most of the 'Unicode characters above
> 0x10000' reports we've seen did not come from people who actually needed
> more-than-16-bit Unicode codepoints, but from people who had screwed up
> their encoding settings and

Re: [HACKERS] Unicode problems on IRC

2005-04-09, Bruce Momjian
Tom Lane wrote:
> "John Hansen" <[EMAIL PROTECTED]> writes:
> >> That is backpatched to 8.0.X. Does that not fix the problem reported?
> >
> > No, as andrew said, what this patch does, is allow values > 0xFFFF and
> > at the same time validates the input to make sure it's valid utf8.
>
> The impre

Re: [HACKERS] Unicode problems on IRC

2005-04-09, Tom Lane
"John Hansen" <[EMAIL PROTECTED]> writes:
>> That is backpatched to 8.0.X. Does that not fix the problem reported?
> No, as andrew said, what this patch does, is allow values > 0xFFFF and
> at the same time validates the input to make sure it's valid utf8.
The impression I get is that most of th
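
The patch, as John describes it here, allows values above 0xFFFF. In UTF-8 those are exactly the characters that need a four-byte sequence. Below is a minimal encoder sketch, assuming the RFC 3629 limits (nothing above U+10FFFF, no surrogates). `utf8_encode` is a hypothetical helper, not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 for surrogates and for
 * values above U+10FFFF, which RFC 3629 does not allow.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* UTF-16 surrogates */
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        /* values above 0xFFFF: always a four-byte sequence */
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond U+10FFFF */
}
```

The boundary is sharp: U+FFFF encodes as EF BF BF (three bytes), while U+10000 encodes as F0 90 80 80 (four bytes).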

Re: [HACKERS] Unicode problems on IRC

2005-04-09, John Hansen
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Bruce Momjian
> Sent: Sunday, April 10, 2005 8:18 AM
> To: Christopher Kings-Lynne
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Unicode problems on IRC

Re: [HACKERS] Unicode problems on IRC

2005-04-09, Andrew - Supernews
On 2005-04-09, Bruce Momjian wrote:
> Uh, I thought we fixed this another way, by not using Unicode-aware
> functions for upper/lower/initcap when the locale is "C" or "POSIX".
> That is backpatched to 8.0.X. Does that not fix the problem reported?
Unicode values over 0xFFFF are simply not acc
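
Bruce's earlier fix avoids Unicode-aware upper/lower/initcap in the "C" and "POSIX" locales. The following is a minimal sketch, for illustration only, of what ASCII-only case mapping means for UTF-8 data: only a-z change, and every byte of a multibyte sequence passes through untouched, whatever code point it encodes. `ascii_upper` is a hypothetical helper, not PostgreSQL's implementation. As Andrew's reply suggests, this does not by itself change how values over 0xFFFF are handled elsewhere.

```c
#include <stddef.h>

/*
 * Upper-case a byte string the way a "C"/"POSIX" locale would:
 * only ASCII a-z change; every other byte, including all bytes of
 * multibyte UTF-8 sequences, is copied through unchanged.
 * Hypothetical helper for illustration, not PostgreSQL's upper().
 */
void ascii_upper(const char *src, char *dst, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) src[i];

        dst[i] = (c >= 'a' && c <= 'z') ? (char) (c - 'a' + 'A') : src[i];
    }
}
```

Because no byte is decoded into a code point, a four-byte sequence such as F0 9F 98 80 comes out exactly as it went in.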

Re: [HACKERS] Unicode problems on IRC

2005-04-09, Bruce Momjian
Christopher Kings-Lynne wrote:
> Hey guys,
>
> The 'Unicode characters above 0x10000' issue keeps rearing its ugly head
> in the IRC channel. I propose that it be fixed, even backported...
>
> This is John Hansen's most recent patch to fix it:
>
> http://archives.postgresql.org/pgsql-patches/2