--On Friday, March 25, 2005 23.39.33 +1100 John Hansen <[EMAIL PROTECTED]> wrote:
OK, tested on Debian sarge with ICU 3.2, UNICODE database, C locale.
upper() and lower() return an empty string for any input, including 7-bit ASCII, regardless of client_encoding, so something is obviously broken.
Have you tested this patch on a UNICODE DB with locale C/POSIX?
No, honestly not. Mostly tested it with my needs, sv_SE.UTF-8 and UNICODE, and also de_DE.UTF-8.
How will PostgreSQL react to this combo? A database cluster initdb:ed with locale=C/POSIX, and then a database in UNICODE (really utf-8) representation... hmm... I think I might have made a false assumption that the locale string would contain the character encoding. I do something like encoding = strchr(locale, '.') + 1... That code will be confused by a 'C' locale, indeed. I'll check it out!
/Palle
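
The pitfall described above (taking strchr(locale, '.') + 1 when the locale is plain "C") can be sketched as follows. This is only an illustration, not code from the patch: the helper name locale_codeset is hypothetical, and it assumes the locale name has the usual language_TERRITORY.codeset shape with no @modifier suffix. strchr() returns NULL when there is no '.', so adding 1 to the result is undefined behaviour; the sketch checks for that and returns NULL so the caller can fall back to something else, for instance the database encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the ICU patch.
 *
 * Returns a pointer to the codeset part of a locale name such as
 * "sv_SE.UTF-8" (here "UTF-8"), or NULL when the name carries no
 * codeset, as with "C" or "POSIX". The returned pointer aliases the
 * input string; nothing is allocated.
 */
const char *
locale_codeset(const char *locale)
{
    const char *dot;

    assert(locale != NULL);         /* caller must pass a real string */

    dot = strchr(locale, '.');      /* NULL if the name has no '.' */
    if (dot == NULL || dot[1] == '\0')
        return NULL;                /* no codeset: let the caller decide */

    return dot + 1;                 /* first character after the '.' */
}
```

A caller would test the result before using it, e.g. treat a NULL return as "take the encoding from the database instead of from the locale name".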
... John
-----Original Message-----
From: John Hansen
Sent: Friday, March 25, 2005 10:27 PM
To: 'Palle Girgensohn'; 'pgsql-hackers@postgresql.org'
Subject: RE: [HACKERS] Patch for collation using ICU
> --On Friday, March 25, 2005 16.34.41 +1100 John Hansen <[EMAIL PROTECTED]> wrote:
>
> > Useful if it's going to support earlier releases of ICU....
> >
> > Not all os's come with ICU3.2, debian for example, currently has 2.1
> > in testing, and 2.6 in unstable.
>
> Oh, OK. FreeBSD has only the 3.2 as port. I can check the older
> version, I doubt it would make too much difference. Some autoconf
> sorcery needed, perhaps.
Naww, it's no biggie, we'll just need to include ICU with pg, I think. I tried that; several of the ICU functions you use are not in ICU 2.1.
Dunno about 2.6.
However, ICU 3.2 compiles on Debian with a small change to the debian/rules file: debian/tmp/etc is missing, so add mkdir debian/tmp/etc
... John
>
> /Palle
>
> >
> > ... John
> >
> >> -----Original Message-----
> >> From: [EMAIL PROTECTED]
> >> [mailto:[EMAIL PROTECTED] On Behalf Of Palle Girgensohn
> >> Sent: Friday, March 25, 2005 10:40 AM
> >> To: pgsql-hackers@postgresql.org
> >> Subject: [HACKERS] Patch for collation using ICU
> >>
> >> Hi!
> >>
> >> I've put together a patch for using IBM's ICU package for collation.
> >>
> >> If your OS does not have full support for collation or
> >> uppercase/lowercase in multibyte locales, this might be useful. If
> >> you are using a multibyte character encoding in your database and
> >> want collation, i.e. order by, and also lower(), upper() and
> >> initcap() to work properly, this patch will do just that.
> >>
> >> This patch is needed for FreeBSD, since this OS has no support for
> >> collation of, for example, unicode locales (that is, wcscoll(3) does
> >> not do what you expect if you set LC_ALL=sv_SE.UTF-8, for example).
> >> AFAIK the patch is *not* necessary for Linux, although IBM claims ICU
> >> collation to be about twice as fast as glibc for simple western
> >> locales.
> >>
> >> It adds a configure switch, `--with-icu', which will set up the code
> >> to use ICU instead of wchar_t and wcscoll.
> >>
> >> This has been tested only on FreeBSD-4.11 & FreeBSD-5-stable, where
> >> it seems to run well. I've not had the time to do any comparative
> >> performance tests yet, but it seems it is at least not slower than
> >> using LATIN1 with sv_SE.ISO8859-1 locale, perhaps even faster.
> >>
> >> I'd be delighted if some more experienced postgresql hackers would
> >> review this stuff. The patch is pretty compact, so it's fast reading
> >> :) I'm planning to add this patch as an option (tagged
> >> "experimental") to FreeBSD's postgresql port. Any ideas about whether
> >> this is a good idea or not?
> >>
> >> Any thoughts or ideas are welcome!
> >>
> >> Cheers,
> >> Palle
> >>
> >> Patch at:
> >> <http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-14.diff>
> >>
> >> ICU at sourceforge: <http://icu.sf.net/>
> >>
> >> ---------------------------(end of broadcast)---------------------------
> >> TIP 7: don't forget to increase your free space map settings