> -----Original Message-----
> From: Palle Girgensohn [mailto:[EMAIL PROTECTED]
> Sent: Saturday, March 26, 2005 1:10 PM
> To: pgsql-hackers@postgresql.org
> Cc: John Hansen; Andrew Dunstan
> Subject: Re: [HACKERS] Patch for collation using ICU
>
> --On Friday, March 25, 2005 00.40.04 +0100 Palle Girgensohn
> <[EMAIL PROTECTED]> wrote:
>
> > Hi!
> >
> > I've put together a patch for using IBM's ICU package for collation.
> >
> > If your OS does not have full support for collation or
> > uppercase/lowercase in multibyte locales, this might be useful. If you
> > are using a multibyte character encoding in your database and want
> > collation, i.e. order by, and also lower(), upper() and initcap() to
> > work properly, this patch will do just that.
> >
> > This patch is needed for FreeBSD, since this OS has no support for
> > collation of, for example, Unicode locales (that is, wcscoll(3) does not
> > do what you expect if you set LC_ALL=sv_SE.UTF-8, for example). AFAIK
> > the patch is *not* necessary for Linux, although IBM claims ICU
> > collation to be about twice as fast as glibc for simple western locales.
> >
> > It adds a configure switch, `--with-icu', which will set up the code
> > to use ICU instead of wchar_t and wcscoll.
> >
> > This has been tested only on FreeBSD-4.11 & FreeBSD-5-stable, where it
> > seems to run well. I've not had the time to do any comparative
> > performance tests yet, but it seems it is at least not slower than
> > using LATIN1 with the sv_SE.ISO8859-1 locale, perhaps even faster.
> >
> > I'd be delighted if some more experienced postgresql hackers would
> > review this stuff. The patch is pretty compact, so it's fast reading
> > :) I'm planning to add this patch as an option (tagged
> > "experimental") to FreeBSD's postgresql port. Any ideas about whether
> > this is a good idea or not?
> >
> > Any thoughts or ideas are welcome!
> >
> > Cheers,
> > Palle
> >
> > Patch at:
> > <http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-14.diff>
> >
> > ICU at sourceforge: <http://icu.sf.net/>
>
> Hi!
>
> There's a new patch to fix some reported problems.
>
> <http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-26.diff>
>
> This version uses the DatabaseEncoding and sets the ICU
> encoding at the same time. I had to create a conversion table
> from PostgreSQL's own, somewhat odd and non-standard, names
> of encodings, into the preferred IANA names. One or two of the
> more odd ones might be slightly incorrect, hopefully not too
> far off anyway?
>
> I've noticed a couple of things about using the ICU patch vs. pristine
> pg-8.0.1:
>
> - ORDER BY is case insensitive when using ICU. This might
> break the SQL standard (?), but sure is nice :)

This would mean that indexes are also case insensitive, right? Which makes it a Bad Thing(tm).

> - When the database is initialized using the C locale,
> upper() and lower() normally do not work at all for
> non-ASCII characters even if the database's encoding is, say,
> LATIN1 or UNICODE. (does not work for me anyway, on FreeBSD,
> and this is probably correct since the locale is still `C', I
> believe?). The ICU patch changes nothing for the LATIN1 case,
> since it does not act on single byte encodings, but for the
> UNICODE representation, it works and does what I expect it
> to, namely upper() and lower() neatly
> upper- or lowercase diacritical characters, i.e. lower('ÅÄÖ')
> -> 'åäö'.
> This is a good thing, although I'm surprised that upper/lower
> is dragged along with the LC_COLLATE fixation at initdb. I
> never run initdb in the C locale, but only now do I realize
> how broken that really is if you need to store anything other
> than English :-)

That is what I would have expected. However, it probably won't work for the more exotic cases, like the Turkish I, which depends on the locale.

> I'd be delighted to get more feedback about this stuff.
>
> Thanks,
> Palle
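
To make the Turkish I point concrete, here is a minimal sketch in Python. It is not ICU or PostgreSQL code, and the helper names lower_turkish() and upper_turkish() are hypothetical, introduced only for illustration. It assumes the usual Turkish rule that dotless I (U+0049) pairs with dotless ı (U+0131), and dotted İ (U+0130) pairs with dotted i (U+0069). A locale-independent mapping cannot know this, which is why a single upper()/lower() table is not enough.

```python
# Illustration only: hypothetical helpers, not part of the ICU patch.

DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı


def lower_default(text: str) -> str:
    """Locale-independent lowercasing (what a generic mapping does)."""
    return text.lower()


def lower_turkish(text: str) -> str:
    """Lowercase with the Turkish pairing I -> ı and İ -> i."""
    # Handle the two locale-specific letters first, then fall back
    # to the default mapping for everything else.
    text = text.replace(DOTTED_CAPITAL_I, "i")
    text = text.replace("I", DOTLESS_SMALL_I)
    return text.lower()


def upper_turkish(text: str) -> str:
    """Uppercase with the Turkish pairing i -> İ and ı -> I."""
    # Dotted i must be mapped before the default step, which would
    # otherwise turn it into a plain I.
    text = text.replace("i", DOTTED_CAPITAL_I)
    return text.upper()


if __name__ == "__main__":
    print(lower_default("ÅÄÖ"))        # Swedish letters need no locale rule
    print(lower_default("ISTANBUL"))   # generic result
    print(lower_turkish("ISTANBUL"))   # Turkish result starts with dotless ı
    print(upper_turkish("istanbul"))   # Turkish result starts with dotted İ
```

The same input gives two different, equally valid answers depending on the locale, so the case-mapping functions would need to know the locale in use, not only the encoding.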