> -----Original Message-----
> From: Palle Girgensohn [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, March 26, 2005 1:10 PM
> To: pgsql-hackers@postgresql.org
> Cc: John Hansen; Andrew Dunstan
> Subject: Re: [HACKERS] Patch for collation using ICU
> 
> --On fredag, mars 25, 2005 00.40.04 +0100 Palle Girgensohn 
> <[EMAIL PROTECTED]> wrote:
> 
> > Hi!
> >
> > I've put together a patch for using IBM's ICU package for collation.
> >
> > If your OS does not have full support for collation or 
> > uppercase/lowercase in multibyte locales, this might be 
> useful. If you 
> > are using a multibyte character encoding in your database and want 
> > collation, i.e. order by, and also lower(), upper() and 
> initcap() to 
> > work properly, this patch will do just that.
> >
> > This patch is needed for FreeBSD, since this OS has no support for 
> > collation of, for example, Unicode locales (that is, 
> wcscoll(3) does not 
> > do what you expect if you set LC_ALL=sv_SE.UTF-8, for 
> example). AFAIK 
> > the patch is *not* necessary for Linux, although IBM claims ICU 
> > collation to be about twice as fast as glibc for simple 
> western locales.
> >
> > It adds a configure switch, `--with-icu', which will set up 
> the code 
> > to use ICU instead of wchar_t and wcscoll.
> >
> > This has been tested only on FreeBSD-4.11 & 
> FreeBSD-5-stable, where it 
> > seems to run well. I've not had the time to do any comparative 
> > performance tests yet, but it seems it is at least not slower than 
> > using
> > LATIN1 with sv_SE.ISO8859-1 locale, perhaps even faster.
> >
> > I'd be delighted if some more experienced postgresql hackers would 
> > review this stuff. The patch is pretty compact, so it's 
> fast reading 
> > :)  I'm planning to add this patch as an option (tagged 
> > "experimental") to FreeBSD's postgresql port. Any ideas 
> about whether 
> > this is a good idea or not?
> >
> > Any thoughts or ideas are welcome!
> >
> > Cheers,
> > Palle
> >
> > Patch at:
> > 
> > <http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-14.diff>
> >
> > ICU at sourceforge: <http://icu.sf.net/>
> 
> 
> Hi!
> 
> There's a new patch to fix some reported problems.
> 
> <http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-26.diff>
> 
> This version uses the DatabaseEncoding and sets the ICU 
> encoding at the same time. I had to create a conversion table 
> from PostgreSQL's own, somewhat odd and non-standard, names 
> of encodings, into the preferred IANA names. One or two of the 
> more odd ones might be slightly incorrect, hopefully not too 
> far off anyway?
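
To make the idea of the conversion table concrete, here is a minimal sketch of such a lookup. It is not the table from the patch: the handful of entries below is an illustrative subset chosen for this example, and the function name iana_name is hypothetical.

```python
# Illustrative subset only; NOT the conversion table shipped in the patch.
# Keys are PostgreSQL encoding names, values are IANA charset names.
PG_TO_IANA = {
    "UNICODE": "UTF-8",
    "LATIN1": "ISO-8859-1",
    "LATIN2": "ISO-8859-2",
    "ISO_8859_5": "ISO-8859-5",
    "EUC_JP": "EUC-JP",
    "KOI8": "KOI8-R",
}


def iana_name(pg_encoding: str) -> str:
    """Return the IANA name for a PostgreSQL encoding name.

    Raises KeyError for names missing from the table, so an unmapped
    encoding fails loudly instead of being passed through unchanged.
    """
    return PG_TO_IANA[pg_encoding.upper()]


print(iana_name("UNICODE"))
print(iana_name("latin1"))
```

Failing on an unknown name, rather than guessing, keeps a wrong mapping from silently selecting the wrong converter.
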
> 
> I've noticed a couple of things about using the ICU patch vs. pristine
> pg-8.0.1:
> 
> - ORDER BY is case insensitive when using ICU. This might 
> break the SQL standard (?), but sure is nice :)

This would mean that indexes are also case insensitive, right?
Which makes it a Bad Thing(tm).
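
A small sketch of why this matters for indexes, using plain Python rather than ICU. It assumes the comparison really treats 'a' and 'A' as equal; whether the patch does so depends on how the ICU collator is configured, which is not verified here. The helper names are hypothetical.

```python
def ci_key(s):
    """Comparison key that ignores case entirely."""
    return s.casefold()


def tiebreak_key(s):
    """Comparison key that ignores case first, then uses the raw
    string to break ties, so distinct strings never compare equal."""
    return (s.casefold(), s)


def unique_violations(values, key):
    """Return pairs of values that a unique index built on `key`
    would consider duplicates."""
    seen = {}
    clashes = []
    for v in values:
        k = key(v)
        if k in seen:
            clashes.append((seen[k], v))
        else:
            seen[k] = v
    return clashes


words = ["b", "A", "a", "B"]

# Fully case-insensitive comparison: 'A'/'a' and 'b'/'B' collide.
print(unique_violations(words, ci_key))

# Case used only as a tie-break: nothing collides, and the sort
# still groups letters together regardless of case.
print(unique_violations(words, tiebreak_key))
print(sorted(words, key=tiebreak_key))
```

With the tie-break key the ordering still looks case insensitive to a reader, but a unique index would keep 'a' and 'A' as separate entries.
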

> - When the database is initialized using the C locale, 
> upper() and lower() normally do not work at all for 
> non-ASCII characters even if the database's encoding is say 
> LATIN1 or UNICODE. (does not work for me anyway, on FreeBSD, 
> and this is probably correct since the locale is still `C', I 
> believe?). The ICU patch changes nothing for the LATIN1 case, 
> since it does not act on single byte encodings, but for the 
> UNICODE representation, it works and does what I expect it 
> to, namely upper() and lower() neatly
> upper- or lowercase diacritical characters, i.e. lower('ÅÄÖ') 
> -> 'åäö'. 
> This is a good thing, although I'm surprised that upper/lower 
> is dragged along with the LC_COLLATE fixation at initdb. I 
> never run initdb in the C locale, but only now do I realize 
> how broken that really is if you need to store anything else 
> than English :-)

That is what I would have expected. However, it probably won't work for the 
more exotic cases, like the Turkish I, whose case mapping depends on the locale.
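
To illustrate the Turkish I problem: a locale-independent lowercase maps 'I' to 'i', while Turkish expects the dotless 'ı' (U+0131) and pairs the dotted capital 'İ' (U+0130) with 'i'. The sketch below uses Python's built-in str.lower(), which is locale-independent, and a hypothetical helper lower_tr that hard-codes the two Turkish mappings. It only shows the mismatch; it is not how the patch or ICU implements it.

```python
def lower_tr(s: str) -> str:
    """Lowercase using the Turkish dotted/dotless I rules.

    Hypothetical helper for illustration: it rewrites the two
    locale-sensitive letters first, then falls back to the
    locale-independent mapping for everything else.
    """
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


word = "ISTANBUL"

# Locale-independent mapping: 'I' becomes dotted 'i'.
print(word.lower())

# Turkish mapping: 'I' becomes dotless 'ı'.
print(lower_tr(word))
```

The same input gives two different correct answers, so no single mapping table can serve every locale.
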

> 
> I'd be delighted to get more feedback about this stuff.
> 
> Thanks,
> Palle
> 
> 
> 
