Re: [HACKERS] Patch for collation using ICU

Palle Girgensohn Fri, 25 Mar 2005 05:25:09 -0800

--On Friday, March 25, 2005 23.39.33 +1100 John Hansen <[EMAIL PROTECTED]> wrote:

OK, tested on Debian sarge with ICU 3.2,
UNICODE database, C locale.

upper() and lower() return an empty string for any input, including
7-bit ASCII, regardless of client_encoding, so something is obviously
broken.

Have you tested this patch on a UNICODE DB with locale C/POSIX ?

No, honestly not. I mostly tested it for my own needs: sv_SE.UTF-8 with UNICODE, and also de_DE.UTF-8.

How will PostgreSQL react to this combination: a database cluster initialized by initdb with locale=C/POSIX, and then a database in UNICODE (really UTF-8) encoding? Hmm... I think I may have made the false assumption that the locale string always contains the character encoding. I do something like encoding = strchr(locale, '.') + 1, which will indeed be confused by a 'C' locale, since there is no '.' in it for strchr() to find. I'll check it out!

/Palle


... John

-----Original Message-----
From: John Hansen
Sent: Friday, March 25, 2005 10:27 PM
To: 'Palle Girgensohn'; '[email protected]'
Subject: RE: [HACKERS] Patch for collation using ICU

> --On Friday, March 25, 2005 16.34.41 +1100 John Hansen <[EMAIL PROTECTED]> wrote:
>
> > Useful if it's going to support earlier releases of ICU...
> >
> > Not all OSes come with ICU 3.2; Debian, for example, currently has 2.1
> > in testing and 2.6 in unstable.
>
> Oh, OK. FreeBSD only has 3.2 as a port. I can check the older
> version; I doubt it would make much difference. Some autoconf sorcery
> needed, perhaps.

Naww, it's no biggie; I think we'll just need to include ICU with pg.
I tried that: there are several ICU functions that you use
that are not in ICU 2.1.

Don't know about 2.6.

However, ICU 3.2 compiles on Debian with a small change to the
debian/rules file: debian/tmp/etc is missing, so add a
"mkdir debian/tmp/etc" step.

... John

>
> /Palle
>
> >
> > ... John
> >
> >> -----Original Message-----
> >> From: [EMAIL PROTECTED]
> >> [mailto:[EMAIL PROTECTED] On Behalf Of Palle
> >> Girgensohn
> >> Sent: Friday, March 25, 2005 10:40 AM
> >> To: [email protected]
> >> Subject: [HACKERS] Patch for collation using ICU
> >>
> >> Hi!
> >>
> >> I've put together a patch for using IBM's ICU package for collation.
> >>
> >> If your OS does not have full support for collation or
> >> uppercase/lowercase in multibyte locales, this might be useful. If
> >> you are using a multibyte character encoding in your database and
> >> want collation, i.e. ORDER BY, and also lower(), upper() and
> >> initcap() to work properly, this patch will do just that.
> >>
> >> This patch is needed for FreeBSD, since this OS has no support for
> >> collation of, e.g., Unicode locales (that is, wcscoll(3) does not
> >> do what you expect if you set LC_ALL=sv_SE.UTF-8).
> >> AFAIK the patch is *not* necessary for Linux, although IBM claims ICU
> >> collation is about twice as fast as glibc's for simple western
> >> locales.
> >>
> >> It adds a configure switch, `--with-icu', which will set up the code
> >> to use ICU instead of wchar_t and wcscoll.
> >>
> >> This has been tested only on FreeBSD-4.11 & FreeBSD-5-stable, where
> >> it seems to run well. I've not had the time to do any comparative
> >> performance tests yet, but it seems to be at least no slower than
> >> using LATIN1 with the sv_SE.ISO8859-1 locale, perhaps even faster.
> >>
> >> I'd be delighted if some more experienced PostgreSQL hackers would
> >> review this stuff. The patch is pretty compact, so it's fast reading
> >> :)  I'm planning to add this patch as an option (tagged
> >> "experimental") to FreeBSD's postgresql port. Any ideas about whether
> >> this is a good idea or not?
> >>
> >> Any thoughts or ideas are welcome!
> >>
> >> Cheers,
> >> Palle
> >>
> >> Patch at:
> >> <http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-14.diff>
> >>
> >> ICU at sourceforge: <http://icu.sf.net/>
> >>
> >>
> >> ---------------------------(end of
> >> broadcast)---------------------------
> >> TIP 7: don't forget to increase your free space map settings
> >>
> >>

