--On Saturday, May 07, 2005 10.06.43 -0400 Bruce Momjian <pgman@candle.pha.pa.us> wrote:
Palle Girgensohn wrote:
--On Saturday, May 07, 2005 23.15.29 +1000 John Hansen <[EMAIL PROTECTED]> wrote:
> Btw, I had been planning to propose replacing every single one of the
> built in charset conversion functions with calls to ICU (thus making pg
> _depend_ on ICU), as this would seem like a cleaner solution than for
> us to maintain our own conversion tables.
>
> ICU also has a fair few conversions that we do not have at present.
That is a much larger issue, similar to our shipping our own timezone database. What does it buy us?
o Do we ship it in our tarball?
o Is the license compatible?
It looks pretty similar to BSD, although I'm a novice on the subject.
o Does it remove utils/mb conversions?
Yes, it would probably be possible to remove pg's own conversions.
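To make that concrete, here is a minimal sketch of what a single conversion call through ICU's C converter API could look like in place of the utils/mb tables. The encoding names and buffer size are only illustrative, not a proposal for the actual interface:

    /* Sketch: convert a LATIN1 byte string to UTF-8 with ICU's
     * converter API.  ucnv_convert() pivots through UTF-16 internally. */
    #include <stdio.h>
    #include <unicode/ucnv.h>

    int main(void)
    {
        const char latin1[] = "na\xefve";      /* "naïve" in LATIN1 */
        char       utf8[64];
        UErrorCode status = U_ZERO_ERROR;

        int32_t len = ucnv_convert("UTF-8", "ISO-8859-1",
                                   utf8, sizeof(utf8),
                                   latin1, -1, &status);
        if (U_FAILURE(status))
        {
            fprintf(stderr, "conversion failed: %s\n", u_errorName(status));
            return 1;
        }
        printf("converted %d bytes\n", (int) len);
        return 0;
    }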
o Does it allow us to index LIKE (next high char)?
I believe so, using ICU's substring stuff.
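To sketch what I mean (the locale and buffer sizes are made up, and whether this plugs cleanly into the LIKE index machinery is exactly the open question): ICU can build an upper bound from a collation sort key, which is the "next higher key" a LIKE 'foo%' range scan needs.

    /* Sketch: an upper range bound for a prefix search, built from
     * ICU collation sort keys.  Error handling omitted for brevity. */
    #include <stdio.h>
    #include <unicode/ucol.h>
    #include <unicode/ustring.h>

    int main(void)
    {
        UErrorCode status = U_ZERO_ERROR;
        UCollator *coll = ucol_open("sv_SE", &status);  /* illustrative locale */
        UChar      prefix[16];
        uint8_t    key[64], upper[64];

        u_uastrcpy(prefix, "foo");                      /* LIKE 'foo%' */

        int32_t keylen = ucol_getSortKey(coll, prefix, -1, key, sizeof(key));

        /* UCOL_BOUND_UPPER_LONG yields a key sorting after every string
         * that starts with the prefix -- the end of the range scan. */
        int32_t uplen = ucol_getBound(key, keylen, UCOL_BOUND_UPPER_LONG,
                                      1, upper, sizeof(upper), &status);

        printf("prefix key %d bytes, upper bound %d bytes (%s)\n",
               (int) keylen, (int) uplen, u_errorName(status));

        ucol_close(coll);
        return 0;
    }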
o Does it allow us to support multiple encodings in a single database easier?
Heh, the ultimate dream. Perhaps?
o performance?
ICU itself is said to be much faster than, for example, glibc. The problem is the need to convert via UTF-16, which requires extra memory allocations and CPU cycles. I don't use glibc, but my very simple performance tests on FreeBSD show that it is similar in speed.
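The extra hop looks roughly like this with the converter API; note the intermediate UTF-16 buffer that a direct table-based conversion would not need (encodings and buffer sizes are only for illustration, and error handling is omitted):

    /* Sketch of the double conversion: a byte-encoding to byte-encoding
     * conversion goes through a UTF-16 pivot, which is the extra
     * allocation/CPU cost compared with a direct lookup table. */
    #include <stdio.h>
    #include <unicode/ucnv.h>

    int main(void)
    {
        UErrorCode  status = U_ZERO_ERROR;
        UConverter *from = ucnv_open("ISO-8859-1", &status);
        UConverter *to   = ucnv_open("EUC-JP", &status);

        const char src[] = "hello";
        UChar      pivot[64];               /* the intermediate UTF-16 copy */
        char       dst[64];

        /* step 1: bytes -> UTF-16; step 2: UTF-16 -> bytes */
        int32_t ulen = ucnv_toUChars(from, pivot, 64, src, -1, &status);
        int32_t blen = ucnv_fromUChars(to, dst, sizeof(dst), pivot, ulen, &status);

        printf("%d UChars, %d output bytes (%s)\n",
               (int) ulen, (int) blen, u_errorName(status));

        ucnv_close(from);
        ucnv_close(to);
        return 0;
    }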
I just had a similar thought. And why use ICU only for multibyte charsets? If I use LATIN1, I still expect upper('ß') => SS, and I don't get it... Same for the Turkish example.
We assume the native toupper() can handle single-byte character encodings. We use towupper() only for wide character sets.
True, but the problem is that the native toupper()/towupper() work on one character at a time. That is a bad design decision in POSIX: there is no way they can handle the examples above without considering more than one character. ICU does just that.
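A small illustration of the difference, in case it helps: towupper() maps one code point to one code point, so U+00DF 'ß' comes back unchanged, while ICU's full case mapping works on the whole string with a locale and can expand it; the same call with a "tr_TR" locale handles the Turkish dotted/dotless i. The locale names here are only examples.

    /* Sketch: one-character towupper() cannot turn 'ß' into "SS";
     * ICU's u_strToUpper() operates on the whole string and can. */
    #include <stdio.h>
    #include <wctype.h>
    #include <unicode/ustring.h>

    int main(void)
    {
        UErrorCode status = U_ZERO_ERROR;
        UChar      in[]  = { 0x73, 0x74, 0x72, 0x61, 0xDF, 0x65, 0 };  /* "straße" */
        UChar      out[16];

        /* towupper: one char in, one char out -- 0xDF stays 0xDF. */
        printf("towupper: %#x -> %#x\n", 0xDF, (unsigned) towupper(0xDF));

        /* Full case mapping: "straße" -> "STRASSE", 6 UChars become 7. */
        int32_t len = u_strToUpper(out, 16, in, -1, "de_DE", &status);
        printf("u_strToUpper(de_DE): %d UChars (%s)\n",
               (int) len, u_errorName(status));

        /* With "tr_TR" the same call maps 'i' (U+0069) to 'İ' (U+0130). */
        return 0;
    }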
/Palle