Re: [HACKERS] Unicode Normalization

2009-09-24 Thread pg
In a context using normalization, wouldn't you typically want to store a normalized-text type that could perhaps (depending on locale) take advantage of simpler, more-efficient comparison functions? Whether you're doing INSERT/UPDATE, or importing a flat text file, if you canonicalize characters

Re: [HACKERS] Unicode Normalization

2009-09-24 Thread David E. Wheeler
On Sep 24, 2009, at 8:59 AM, Andrew Dunstan wrote: That might be nice, but I'd be wary of a geometric multiplication of text types. We already have TEXT and CITEXT; what if we had your NTEXT (normalized text) but I wanted it to also be case-insensitive? Actually, I don't think it's necessar

Re: [HACKERS] Unicode Normalization

2009-09-24 Thread Andrew Dunstan
David E. Wheeler wrote: On Sep 24, 2009, at 6:24 AM, p...@thetdh.com wrote: In a context using normalization, wouldn't you typically want to store a normalized-text type that could perhaps (depending on locale) take advantage of simpler, more-efficient comparison functions? That might be n

Re: [HACKERS] Unicode Normalization

2009-09-24 Thread David E. Wheeler
On Sep 24, 2009, at 6:24 AM, p...@thetdh.com wrote: In a context using normalization, wouldn't you typically want to store a normalized-text type that could perhaps (depending on locale) take advantage of simpler, more-efficient comparison functions? That might be nice, but I'd be wary of

Re: [HACKERS] Unicode Normalization

2009-09-23 Thread David E. Wheeler
On Sep 23, 2009, at 11:08 AM, David E. Wheeler wrote: I looked around and found the Public Software Group's utf8proc project, which even includes some PostgreSQL support (not, alas, for normalization). It has an MIT-licensed C library that offers these functions: Sorry, forgot the link:

Re: [HACKERS] Unicode Normalization

2009-09-23 Thread David E. Wheeler
On Sep 23, 2009, at 11:08 AM, David E. Wheeler wrote: I just had a discussion on IRC about unicode normalization in PostgreSQL. Apparently there is not support for it, currently. BTW, the only reference I found on the [to do list](http://wiki.postgresql.org/wiki/Todo ) was: More sensible