Greg Stark <gsst...@mit.edu> writes:
> On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Well, that's why there's been no movement on this since 2004 :-(. The
>> amount of work needed for a better solution seems far out of proportion
>> to the benefits.
> We could extend the existing logic to handle multi-bytes characters
> though, couldn't we? It's not going to fix all the problems but at
> least it'll do something sane.

Not easily, cheaply, or portably. The closest you could get in that
line would be to use towlower(), which doesn't exist everywhere (though
I grant probably most platforms have it by now). The much much bigger
problem though is that we don't know what character representation
towlower() deals in. We recently kluged the regex code to assume that
the wchar_t representation for UTF8 locales is the standardized Unicode
code point. I haven't heard of that breaking, but 9.0 hasn't been out
that long. In other multibyte encodings we have no idea how to use
that function, short of invoking mbstowcs/wcstombs or local equivalent,
which is expensive and doesn't readily allow a short-circuit for ASCII.

And, after you've hacked your way through all that, you still end up
with case-folding behavior that depends on the prevailing locale.
Which is dangerous for the previously cited reasons, and arguably not
spec-compliant.

regards, tom lane
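
For illustration only, here is roughly what the mbstowcs/towlower/wcstombs
round trip described above might look like in standalone C. This is a
sketch, not PostgreSQL code: the function name fold_lower_sketch is
hypothetical, and the whole-string ASCII scan is an assumed workaround
for the short-circuit problem rather than anything from the backend.
Anything containing a high-bit byte takes the expensive path, and the
result there depends on whatever locale setlocale() has put in effect.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not PostgreSQL code): return a malloc'd,
 * lower-cased copy of a NUL-terminated string, or NULL on failure.
 * The caller frees the result.
 */
char *
fold_lower_sketch(const char *src)
{
    size_t      len, i, nwide, nbytes, maxbytes;
    int         has_high = 0;
    wchar_t    *wide;
    char       *dst;

    assert(src != NULL);
    len = strlen(src);

    /* Coarse short-circuit: is every byte plain 7-bit ASCII? */
    for (i = 0; i < len; i++)
    {
        if ((unsigned char) src[i] >= 0x80)
        {
            has_high = 1;
            break;
        }
    }

    if (!has_high)
    {
        /* Fast path: fold A-Z by hand, independent of the locale. */
        dst = malloc(len + 1);
        if (dst == NULL)
            return NULL;
        for (i = 0; i < len; i++)
        {
            char c = src[i];

            dst[i] = (c >= 'A' && c <= 'Z') ? (char) (c + ('a' - 'A')) : c;
        }
        dst[len] = '\0';
        return dst;
    }

    /*
     * Slow path: multibyte -> wchar_t -> towlower -> multibyte.
     * What wchar_t values mean here is up to the platform and locale.
     */
    wide = malloc((len + 1) * sizeof(wchar_t)); /* wide chars <= bytes */
    if (wide == NULL)
        return NULL;
    nwide = mbstowcs(wide, src, len + 1);
    if (nwide == (size_t) -1)
    {
        free(wide);             /* invalid sequence for this locale */
        return NULL;
    }
    for (i = 0; i < nwide; i++)
        wide[i] = (wchar_t) towlower((wint_t) wide[i]);

    /* Lower-casing can change the byte length, so size for the worst case. */
    maxbytes = nwide * MB_CUR_MAX + 1;
    dst = malloc(maxbytes);
    if (dst != NULL)
    {
        nbytes = wcstombs(dst, wide, maxbytes);
        if (nbytes == (size_t) -1)
        {
            free(dst);
            dst = NULL;
        }
    }
    free(wide);
    return dst;
}
```

Even this much needs two allocations and two conversions for any string
with a single non-ASCII character, and the slow-path output can change
from one locale to the next, which is the objection raised above.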