Re: [HACKERS] multibyte-character aware support for function "downcase_truncate_identifier()"

Tom Lane Tue, 23 Nov 2010 09:13:20 -0800

Greg Stark <gsst...@mit.edu> writes:
> On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Well, that's why there's been no movement on this since 2004 :-(.  The
>> amount of work needed for a better solution seems far out of proportion
>> to the benefits.


> We could extend the existing logic to handle multi-byte characters
> though, couldn't we? It's not going to fix all the problems but at
> least it'll do something sane.

Not easily, cheaply, or portably.  The closest you could get in that
line would be to use towlower(), which doesn't exist everywhere
(though I grant probably most platforms have it by now).  The much, much
bigger problem, though, is that we don't know what character representation
towlower() deals in.  We recently kluged the regex code to assume that
the wchar_t representation for UTF8 locales is the standardized Unicode
code point.  I haven't heard of that breaking, but 9.0 hasn't been out
that long.  In other multibyte encodings we have no idea how to use that
function, short of invoking mbstowcs/wcstombs or local equivalent, which
is expensive and doesn't readily allow a short-circuit for ASCII.
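
For illustration only, here is a minimal sketch in C of what the
mbstowcs/towlower/wcstombs route with an ASCII short-circuit might look like.
The name downcase_mb_sketch and its return convention are made up for this
example; it is not PostgreSQL code. It assumes the process LC_CTYPE locale
matches the encoding of the input, which is exactly the assumption that is hard
to guarantee.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not PostgreSQL code): lower-case src into dst.
 * Returns 0 on success, -1 on conversion failure or if dst is too small.
 * On failure the contents of dst are unspecified.
 */
int
downcase_mb_sketch(const char *src, char *dst, size_t dstsize)
{
    size_t      len;
    size_t      i;
    size_t      nwide;
    size_t      nbytes;
    wchar_t    *wbuf;
    int         all_ascii = 1;

    assert(src != NULL && dst != NULL);
    len = strlen(src);

    /* Scan for any byte with the high bit set. */
    for (i = 0; i < len; i++)
    {
        if ((unsigned char) src[i] & 0x80)
        {
            all_ascii = 0;
            break;
        }
    }

    if (all_ascii)
    {
        /* Fast path: fold A-Z by hand, independent of the locale. */
        if (len + 1 > dstsize)
            return -1;
        for (i = 0; i < len; i++)
        {
            char        c = src[i];

            dst[i] = (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
        }
        dst[len] = '\0';
        return 0;
    }

    /*
     * Slow path: round-trip through wchar_t.  A string never yields more
     * wide characters than it has bytes, so len + 1 elements suffice.
     */
    wbuf = malloc((len + 1) * sizeof(wchar_t));
    if (wbuf == NULL)
        return -1;

    nwide = mbstowcs(wbuf, src, len + 1);
    if (nwide == (size_t) -1)
    {
        free(wbuf);
        return -1;              /* invalid sequence for the current locale */
    }

    /* Locale-dependent: the result varies with LC_CTYPE, even for ASCII. */
    for (i = 0; i < nwide; i++)
        wbuf[i] = (wchar_t) towlower((wint_t) wbuf[i]);

    /* A return value equal to dstsize means no room for the terminator. */
    nbytes = wcstombs(dst, wbuf, dstsize);
    free(wbuf);
    if (nbytes == (size_t) -1 || nbytes >= dstsize)
        return -1;
    return 0;
}
```

Even in this toy form the slow path needs an allocation and two full
conversions per identifier, and its output depends on whatever locale happens
to be active.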

And, after you've hacked your way through all that, you still end up
with case-folding behavior that depends on the prevailing locale.
Which is dangerous for the previously cited reasons, and arguably not
spec-compliant.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
