Andrew Dunstan <and...@dunslane.net> writes:
> On 06/09/2013 12:38 AM, Noah Misch wrote:
>> PostgreSQL has lived with this wrong behavior since ... the beginning?  It's a
>> problem, certainly, but a bandage fix brings its own trouble.

I don't see this as particularly bandage-y.  It's a subset of the
spec-required folding behavior, sure, but at least now it's a proper
subset of that behavior.  It preserves all cases in which the previous
coding did the right thing, while removing some cases in which it didn't.

> If you have a better fix I am all ears. I can recall at least one
> discussion of this area (concerning Turkish I quite a few years ago)
> where we failed to come up with anything.

Yeah, Turkish handling of i/I messes up any attempt to use the locale's
case-folding rules straightforwardly.  However, I think we've already
fixed that with the rule that ASCII characters are folded manually.
The resistance to moving this code to use towlower() for non-ASCII
mainly comes from worries about speed, I think; although there was also
something about downcasing conversions that change the string's byte
length being problematic for some callers.

> I have a fairly hard time believing in your "relies on this and somehow
> works" scenario.

The key point for me is that if tolower() actually does anything in the
previous state of the code, it's more than likely going to produce
invalidly encoded data.  The consequences of that can't be good.
You can argue that there might be people out there for whom the
transformation accidentally produced a validly-encoded string, but how
likely is that really?  It seems much more likely that the only reason
we've not had more complaints is that on most popular platforms, the
code accidentally fails to fire on any UTF8 characters (or any common
ones, anyway).  On those platforms, there will be no change of behavior.

			regards, tom lane
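
To illustrate the invalid-encoding point above, here is a minimal sketch in C.
It is not the PostgreSQL source: every function name below is hypothetical, and
latin1_style_tolower() is only an assumed stand-in for what a platform's
tolower() might do to high-bit bytes in some single-byte locale.  It contrasts
byte-wise folding of a UTF-8 string with folding ASCII manually and leaving
high-bit bytes alone in a multibyte encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: stand-in for tolower() in a Latin-1-style single-byte locale. */
unsigned char latin1_style_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return (unsigned char) (ch + ('a' - 'A'));
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7)
        return (unsigned char) (ch + 0x20);
    return ch;
}

/* Byte-wise folding: every byte goes through the single-byte tolower. */
void fold_bytewise(unsigned char *s, size_t len)
{
    assert(s != NULL);
    for (size_t i = 0; i < len; i++)
        s[i] = latin1_style_tolower(s[i]);
}

/*
 * Subset folding (hypothetical sketch): ASCII is folded by hand; high-bit
 * bytes are touched only when the encoding is single-byte.
 */
void fold_ascii_only(unsigned char *s, size_t len, bool enc_is_single_byte)
{
    assert(s != NULL);
    for (size_t i = 0; i < len; i++)
    {
        unsigned char ch = s[i];

        if (ch >= 'A' && ch <= 'Z')
            ch = (unsigned char) (ch + ('a' - 'A'));
        else if (enc_is_single_byte && ch >= 0x80)
            ch = latin1_style_tolower(ch);
        s[i] = ch;
    }
}

/*
 * Structural UTF-8 check: lead bytes must be followed by the right number
 * of continuation bytes.  It does not reject overlong forms or surrogates.
 */
bool utf8_is_well_formed(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL);
    while (i < len)
    {
        unsigned char ch = s[i];
        size_t need;

        if (ch < 0x80)
            need = 0;
        else if (ch >= 0xC2 && ch <= 0xDF)
            need = 1;
        else if (ch >= 0xE0 && ch <= 0xEF)
            need = 2;
        else if (ch >= 0xF0 && ch <= 0xF4)
            need = 3;
        else
            return false;       /* stray continuation or invalid lead byte */

        if (need > len - i - 1)
            return false;       /* sequence runs past the end of the string */
        for (size_t k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
        i += need + 1;
    }
    return true;
}
```

Under these assumptions, feeding the UTF-8 bytes for "É" (0xC3 0x89) to
fold_bytewise() turns the lead byte into 0xE3, which announces a three-byte
sequence that is not there, so the result is no longer well-formed UTF-8.
fold_ascii_only() with enc_is_single_byte = false leaves those bytes untouched
and still lowercases any ASCII letters around them.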