On Sun, Jun 09, 2013 at 11:39:18AM -0400, Tom Lane wrote:
> The key point for me is that if tolower() actually does anything in the
> previous state of the code, it's more than likely going to produce
> invalidly encoded data.  The consequences of that can't be good.
> You can argue that there might be people out there for whom the
> transformation accidentally produced a validly-encoded string, but how
> likely is that really?  It seems much more likely that the only reason
> we've not had more complaints is that on most popular platforms, the
> code accidentally fails to fire on any UTF8 characters (or any common
> ones, anyway).  On those platforms, there will be no change of behavior.

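To make the quoted concern concrete, here is a minimal sketch of what
byte-at-a-time downcasing can do to UTF-8 text. It is an illustration under
stated assumptions, not the PostgreSQL code: latin1_tolower() is a
hypothetical stand-in for a libc tolower() running in a single-byte Latin-1
locale, and bytewise_downcase() and utf8_is_wellformed() are names invented
for this example. Real tolower() behavior on bytes above 0x7F varies by
platform and locale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for tolower() in a single-byte Latin-1 locale:
 * folds ASCII A-Z and the Latin-1 uppercase range 0xC0-0xDE (except 0xD7,
 * the multiplication sign) by adding 0x20.  Assumption for illustration only.
 */
unsigned char
latin1_tolower(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return (unsigned char) (c + 0x20);
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        return (unsigned char) (c + 0x20);
    return c;
}

/* Downcase a buffer one byte at a time, with no regard for its encoding. */
void
bytewise_downcase(unsigned char *s, size_t len)
{
    assert(s != NULL);
    for (size_t i = 0; i < len; i++)
        s[i] = latin1_tolower(s[i]);
}

/*
 * Structural UTF-8 check only: each lead byte must be followed by the right
 * number of continuation bytes.  Overlong forms and surrogates are not
 * rejected, so this is weaker than a full validator.
 */
bool
utf8_is_wellformed(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        size_t need;

        if (s[i] < 0x80)
            need = 0;
        else if ((s[i] & 0xE0) == 0xC0)
            need = 1;
        else if ((s[i] & 0xF0) == 0xE0)
            need = 2;
        else if ((s[i] & 0xF8) == 0xF0)
            need = 3;
        else
            return false;       /* stray continuation or invalid lead byte */

        if (need > len - i - 1)
            return false;       /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;   /* expected a continuation byte */

        i += need + 1;
    }
    return true;
}
```

Feeding this the UTF-8 bytes of "CAFÉ" (0x43 0x41 0x46 0xC3 0x89) turns the
lead byte 0xC3 into 0xE3 and leaves the continuation byte 0x89 alone. 0xE3
announces a three-byte sequence, but only one continuation byte follows, so
the result no longer passes the well-formedness check. In a locale where the
mapping leaves bytes above 0x7F untouched, the same input would come back
unchanged.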
Your hypothesis is that almost all libc tolower() implementations will in
every case either (a) turn a multi-byte character to byte soup not valid in
the server encoding or (b) leave it unchanged?  Quite possible.  If that
hypothesis holds, I agree that the committed change does not break
compatibility.  That carries a certain appeal.

I still anticipate regretting that we have approved and made reliable this
often-sufficed-by-accident behavior, particularly when the SQL standard calls
for something else.  But I think I now understand your reasoning.

> The resistance to moving this code to use towlower() for non-ASCII
> mainly comes from worries about speed, I think; although there was also
> something about downcasing conversions that change the string's byte
> length being problematic for some callers.

Considering that using ASCII-only or quoted identifiers sidesteps the speed
penalty altogether, that seems a poor cause for demur.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com