On Fri, Jan 20, 2012 at 10:27 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> The code I've written so far does no canonicalization of the input
>> value of any kind, just as we do for XML.
>
> Fair enough.
>
>> So, given that framework, what the patch does is this: if you're using
>> UTF-8, then \uXXXX is accepted, provided that XXXX is something that
>> equates to a legal Unicode code point.  It isn't converted to the
>> corresponding character: it's just validated.  If you're NOT using
>> UTF-8, then it allows \uXXXX for code points up through 127 (which we
>> assume are the same in all encodings) and anything higher than that is
>> rejected.
>
> This seems a bit silly.  If you're going to leave the escape sequence as
> ASCII, then why not just validate that it names a legal Unicode code
> point and be done?  There is no reason whatever that that behavior needs
> to depend on the database encoding.
Mostly because that would prevent us from adding canonicalization in the
future, AFAICS, and I don't want to back myself into a corner.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
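
[For illustration only, here is a minimal standalone C sketch of the
encoding-dependent rule described in the quoted text above: with a UTF-8
database encoding, any \uXXXX naming a code point is accepted; with any
other encoding, only code points up through 127 are accepted. The function
and parameter names are hypothetical and this is not the actual patch code.]

#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical validator for the four hex digits following \u in a JSON
 * string literal.  Returns true if the escape would be accepted under the
 * behavior described above, false otherwise.
 */
static bool
validate_unicode_escape(const char *hex4, bool db_encoding_is_utf8)
{
    long        cp = 0;
    int         i;

    /* Parse exactly four hex digits into a code point value. */
    for (i = 0; i < 4; i++)
    {
        char        c = hex4[i];
        int         digit;

        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'a' && c <= 'f')
            digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            digit = c - 'A' + 10;
        else
            return false;       /* not a hex digit: reject the escape */
        cp = cp * 16 + digit;
    }

    if (db_encoding_is_utf8)
        return true;            /* any four-digit value names a BMP code point */

    /* Non-UTF-8 encodings: only ASCII is assumed encoding-independent. */
    return cp <= 0x7F;
}

int
main(void)
{
    printf("UTF-8,  \\u00e9 -> %d\n", validate_unicode_escape("00e9", true));   /* 1 */
    printf("LATIN1, \\u00e9 -> %d\n", validate_unicode_escape("00e9", false));  /* 0 */
    printf("LATIN1, \\u0041 -> %d\n", validate_unicode_escape("0041", false));  /* 1 */
    return 0;
}

[Note that, as in the description above, this sketch only validates the
escape; it does not convert it to the corresponding character, which is
what leaves room for adding canonicalization later.]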