So I started to look at what might be involved in teaching plpgsql about standard_conforming_strings, and was soon dismayed by the sheer epic nature of its failure to act like the core lexer. It was shaky enough before, but the recent introduction of Unicode strings and identifiers into the core has left plpgsql hopelessly behind.
I can see two basic approaches to making things work: copy-and-paste practically all of parser/scan.l into plpgsql's lexer (certainly all of it that involves exclusive states); or throw out plpgsql's lexer altogether in favor of somehow using the core lexer directly. Neither one looks very attractive.

It gets worse though: I have seldom seen such a badly designed piece of syntax as the Unicode string syntax --- see
http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE
You scan the string, and then after that they tell you what the escape character is!? Not to mention the obvious ambiguity with & as an operator.

If we let this go into 8.4, our previous rounds with security holes caused by careless string parsing will look like a day at the beach. No frontend that isn't fully cognizant of the Unicode string syntax is going to parse such things correctly --- it's going to be trivial for a bad guy to confuse a quoting mechanism as to what's an escape and what isn't.

I think we need to give very serious consideration to ripping out that "feature".

			regards, tom lane
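
To make the UESCAPE complaint concrete, here is a rough Python sketch (not the core lexer's code, and not anything from scan.l) of decoding the body of a U&'...' literal. The function name decode_body and its esc parameter are made up for illustration, and the sketch does no validation of malformed escapes. The point is only that the same body decodes differently depending on a clause that appears after the closing quote.

```python
def decode_body(body: str, esc: str = "\\") -> str:
    """Decode the body of a U&'...' literal, given the escape character.

    Hypothetical helper for illustration only: it assumes well-formed
    input and handles escXXXX, esc+XXXXXX, and a doubled escape char.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != esc:
            out.append(ch)                    # ordinary character
            i += 1
        elif body[i + 1:i + 2] == esc:        # doubled escape = literal escape
            out.append(esc)
            i += 2
        elif body[i + 1:i + 2] == "+":        # esc+XXXXXX: six hex digits
            out.append(chr(int(body[i + 2:i + 8], 16)))
            i += 8
        else:                                 # escXXXX: four hex digits
            out.append(chr(int(body[i + 1:i + 5], 16)))
            i += 5
    return "".join(out)


# The body is identical in both calls; only the (trailing) escape
# character differs, and so does the decoded value.
body = r"\0041!0042"
default_reading = decode_body(body)           # as in U&'\0041!0042'
uescape_reading = decode_body(body, "!")      # as in U&'\0041!0042' UESCAPE '!'

# A frontend that assumes backslash is the escape sees harmless text here,
# while the server, told UESCAPE '!', would see a single quote.
harmless_looking = decode_body("!0027")
actually_a_quote = decode_body("!0027", "!")
```

A scanner therefore cannot finalize the string's value, or even decide which characters are escapes, until it has looked past the closing quote for a UESCAPE clause, which is exactly what a quoting layer unaware of the syntax will not do.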