[HACKERS] Unicode string literals versus the world

Tom Lane Fri, 10 Apr 2009 14:54:43 -0700

So I started to look at what might be involved in teaching plpgsql about
standard_conforming_strings, and was soon dismayed by the sheer epic
nature of its failure to act like the core lexer.  It was shaky enough
before, but the recent introduction of Unicode strings and identifiers
into the core has left plpgsql hopelessly behind.

I can see two basic approaches to making things work: copy-and-paste
practically all of parser/scan.l into plpgsql's lexer (certainly all of
it that involves exclusive states); or throw out plpgsql's lexer
altogether in favor of somehow using the core lexer directly.  Neither
one looks very attractive.

It gets worse though: I have seldom seen such a badly designed piece of
syntax as the Unicode string syntax --- see
http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE

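You scan the string, and only then, in a UESCAPE clause after the closing
quote, do they tell you what the escape character is!?  Not to mention the
obvious ambiguity with & as an operator: U&'...' could just as well be read
as the identifier U, the & operator, and an ordinary string literal.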
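To make the ordering problem concrete, here is a minimal Python sketch of decoding a U&'...' literal. It assumes the documented forms \XXXX (four hex digits) and \+XXXXXX (six hex digits), a doubled escape character meaning a literal one, and an optional trailing UESCAPE clause. The names decode_unicode_literal and _code_point are hypothetical, this is not PostgreSQL's scanner, and it does not enforce the server's restrictions on which characters may serve as the escape. The point it shows is that the body has to be held raw until after the closing quote, because only then is the escape character known.

```python
import re

# Hypothetical illustration only; not PostgreSQL's lexer.
_LITERAL = re.compile(
    r"U&'((?:[^']|'')*)'"                # body, where '' stands for one quote
    r"(?:\s*UESCAPE\s*'([^'])')?\s*$",   # optional escape-character clause
    re.IGNORECASE,
)


def _code_point(digits: str, width: int) -> str:
    # Require exactly `width` hex digits; int() alone would accept "+" or spaces.
    if len(digits) != width or not re.fullmatch(r"[0-9A-Fa-f]+", digits):
        raise ValueError("invalid Unicode escape")
    return chr(int(digits, 16))


def decode_unicode_literal(sql: str) -> str:
    m = _LITERAL.match(sql)
    if m is None:
        raise ValueError("not a U&'...' literal")
    # The escape character is only known after the closing quote has been
    # scanned, so the body is kept undecoded until the UESCAPE clause is seen.
    body = m.group(1).replace("''", "'")
    esc = m.group(2) or "\\"
    out, i = [], 0
    while i < len(body):
        if body[i] != esc:
            out.append(body[i])
            i += 1
        elif body[i + 1:i + 2] == esc:      # doubled escape -> literal escape char
            out.append(esc)
            i += 2
        elif body[i + 1:i + 2] == "+":      # +XXXXXX: six hex digits
            out.append(_code_point(body[i + 2:i + 8], 6))
            i += 8
        else:                               # XXXX: four hex digits
            out.append(_code_point(body[i + 1:i + 5], 4))
            i += 5
    return "".join(out)


print(decode_unicode_literal(r"U&'d\0061t\+000061'"))              # data
print(decode_unicode_literal("U&'d!0061t!+000061' UESCAPE '!'"))  # data
print(decode_unicode_literal(r"U&'d\0061t' UESCAPE '!'"))          # d\0061t
```

The last call is the troublesome case: the identical body decodes differently depending on a clause that appears only after the string has ended, so anything that handles the body without looking ahead past the closing quote will get it wrong.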

If we let this go into 8.4, our previous rounds with security holes
caused by careless string parsing will look like a day at the beach.
No frontend that isn't fully cognizant of the Unicode string syntax is
going to parse such things correctly --- it's going to be trivial for
a bad guy to confuse a quoting mechanism as to what's an escape and what
isn't.
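A generic sketch of the failure mode described above: two scanners that disagree about whether a backslash can escape a quote will disagree about where a literal ends. The helper literal_end is hypothetical; it does not implement PostgreSQL's lexer or the U&'...' syntax, and it says nothing about whether a server would accept the resulting statement. It only contrasts standard quote doubling with backslash escaping on the same input.

```python
def literal_end(sql: str, start: int, backslash_escapes: bool) -> int:
    """Return the index just past the closing quote of the literal opened at `start`."""
    if sql[start] != "'":
        raise ValueError("no opening quote at start")
    i = start + 1
    while i < len(sql):
        c = sql[i]
        if backslash_escapes and c == "\\":
            i += 2          # treat the next character, even a quote, as escaped
        elif c == "'":
            if i + 1 < len(sql) and sql[i + 1] == "'":
                i += 2      # doubled quote stays inside the literal
            else:
                return i + 1
        else:
            i += 1
    raise ValueError("unterminated string literal")


sql = "'\\' OR 1=1 --'"
# Standard rule: the literal is just '\', and the rest is SQL text.
print(repr(sql[literal_end(sql, 0, backslash_escapes=False):]))  # " OR 1=1 --'"
# Backslash rule: everything up to the final quote looks like string data.
print(repr(sql[literal_end(sql, 0, backslash_escapes=True):]))   # ''
```

Whichever side is wrong about the escape convention, the result is the same: text one side treats as inert data is treated as executable SQL by the other.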

I think we need to give very serious consideration to ripping out that
"feature".

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
