On Sun, Jun 23, 2013 at 10:50 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> David Fetter <da...@fetter.org> writes:
>> On Sun, Jun 23, 2013 at 07:44:26AM -0700, Kevin Grittner wrote:
>>> I think it is OK if that gets a syntax error.  If that's the "worst
>>> case" I like this approach.
>
>> I think reducing the usefulness of error messages is something we need
>> to think extremely hard about before we do.  Is there maybe a way to
>> keep the error messages even if by some magic we manage to unreserve
>> the words?
>
> Of the alternatives discussed so far, I don't really like anything
> better than adding another special case to base_yylex().  Robert opined
> in the other thread about RESPECT NULLS that the potential failure cases
> with that approach are harder to reason about, which is true, but that
> doesn't mean that we should accept failures we know of because there
> might be failures we don't know of.

Sure, that's true; but the proposal on the other thread is just to
disallow invalid syntax early enough that it benefits the parser.  The
error message is different, but I don't think it's a BAD error
message.

> One thing that struck me while thinking about this is that it seems
> like we've been going about it wrong in base_yylex() in any case.
> For example, because we fold WITH followed by TIME into a special token
> WITH_TIME, we will fail on a statement beginning
>
>         WITH time AS ...
>
> even though "time" is a legal ColId.  But suppose that instead of
> merging the two tokens into one, we just changed the first token into
> something special; that is, base_yylex would return a special token
> WITH_FOLLOWED_BY_TIME and then TIME.  We could then fix the above
> problem by allowing either WITH or WITH_FOLLOWED_BY_TIME as the leading
> keyword of a statement; and similarly for the few other places where
> WITH can be followed by an arbitrary identifier.
>
> Going on the same principle, we could probably let FILTER be an
> unreserved keyword while FILTER_FOLLOWED_BY_PAREN could be a
> type_func_name_keyword.  (I've not tried this though.)

I think this whole direction is going to collapse under its own weight
VERY quickly.  The problems you're describing are essentially
shift/reduce conflicts that are invisible because they're hidden
behind lexer magic.  Part of the value of using a parser generator is
that it TELLS you when you've added ambiguous syntax.  But it doesn't
know about lexer hacks, so stuff will just silently break.  I think
this type of lexer hack works reasonably well for keyword-like things
that are used in just one place in the grammar.  As soon as you get up
to two, the wheels come off - as with RESPECT NULLS vs. NULLS FIRST.

> This idea doesn't help much for OVER because one of the alternatives for
> over_clause is "OVER ColId", and I doubt we want to have base_yylex know
> all the alternatives for ColId.
> I also had no great success with the
> NULLS FIRST/LAST case: AFAICT the substitute token for NULLS still has
> to be fully reserved, meaning that something like "select nulls last"
> still doesn't work without quoting.  We could maybe fix that with enough
> denormalization of the index_elem productions, but it'd be ugly.

I don't think that particular example is very compelling - there's a
general rule that column aliases can't be keywords of any type.
That's not wonderful, and EnterpriseDB has had bug reports filed about
it, but the real-world impact is pretty minimal, certainly compared to
what we used to do which is not allow column aliases AT ALL.

> It'd sure be interesting to know what the SQL committee's target parsing
> algorithm is.  I find it hard to believe they're uninformed enough to
> not know that these random syntaxes they keep inventing are hard to deal
> with in LALR(1).  Or maybe they really don't give a damn about breaking
> applications every time they invent a new reserved word?

Does the SQL committee contemplate that SELECT * FROM somefunc()
filter (id, val) should act as a table alias and that SELECT * FROM
somefunc() filter (where x > 1) is an aggregate filter?  This all gets
much easier to understand if one of those constructs isn't allowed in
that particular context.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
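
For readers following the thread, the two base_yylex() strategies discussed
above (merging WITH + TIME into WITH_TIME versus retagging only the first
token as WITH_FOLLOWED_BY_TIME) can be illustrated with a toy model.  This
is only a sketch, not PostgreSQL's lexer: tokens are plain uppercase strings,
and merge_filter, retag_filter and cte_name are hypothetical names invented
for this example.

```python
# Toy model of the two one-token-lookahead strategies from the thread.
# NOT PostgreSQL code: tokens are plain strings and all function names
# here are made up for illustration.

def merge_filter(tokens):
    """Merging approach: WITH followed by TIME collapses into WITH_TIME."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "WITH" and i + 1 < len(tokens) and tokens[i + 1] == "TIME":
            out.append("WITH_TIME")
            i += 2  # the TIME token is consumed and lost
        else:
            out.append(tokens[i])
            i += 1
    return out


def retag_filter(tokens):
    """Retagging approach: only the first token changes; TIME is still emitted."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == "WITH" and i + 1 < len(tokens) and tokens[i + 1] == "TIME":
            out.append("WITH_FOLLOWED_BY_TIME")
        else:
            out.append(tok)
    return out


def cte_name(tokens):
    """Stand-in for a grammar rule: accept 'WITH <name> AS' where the leading
    keyword may be either WITH or WITH_FOLLOWED_BY_TIME."""
    leading = ("WITH", "WITH_FOLLOWED_BY_TIME")
    if len(tokens) >= 3 and tokens[0] in leading and tokens[2] == "AS":
        return tokens[1]
    return None


# "WITH time AS (SELECT 1) SELECT * FROM time", already tokenized
stmt = ["WITH", "TIME", "AS", "(", "SELECT", "1", ")",
        "SELECT", "*", "FROM", "TIME"]

merged = merge_filter(stmt)      # starts with WITH_TIME, AS, ...
retagged = retag_filter(stmt)    # starts with WITH_FOLLOWED_BY_TIME, TIME, AS, ...

print(cte_name(merged))    # None: the CTE name was swallowed by the merge
print(cte_name(retagged))  # TIME: the name is still available to the grammar
```

The sketch only shows why retagging keeps "WITH time AS ..." parseable; it
says nothing about the objection raised above, namely that a parser
generator cannot see conflicts introduced by such lexer-level rewriting.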