-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 25/04/2010 03:02, Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Sat, Apr 24, 2010 at 8:07 PM, Bruce Momjian <br...@momjian.us> wrote:
>>> Sounds useful to me, though as a function like suggested in a later
>>> email.
>
>> If tool-builders think this is useful, I have no problem with making
>> it available. It should be suitably disclaimed: "We reserve the right
>> to rip out the entire flex/yacc-based lexer and parser at any time and
>> replace them with a hand-coded system written in Prolog that emits
>> tokenization information only in ASN.1-encoded pig latin. If massive
>> changes in the way this function works - or its complete disappearance
>> - are going to make you grumpy, don't call it."
>
> I'm a bit concerned with the vagueness of the goals here. We started
> with a request to dump out node trees, ie, post-parsing representation;
> but the example use case of syntax highlighting would find that
> representation quite useless. (Example: foo::bar and CAST(foo AS bar)
> yield the same parse tree.)

Well, the tokenizer idea was actually my understanding of the following
quote from Michael Tharp: « ... making the internal SQL parser available
to clients via a C-language SQL function. » I thought Michael was trying
to write a tokenizer based on the node tree returned by raw_parser. As it
seems Michael is not even sure about what he's trying to do, I'd prefer
to refocus this thread a bit.

> A syntax highlighter might get some use
> out of the lexer-output token stream, but I'm afraid from the proposed
> output that people might be expecting more semantic information than
> the lexer can provide. The lexer doesn't, for example, have any clue
> that some keywords are commands and others aren't; nor any very clear
> understanding about the semantic difference between the tokens '='
> and ';'.

Exactly: a proper tokenizer function should be able to give some (simple)
information about the type of each token. That is what I tried to define
in this draft with the "type" field:

=> SELECT pgtokenize($script$
    SELECT 1;
    UPDATE test SET "a"=2;
  $script$);

    type     | pos |  value   | line
-------------+-----+----------+------
 SQL_COMMAND |   1 | 'SELECT' |    1
 CONSTANT    |   8 | '1'      |    1
 DELIMITER   |   9 | ';'      |    1
 SQL_COMMAND |  11 | 'UPDATE' |    2
 IDENTIFIER  |  18 | 'test'   |    2
 SQL_KEYWORD |  23 | 'SET'    |    2
 IDENTIFIER  |  27 | '"a"'    |    2
 OPERATOR    |  30 | '='      |    2
 CONSTANT    |  31 | '2'      |    2

>
> Also, if all you want is the lexer, it's not that hard to steal psql's
> version and adapt it to your purposes. The lexer doesn't change very
> fast, and it's not that big either.

Stealing the lexer from psql is possible... for a C application. I don't
know yet whether we could easily port it to other languages, or whether a
simple lexer would really answer the use cases here.

>
> Anyway, it certainly wouldn't be hard for an add-on module to provide a
> SRF that calls the lexer (or parser) and returns some sort of tabular
> representation of the results. I'm just not sure how useful it'll be
> in the real world.
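
For what it's worth, the kind of tabular result I have in mind can be
sketched outside the backend. This is only a rough Python illustration
under my own assumptions: the regular expressions, the SQL_COMMANDS and
SQL_KEYWORDS word lists, the PUNCTUATION type and the tokenize() name are
all hypothetical, and none of it reflects how the flex-based lexer
actually works.

```python
import re
from typing import Iterator, NamedTuple

# Hypothetical, deliberately tiny word lists (assumption: the real lexer
# makes no such distinction between commands and other keywords).
SQL_COMMANDS = {"SELECT", "UPDATE", "INSERT", "DELETE"}
SQL_KEYWORDS = {"SET", "FROM", "WHERE", "INTO", "VALUES", "AS"}


class Token(NamedTuple):
    type: str
    pos: int    # 1-based character offset in the whole script
    value: str
    line: int   # 1-based line number


# Simplified token patterns, tried in order; not PostgreSQL's lexer rules.
TOKEN_SPEC = [
    ("ws", r"\s+"),
    ("quoted", r'"(?:[^"]|"")*"'),        # double-quoted identifier
    ("string", r"'(?:[^']|'')*'"),        # single-quoted string constant
    ("number", r"\d+(?:\.\d+)?"),
    ("word", r"[A-Za-z_][A-Za-z_0-9$]*"),
    ("delim", r";"),
    ("op", r"::|[-+*/<>=~!@%^&|]+"),
    ("punct", r"[(),.\[\]:]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

FIXED_TYPES = {
    "quoted": "IDENTIFIER",
    "string": "CONSTANT",
    "number": "CONSTANT",
    "delim": "DELIMITER",
    "op": "OPERATOR",
    "punct": "PUNCTUATION",  # hypothetical type, not part of the draft
}


def classify(kind: str, text: str) -> str:
    """Map a matched pattern name to one of the draft's token types."""
    if kind == "word":
        upper = text.upper()
        if upper in SQL_COMMANDS:
            return "SQL_COMMAND"
        if upper in SQL_KEYWORDS:
            return "SQL_KEYWORD"
        return "IDENTIFIER"
    return FIXED_TYPES[kind]


def tokenize(script: str) -> Iterator[Token]:
    """Yield one Token per lexeme, skipping whitespace."""
    pos = 0
    while pos < len(script):
        m = TOKEN_RE.match(script, pos)
        if m is None:
            raise ValueError(f"unexpected character {script[pos]!r} at {pos + 1}")
        kind, text = m.lastgroup, m.group()
        if kind != "ws":
            line = script.count("\n", 0, pos) + 1
            yield Token(classify(kind, text), pos + 1, text, line)
        pos = m.end()


if __name__ == "__main__":
    for tok in tokenize('SELECT 1;\nUPDATE test SET "a"=2;'):
        print(f"{tok.type:<11} | {tok.pos:>3} | {tok.value:<8} | {tok.line}")
```

On the two-statement script from the draft this gives one row per token,
with 1-based positions and line numbers. The command/keyword split comes
entirely from the hand-written word lists, which is exactly the
information the real lexer does not have.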

Well, I would prefer not to tell users of pgAdmin or phpPgAdmin that they
depend on a contrib module. Moreover, PostgreSQL already exposes a lot of
information about its internal mechanisms, configuration, DDL, etc. I
think a proper tokenizer function would be a natural new feature for
core, if possible.

Having glanced here and there at the parser code, I am not yet sure where
I could get the required information, or how to combine it to produce
something close to my draft. But I prefer to discuss this first, before
spending too much time on code that might end up thrown away...

>
> regards, tom lane

- -- 
JGuillaume (ioguix) de Rorthais
http://www.dalibo.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvXdxgACgkQxWGfaAgowiJujQCglXpCYpFttwHOkmkCd92zMxnv
r00An1sjmRrR6u61VjCtXputcNBevHsz
=ri3i
-----END PGP SIGNATURE-----

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)