Added to TODO: Allow the parser output to be seen by clients This is useful for external tools. * http://archives.postgresql.org/pgsql-hackers/2010-04/msg00095.php
--------------------------------------------------------------------------- Robert Haas wrote: > On Fri, Apr 2, 2010 at 3:53 PM, Michael Tharp <g...@partiallystapled.com> > wrote: > > Most Esteemed Hackers: > > > > Due to popular demand on #postgresql (by which I mean David Fetter), I have > > been spending a little time making the internal SQL parser available to > > clients via a C-language SQL function. The function itself is extremely > > simple: just a wrapper around a call to raw_parser followed by nodeToString. > > Seems reasonable. > > > Most of the "hard stuff" has been in parsing the output of nodeToString on > > the client side. So, I have a few questions to help gauge interest in > > related patches: > > > > Is there interest in a patch to extend nodes/outfuncs.c with support for > > serializing more node types? Coverage has been pretty good so far but > > various utility statements and their related nodes are missing, e.g. > > AlterTableStmt and GrantStmt. I expect that this will be the least > > contentious suggestion. > > This wouldn't bother me provided the code footprint is small. I would > be against adding a lot of complexity for this. > > > The nodeToString format as it stands is somewhat ambiguous with respect to > > the type of a node member's value if one does not have access to > > readfuncs.c. For example, a T_BitString called foo is serialized as ':foo > > b1010' while a char * containing 'b1010' is also serialized as ':foo b1010'. > > This may just mean that _outToken needs to escape the leading 'b'. A similar > > problem exists for booleans ('true' as a string vs. as a boolean). > > I am not inclined to change this. Turning the format into something > self-describing seems to me to be significant work and a significant > compatibility break for a very small amount of benefit. > > > Additionally, values may span more than one token for certain types e.g. > > Datum (":constvalue 4 [ 16 0 0 0 ]"). Plan trees have a few types that don't > > have a corresponding read function and output an array of space-separated > > integers. PlanInvalItem even seems to use a format containing parentheses, > > which the tokenizer splits as if it were a list. While most of these only > > occur in plan nodes and thus don't affect my use case (Datum being the > > exception), it would be ideal if they could be parsed more > > straightforwardly. > > I'm not inclined to change this, either. > > > These last two problems perhaps can be worked around by escaping more things > > in _outToken, but maybe it would be smarter to make the fields > > self-descriptive in terms of type. For example, the field names could be > > prefixed with a short string describing its type, which in most cases would > > be a single character, e.g. 's:schemaname' for a char*, 'b:true' for a bool, > > 'n:...' for any node (including Value nodes), or longer strings for less > > commonly used types like the integer arrays in plan nodes (although these > > would probably be better as a real integer list). These could be used to > > unambiguously parse individual tokens and also to determine how many or what > > kind of token to expect for multi-token values such as Datum which would > > otherwise require guessing. Does this seem reasonable? Is there another > > format that might make more sense? > > This seems ugly to me and I don't see the utility of it. > > > As far as I can tell, the current parser in nodes/read.c ignores the field > > names entirely, so this can be done without changing postgres' own parsing > > code at all and without affecting backwards compatibility of any stored > > trees. Does anyone else out there use nodeToString() output in their own > > tools, and if so, does this make your life easier or harder? > > > > Lastly, I'll leave a link to my WIP implementation in case anyone is > > interested: > > ?http://bitbucket.org/gxti/parse_sql/src/ > > Currently I'm working on adding support for cooked parse trees and figuring > > out what, if anything, I need to do to support multibyte encodings. My > > personal use is for parsing DDL so the input is decidedly not hostile but > > I'd still like to make this a generally useful module. > > > > Thanks in advance for any comments, tips, or flames sent my way. > > Thanks for having a thick skin. :-) > > I'm having a hard time imaging what you could use this for without > encoding a lot of information about the meaning of particular > constructs. In which case the self-describing stuff is not needed. > As you point out downthread, if all you want to do is compare, it's > not needed either. > > ...Robert > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <br...@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers