Re: [HACKERS] [RFC] nodeToString format and exporting the SQL parser

Robert Haas Sat, 24 Apr 2010 17:49:40 -0700

On Fri, Apr 2, 2010 at 3:53 PM, Michael Tharp <g...@partiallystapled.com> wrote:
> Most Esteemed Hackers:
>
> Due to popular demand on #postgresql (by which I mean David Fetter), I have
> been spending a little time making the internal SQL parser available to
> clients via a C-language SQL function. The function itself is extremely
> simple: just a wrapper around a call to raw_parser followed by nodeToString.


Seems reasonable.

> Most of the "hard stuff" has been in parsing the output of nodeToString on
> the client side. So, I have a few questions to help gauge interest in
> related patches:
>
> Is there interest in a patch to extend nodes/outfuncs.c with support for
> serializing more node types? Coverage has been pretty good so far but
> various utility statements and their related nodes are missing, e.g.
> AlterTableStmt and GrantStmt. I expect that this will be the least
> contentious suggestion.

This wouldn't bother me provided the code footprint is small.  I would
be against adding a lot of complexity for this.

> The nodeToString format as it stands is somewhat ambiguous with respect to
> the type of a node member's value if one does not have access to
> readfuncs.c. For example, a T_BitString called foo is serialized as ':foo
> b1010' while a char * containing 'b1010' is also serialized as ':foo b1010'.
> This may just mean that _outToken needs to escape the leading 'b'. A similar
> problem exists for booleans ('true' as a string vs. as a boolean).

I am not inclined to change this.  Turning the format into something
self-describing seems to me to be significant work and a significant
compatibility break for a very small amount of benefit.

> Additionally, values may span more than one token for certain types e.g.
> Datum (":constvalue 4 [ 16 0 0 0 ]"). Plan trees have a few types that don't
> have a corresponding read function and output an array of space-separated
> integers. PlanInvalItem even seems to use a format containing parentheses,
> which the tokenizer splits as if it were a list. While most of these only
> occur in plan nodes and thus don't affect my use case (Datum being the
> exception), it would be ideal if they could be parsed more
> straightforwardly.

I'm not inclined to change this, either.

> These last two problems perhaps can be worked around by escaping more things
> in _outToken, but maybe it would be smarter to make the fields
> self-descriptive in terms of type. For example, the field names could be
> prefixed with a short string describing its type, which in most cases would
> be a single character, e.g. 's:schemaname' for a char*, 'b:true' for a bool,
> 'n:...' for any node (including Value nodes), or longer strings for less
> commonly used types like the integer arrays in plan nodes (although these
> would probably be better as a real integer list). These could be used to
> unambiguously parse individual tokens and also to determine how many or what
> kind of token to expect for multi-token values such as Datum which would
> otherwise require guessing. Does this seem reasonable? Is there another
> format that might make more sense?

This seems ugly to me and I don't see the utility of it.

> As far as I can tell, the current parser in nodes/read.c ignores the field
> names entirely, so this can be done without changing postgres' own parsing
> code at all and without affecting backwards compatibility of any stored
> trees. Does anyone else out there use nodeToString() output in their own
> tools, and if so, does this make your life easier or harder?
>
> Lastly, I'll leave a link to my WIP implementation in case anyone is
> interested:
>  http://bitbucket.org/gxti/parse_sql/src/
> Currently I'm working on adding support for cooked parse trees and figuring
> out what, if anything, I need to do to support multibyte encodings. My
> personal use is for parsing DDL so the input is decidedly not hostile but
> I'd still like to make this a generally useful module.
>
> Thanks in advance for any comments, tips, or flames sent my way.

Thanks for having a thick skin.  :-)

I'm having a hard time imaging what you could use this for without
encoding a lot of information about the meaning of particular
constructs.  In which case the self-describing stuff is not needed.
As you point out downthread, if all you want to do is compare, it's
not needed either.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC] nodeToString format and exporting the SQL parser

Reply via email to