On Thu, May 11, 2017 at 05:39:58PM +0000, Douglas Doole wrote:
>     One interesting idea from Doug Doole was to do it between the
>     tokenizer and parser.  I think they are glued together so you would
>     need a way to run the tokenizer separately and compare that to the
>     tokens you stored for the cached plan.
>
> When I did this, we had the same problem that the tokenizer and parser
> were tightly coupled. Fortunately, I was able to do as you suggest and
> run the tokenizer separately to do my analysis.
>
> So my model was to do statement generalization before entering the
> compiler at all. I would tokenize the statement to find the literals and
> generate a new statement string with placeholders. The new string would
> then be passed to the compiler, which would then tokenize and parse the
> reworked statement.
>
> This means we incurred the cost of tokenizing twice, but the tokenizer
> was lightweight enough that it wasn't a problem. In exchange I was able
> to do statement generalization without touching the compiler - the
> compiler saw the generalized statement text as any other statement and
> handled it in the exact same way. (There was just a bit of new code
> around variable binding.)
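
To make the generalization step concrete, here is a rough sketch in
Python of what is described above: scan the statement once, replace each
literal with a numbered placeholder, and hand the rewritten string to the
compiler unchanged. The scanner below is a hypothetical toy, not our
real lexer and not Doug's code. The names TOKEN_RE and generalize() are
made up for illustration, and it only knows about single-quoted strings
and plain decimal numbers.

```python
import re

# Hypothetical, deliberately tiny SQL scanner (an assumption for this
# sketch, not PostgreSQL's lexer). Alternatives are tried in order.
TOKEN_RE = re.compile(r"""
    (?P<string>'(?:[^']|'')*')                        # quoted string literal
  | (?P<ident>[A-Za-z_][A-Za-z_0-9]*|"(?:[^"]|"")*")  # bare or quoted identifier
  | (?P<number>\d+(?:\.\d+)?)                         # integer or decimal literal
  | (?P<space>\s+)                                    # whitespace, copied through
  | (?P<other>.)                                      # any other single character
""", re.VERBOSE | re.DOTALL)


def generalize(sql):
    """Return (generalized_sql, literals).

    Every string or numeric literal is replaced by $1, $2, ... in order
    of appearance; all other tokens are copied through untouched.
    """
    out, literals = [], []
    for m in TOKEN_RE.finditer(sql):
        if m.lastgroup in ("string", "number"):
            literals.append(m.group())
            out.append(f"${len(literals)}")
        else:
            out.append(m.group())
    return "".join(out), literals


if __name__ == "__main__":
    text, params = generalize("SELECT * FROM t1 WHERE a = 42 AND b = 'it''s'")
    print(text)    # SELECT * FROM t1 WHERE a = $1 AND b = $2
    print(params)  # ['42', "'it''s'"]
```

Two statements that differ only in their literals produce the same
generalized text, so that text could serve as the lookup key for a cached
plan, with the extracted literals bound as parameters. A real version
would also have to decide which literals are safe to replace; for
example, the 1 in ORDER BY 1 is a column position, not a value.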

Good point.  I think we need to do some measurements to see if the
parser-only stage is actually significant.  I have a hunch that
commercial databases have much heavier parsers than we do.

This split would also not work if the parser feeds information back
into the scanner.  I know C does that for typedefs but I don't think we
do.

Ideally I would like to see percentage-of-execution numbers for typical
queries for scan, parse, parse-analysis, plan, and execute to see where
the wins are.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +