On Fri, Mar 22, 2019 at 5:38 AM John Naylor <john.nay...@2ndquadrant.com> wrote: > On Thu, Mar 21, 2019 at 9:59 PM Alexander Korotkov > <a.korot...@postgrespro.ru> wrote: > > Attaches patches improving jsonpath parser. First one introduces > > cosmetic changes, while second gets rid of backtracking. I'm also > > planning to add high-level comment for both grammar and lexer. > > The cosmetic changes look good to me. I just noticed a couple things > about the comments. > > 0001: > > +/* Check if current scanstring value constitutes a keyword */ > > 'is a keyword' is better. 'Constitutes' implies parts of a whole. > > + * Resize scanstring for appending of given length. Reinitilize if required. > > s/Reinitilize/Reinitialize/ > > The first sentence is not entirely clear to me.
Thank you, fixed. > 0002: > > These two rules are not strictly necessary: > > <xnq,xq,xvq,xsq>{unicode}+\\ { > /* throw back the \\, and treat as unicode */ > yyless(yyleng - 1); > parseUnicode(yytext, yyleng); > } > > <xnq,xq,xvq,xsq>{hex_char}+\\ { > /* throw back the \\, and treat as hex */ > yyless(yyleng - 1); > parseHexChars(yytext, yyleng); > } > > ...and only seem to be there because of how these are written: > > <xnq,xq,xvq,xsq>{unicode}+ { parseUnicode(yytext, yyleng); } > <xnq,xq,xvq,xsq>{hex_char}+ { parseHexChars(yytext, yyleng); } > <xnq,xq,xvq,xsq>{unicode}*{unicodefail} { yyerror(NULL, "Unicode > sequence is invalid"); } > <xnq,xq,xvq,xsq>{hex_char}*{hex_fail} { yyerror(NULL, "Hex character > sequence is invalid"); } > > I don't understand the reasoning here -- is it a micro-optimization? > The following is simpler, allow the rules I mentioned to be removed, > and make check still passes. I would prefer it unless there is a > performance penalty, in which case a comment to describe the > additional complexity would be helpful. > > <xnq,xq,xvq,xsq>{unicode} { parseUnicode(yytext, yyleng); } > <xnq,xq,xvq,xsq>{hex_char} { parseHexChars(yytext, yyleng); } > <xnq,xq,xvq,xsq>{unicodefail} { yyerror(NULL, "Unicode sequence is invalid"); > } > <xnq,xq,xvq,xsq>{hex_fail} { yyerror(NULL, "Hex character sequence is > invalid"); } These rules are needed for unicode. Sequential escaped unicode characters might be connected by hi surrogate value. See jsonpath_encoding regression test in attached patch. Regarding hex, I made it so for the sake of uniformity. But I changed my mind and decided that simpler flex rules are more important. So, now they are considered one-by-one. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
0001-cosmetic-changes-jsonpath-parser-2.patch
Description: Binary data
0002-get-rid-of-backtracking-2.patch
Description: Binary data