Re: benchmarking Flex practices

John Naylor Tue, 03 Dec 2019 03:03:46 -0800

On Tue, Nov 26, 2019 at 10:32 PM Tom Lane <t...@sss.pgh.pa.us> wrote:


> I haven't looked closely at what ecpg does with the processed
> identifiers.  If it just spits them out as-is, a possible solution
> is to not do anything about de-escaping, but pass the sequence
> U&"..." (plus UESCAPE ... if any), just like that, on to the grammar
> as the value of the IDENT token.

It does pass them along as-is, so I did it that way.

In the attached v10, I've synced both ECPG and psql.

> * I haven't convinced myself either way as to whether it'd be
> better to factor out the code duplicated between the UIDENT
> and UCONST cases in base_yylex.

I chose to factor it out, since we have 2 versions of parser.c, and
this way was much easier to work with.

Some notes:

I arranged for the ECPG grammar to only see SCONST and IDENT. With
UCONST and UIDENT out of the way, it was a small additional step to
put all string reconstruction into the lexer, which has the advantage
of allowing removal of the other special-case ECPG string tokens as
well. The fewer special cases involved in pasting the grammar
together, the better. In doing so, I've probably introduced memory
leaks, but I wanted to get your opinion on the overall approach before
investigating.
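
To illustrate what "string reconstruction in the lexer" could mean here, below is a minimal sketch, not the code in the patch. It assumes a hypothetical helper, rebuild_literal(), that re-wraps a scanned body in its original prefix and quote character, so the grammar only ever receives one finished string as the SCONST or IDENT value.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (the name and signature are assumptions):
 * rebuild the source form of a quoted literal or identifier from
 * its prefix (e.g. "U&", "E", or ""), its quote character, and the
 * body collected by the scanner.
 *
 * Returns a malloc'd string that the caller must free, or NULL if
 * the allocation fails.
 */
char *
rebuild_literal(const char *prefix, char quote, const char *body)
{
    /* prefix + opening quote + body + closing quote + NUL */
    size_t len = strlen(prefix) + strlen(body) + 3;
    char  *result = malloc(len);

    if (result == NULL)
        return NULL;

    snprintf(result, len, "%s%c%s%c", prefix, quote, body, quote);
    return result;
}
```

In this sketch every result is a fresh allocation owned by the caller, so each call site has to free it.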

In ECPG's parser.c, I simply copied check_uescapechar() and
ecpg_isspace(), but we could find a common place if desired. During
development, I found that this file replicates the location-tracking
logic in the backend, but doesn't seem to make use of it. I also would
have had to replicate the backend's datatype for YYLTYPE. Fixing that
might be worthwhile some day, but to get this working, I just ripped
out the extra location tracking.
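
For readers unfamiliar with these helpers, here is a minimal sketch of a UESCAPE character check of this kind. It follows the documented rule that the escape character may not be a hexadecimal digit, a plus sign, a single or double quote, or whitespace. The names sketch_isspace() and sketch_check_uescapechar() are placeholders for illustration, and the bodies are an approximation rather than a copy of what the patch contains.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Placeholder whitespace test: space, tab, newline, carriage
 * return, and form feed count as whitespace.
 */
bool
sketch_isspace(char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n' ||
           ch == '\r' || ch == '\f';
}

/*
 * Placeholder check: return true if 'escape' is acceptable as a
 * UESCAPE character, false otherwise.
 */
bool
sketch_check_uescapechar(unsigned char escape)
{
    if (isxdigit(escape) ||
        escape == '+' ||
        escape == '\'' ||
        escape == '"' ||
        sketch_isspace((char) escape))
        return false;

    return true;
}
```

Because both functions depend only on <ctype.h>, moving them to a common place later would probably be straightforward.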

I no longer use state variables to track scanner state, and in fact I
removed the existing "state_before" variable in ECPG. Instead, I used
the Flex builtins yy_push_state(), yy_pop_state(), and yy_top_state().
These seem to have been part of Flex for a long time, so I think we're
okay as far as portability goes. I think it's cleaner this way, and
possibly faster. I also used this to reunite the xcc and xcsql states.
This whole part could be split out into a separate refactoring patch
to be applied first, if desired.
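
To make the idea concrete, here is a small plain-C model of the push/pop discipline that the Flex start-condition stack provides (in a real scanner these functions are generated by Flex when "%option stack" is set). This is only a sketch: the StateStack type, the sc_* function names, and the SC_C, SC_SQL, and SC_XC conditions are made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_DEPTH = 16 };

/* Hypothetical start conditions: C code, SQL code, and a comment. */
typedef enum { SC_C, SC_SQL, SC_XC } StartCond;

typedef struct
{
    StartCond current;          /* the active start condition */
    StartCond saved[MAX_DEPTH]; /* conditions to return to */
    size_t    depth;            /* number of saved conditions */
} StateStack;

void
sc_init(StateStack *s, StartCond start)
{
    s->current = start;
    s->depth = 0;
}

/* Models yy_push_state(): save the current condition, then switch. */
void
sc_push(StateStack *s, StartCond next)
{
    assert(s->depth < MAX_DEPTH);
    s->saved[s->depth++] = s->current;
    s->current = next;
}

/* Models yy_pop_state(): return to the most recently saved condition. */
void
sc_pop(StateStack *s)
{
    assert(s->depth > 0);
    s->current = s->saved[--s->depth];
}

/* Models yy_top_state(): peek at the condition a pop would restore. */
StartCond
sc_top(const StateStack *s)
{
    assert(s->depth > 0);
    return s->saved[s->depth - 1];
}
```

With this discipline, a single comment condition can be entered from either C or SQL context and a pop returns to whichever one was active, which is one way two otherwise identical states could be merged without a hand-maintained "previous state" variable.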

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v10-handle-uescapes-in-parser.patch
Description: Binary data
