On Tue, Nov 26, 2019 at 10:32 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I haven't looked closely at what ecpg does with the processed
> identifiers. If it just spits them out as-is, a possible solution
> is to not do anything about de-escaping, but pass the sequence
> U&"..." (plus UESCAPE ... if any), just like that, on to the grammar
> as the value of the IDENT token.

It does pass them along as-is, so I did it that way. In the attached
v10, I've synced both ECPG and psql.

> * I haven't convinced myself either way as to whether it'd be
> better to factor out the code duplicated between the UIDENT
> and UCONST cases in base_yylex.

I chose to factor it out, since we have 2 versions of parser.c, and
this way was much easier to work with.

Some notes:

I arranged for the ECPG grammar to only see SCONST and IDENT. With
UCONST and UIDENT out of the way, it was a small additional step to
put all string reconstruction into the lexer, which has the advantage
of allowing removal of the other special-case ECPG string tokens as
well. The fewer special cases involved in pasting the grammar
together, the better. In doing so, I've probably introduced memory
leaks, but I wanted to get your opinion on the overall approach before
investigating.

In ECPG's parser.c, I simply copied check_uescapechar() and
ecpg_isspace(), but we could find a common place if desired. During
development, I found that this file replicates the location-tracking
logic in the backend, but doesn't seem to make use of it. I also would
have had to replicate the backend's datatype for YYLTYPE. Fixing that
might be worthwhile some day, but to get this working, I just ripped
out the extra location tracking.

I no longer use state variables to track scanner state, and in fact I
removed the existing "state_before" variable in ECPG. Instead, I used
the Flex builtins yy_push_state(), yy_pop_state(), and yy_top_state().
These have been a feature for a long time, it seems, so I think we're
okay as far as portability. I think it's cleaner this way, and
possibly faster.
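
For readers unfamiliar with Flex's start-condition stack, the following is a minimal plain-C sketch of the discipline that yy_push_state(), yy_pop_state(), and yy_top_state() provide. It is not code from the patch: the enum values, the struct, and the function names are hypothetical, and the fixed stack depth is an assumption made to keep the sketch short (Flex grows its stack dynamically).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical start conditions for illustration only. */
enum scan_state { ST_INITIAL, ST_C, ST_SQL, ST_XC };

#define STATE_STACK_MAX 16  /* assumed bound; Flex's own stack grows as needed */

struct state_stack {
    enum scan_state current;                 /* active start condition */
    enum scan_state saved[STATE_STACK_MAX];  /* states to return to */
    size_t depth;                            /* number of saved states */
};

void stack_init(struct state_stack *s)
{
    s->current = ST_INITIAL;
    s->depth = 0;
}

/* Analogue of yy_push_state(): save the current state, enter new_state. */
void push_state(struct state_stack *s, enum scan_state new_state)
{
    assert(s->depth < STATE_STACK_MAX);
    s->saved[s->depth++] = s->current;
    s->current = new_state;
}

/* Analogue of yy_pop_state(): return to the most recently saved state. */
void pop_state(struct state_stack *s)
{
    assert(s->depth > 0);
    s->current = s->saved[--s->depth];
}

/* Analogue of yy_top_state(): peek at the state a pop would restore. */
enum scan_state top_state(const struct state_stack *s)
{
    assert(s->depth > 0);
    return s->saved[s->depth - 1];
}
```

With a stack like this, a comment state entered from either C code or SQL code returns to wherever it came from with a single pop, so the scanner does not need a separate "state_before" variable or one comment state per caller.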
I also used this to reunite the xcc and xcsql states. This whole part
could be split out into a separate refactoring patch to be applied
first, if desired.

--
John Naylor                   https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: v10-handle-uescapes-in-parser.patch