On Fri, 21 Mar 2025 10:21:13 +0100 Jakub Jelinek <ja...@redhat.com> wrote:
> On Wed, Mar 19, 2025 at 06:03:24PM -0400, James K. Lowden wrote:
> > Elsewhere in the parser where there was a conflict like that, I
> > renamed the token.  For example, the COBOL word TRUE uses a token
> > named TRUE_kw.  I don't mind either way; your solution has less
> > impact on the parser.
>
> I think consistency is good and when it is a suffix rather than
> prefix, it also sorts alphabetically together with the actual
> keywords.

Thank you for that.

> Ok for trunk?

One comment below.  It's OK as it stands, technically.  The icing on
the cake could be done now or later, depending on what we want the
history to look like.

> 2025-03-21  Jakub Jelinek  <ja...@redhat.com>
>
> * parse.y: Rename COB_BLOCK to BLOCK_kw, COB_SIGNED to
> SIGNED_kw and COB_UNSIGNED to UNSIGNED_kw.
> * scan.l: Likewise.
> * token_names.h: Regenerate.
>
> --- gcc/cobol/scan.l.jj 2025-03-21 10:09:38.903966677 +0100
> +++ gcc/cobol/scan.l 2025-03-21 10:12:07.079914108 +0100
> @@ -374,7 +374,7 @@ ROUNDING { return ROUNDING; }
>  SECONDS { return SECONDS; }
>  SECURE { return SECURE; }
>  SHORT { return SHORT; }
> -SIGNED { return COB_SIGNED; }
> +SIGNED { return SIGNED_kw; }
>  STANDARD-BINARY { return STANDARD_BINARY; }
>  STANDARD-DECIMAL { return STANDARD_DECIMAL; }
>  STATEMENT { return STATEMENT; }
> @@ -394,7 +394,7 @@ TOWARD-LESSER { return TOWARD_LESSER;
>  TRUNCATION { return TRUNCATION; }
>  UCS-4 { return UCS_4; }
>  UNDERLINE { return UNDERLINE; }
> -UNSIGNED { return COB_UNSIGNED; }
> +UNSIGNED { return UNSIGNED_kw; }
>  UTF-16 { return UTF_16; }
>  UTF-8 { return UTF_8; }
>
> @@ -837,7 +837,7 @@ CALL { return CALL; }
>  BY { return BY; }
>  BOTTOM { return BOTTOM; }
>  BEFORE { return BEFORE; }
> -BLOCK { return COB_BLOCK; }
> +BLOCK { return BLOCK_kw; }
>  BACKWARD { return BACKWARD; }
>
>  AT { return AT; }
> @@ -1042,7 +1042,7 @@ USE({SPC}FOR)? { return USE; }
>   AS { return AS; }
>   ASCENDING { return ASCENDING; }
>   BLANK { return BLANK; }
> - BLOCK { return COB_BLOCK; }
> + BLOCK { return BLOCK_kw; }
>   BY { return BY; }
>   BYTE-LENGTH { return BYTE_LENGTH; }
>   CHARACTER { return CHARACTER; }
> @@ -2164,7 +2164,7 @@ BASIS { yy_push_state(basis); return BA
>   BINARY { return BINARY; }
>   BIT { return BIT; }
>   BLANK { return BLANK; }
> - BLOCK { return COB_BLOCK; }
> + BLOCK { return BLOCK_kw; }
>   BOTTOM { return BOTTOM; }
>   BY { return BY; }
>   CALL { return CALL; }
> --- gcc/cobol/parse.y.jj 2025-03-21 10:09:38.902966690 +0100
> +++ gcc/cobol/parse.y 2025-03-21 10:11:12.178674614 +0100
> @@ -408,7 +408,7 @@
>
>   BASED BASECONVERT
>   BEFORE BINARY BIT BIT_OF "BIT-OF"
>   BIT_TO_CHAR "BIT-TO-CHAR"
> - BLANK COB_BLOCK
> + BLANK BLOCK_kw

You want either

+ BLANK BLOCK_kw "BLOCK"

if it works, else

+ BLANK BLOCK_kw "Block"

I neglected to mention one knock-on effect of renaming tokens.  When
the parser detects a syntax error, it reports the name of the
incorrect token, so you'd get

    syntax error at BLOCK_kw

or

    syntax error at "foo", expecting BLOCK_kw

The literal following the token name where it's defined, if present,
is used in the message instead.  See 3.7.2 Token Kind Names in the
Bison manual for details.

Used this way, upper-case strings (e.g. "TRUE" and "FALSE") interfered
with the interpretation of preprocessor macros of the same name.  I
haven't bothered to find out why, not least because I doubt there's
anything I can do about it.  Since it's only a message, the user will
understand "Block" just as well as "BLOCK".  It might even be what he
typed.

One other word to the wise in this area because it's unusual: cdf.y
and parse.y share some tokens.  If tokens are added/deleted/rearranged
in parse.y, their numeric values will change.  I use a script to
update cdf.y from parse.h (the produced header) to synchronize those
values.
The telltale sign of forgetting to do that is a weird early testing
failure, by the first test to use the CDF.  In this case there was no
problem because you were careful to keep the tokens in the same
order.  :-)

--jkl
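
For illustration only, here is a minimal sketch of the kind of check that would catch shared tokens drifting apart. It is not the script mentioned above. It assumes both grammars yield a header containing the usual Bison `enum yytokentype` block; the function names, the sample header text, and the token numbers in it are all made up for the example.

```python
import re

# Matches enum entries such as "BLOCK_kw = 259," once comments are removed.
TOKEN_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(-?\d+)", re.MULTILINE)


def token_values(header_text):
    """Return {token_name: number} from the yytokentype enum in header_text."""
    # Drop C comments first so a brace inside a comment cannot end the enum.
    text = re.sub(r"/\*.*?\*/", "", header_text, flags=re.DOTALL)
    match = re.search(r"enum\s+yytokentype\s*\{(.*?)\}", text, re.DOTALL)
    if match is None:
        return {}
    return {name: int(num) for name, num in TOKEN_RE.findall(match.group(1))}


def mismatched_tokens(parse_header, cdf_header):
    """Return {name: (parse_value, cdf_value)} for shared tokens that differ."""
    parse_tokens = token_values(parse_header)
    cdf_tokens = token_values(cdf_header)
    shared = sorted(parse_tokens.keys() & cdf_tokens.keys())
    return {name: (parse_tokens[name], cdf_tokens[name])
            for name in shared
            if parse_tokens[name] != cdf_tokens[name]}


# Hypothetical header excerpts; the numbers are invented for the example.
PARSE_H = """
enum yytokentype
{
    YYEMPTY = -2,
    YYEOF = 0,                     /* "end of file"  */
    BLANK = 258,                   /* BLANK  */
    BLOCK_kw = 259,                /* "Block"  */
    BY = 260                       /* BY  */
};
"""

CDF_H = """
enum yytokentype
{
    YYEMPTY = -2,
    YYEOF = 0,
    BY = 259
};
"""

if __name__ == "__main__":
    for name, (parse_value, cdf_value) in mismatched_tokens(PARSE_H, CDF_H).items():
        print(f"{name}: parse.h has {parse_value}, cdf has {cdf_value}")
```

In the sample data, inserting BLOCK_kw ahead of BY shifts BY's number in one header but not the other, which is the situation that keeping the tokens in the same order avoids.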