Re: [PATCH] cobol: Rename COB_{BLOCK,UNSIGNED,SIGNED} to {BLOCK,UNSIGNED,SIGNED}_kw for consistency

James K. Lowden Fri, 21 Mar 2025 10:49:55 -0700

On Fri, 21 Mar 2025 10:21:13 +0100
Jakub Jelinek <ja...@redhat.com> wrote:


> On Wed, Mar 19, 2025 at 06:03:24PM -0400, James K. Lowden wrote:
> > Elsewhere in the parser where there was a conflict like that, I
> > renamed the token.  For example, the COBOL word TRUE uses a token
> > named TRUE_kw.  I don't mind either way; your solution has less
> > impact on the parser.  
> 
> I think consistency is good and when it is a suffix rather than
> prefix, it also sorts alphabetically together with the actual
> keywords.

Thank you for that.  

> Ok for trunk?

One comment below.  It's OK as it stands, technically.  The icing on the
cake could be done now or later, depending on what we want the history
to look like.  

> 
> 2025-03-21  Jakub Jelinek  <ja...@redhat.com>
> 
>       * parse.y: Rename COB_BLOCK to BLOCK_kw, COB_SIGNED to
>       SIGNED_kw and COB_UNSIGNED to UNSIGNED_kw.
>       * scan.l: Likewise.
>       * token_names.h: Regenerate.
> 
> --- gcc/cobol/scan.l.jj       2025-03-21 10:09:38.903966677 +0100
> +++ gcc/cobol/scan.l  2025-03-21 10:12:07.079914108 +0100
> @@ -374,7 +374,7 @@ ROUNDING                  { return ROUNDING; }
>  SECONDS                              { return SECONDS; }
>  SECURE                               { return SECURE; }
>  SHORT                                { return SHORT; }
> -SIGNED                               { return COB_SIGNED; }
> +SIGNED                               { return SIGNED_kw; }
>  STANDARD-BINARY                      { return STANDARD_BINARY; }
>  STANDARD-DECIMAL             { return STANDARD_DECIMAL; }
>  STATEMENT                    { return STATEMENT; }
> @@ -394,7 +394,7 @@ TOWARD-LESSER                     { return TOWARD_LESSER;
>  TRUNCATION                   { return TRUNCATION; }
>  UCS-4                                { return UCS_4; }
>  UNDERLINE                    { return UNDERLINE; }
> -UNSIGNED                     { return COB_UNSIGNED; }
> +UNSIGNED                     { return UNSIGNED_kw; }
>  UTF-16                               { return UTF_16; }
>  UTF-8                                { return UTF_8; }
>  
> @@ -837,7 +837,7 @@ CALL              { return CALL; }
>  BY           { return BY; }
>  BOTTOM               { return BOTTOM; }
>  BEFORE               { return BEFORE; }
> -BLOCK                { return COB_BLOCK; }
> +BLOCK                { return BLOCK_kw; }
>  BACKWARD     { return BACKWARD; }
>  
>  AT           { return AT; }
> @@ -1042,7 +1042,7 @@ USE({SPC}FOR)?          { return USE; }
>    AS                 { return AS; }
>    ASCENDING          { return ASCENDING; }
>    BLANK              { return BLANK; }
> -  BLOCK                      { return COB_BLOCK; }
> +  BLOCK                      { return BLOCK_kw; }
>    BY                 { return BY; }
>    BYTE-LENGTH                { return BYTE_LENGTH; }
>    CHARACTER          { return CHARACTER; }
> @@ -2164,7 +2164,7 @@ BASIS           { yy_push_state(basis); return BA
>    BINARY     { return BINARY; }
>    BIT        { return BIT; }
>    BLANK      { return BLANK; }
> -  BLOCK      { return COB_BLOCK; }
> +  BLOCK      { return BLOCK_kw; }
>    BOTTOM     { return BOTTOM; }
>    BY { return BY; }
>    CALL       { return CALL; }
> --- gcc/cobol/parse.y.jj      2025-03-21 10:09:38.902966690 +0100
> +++ gcc/cobol/parse.y 2025-03-21 10:11:12.178674614 +0100
> @@ -408,7 +408,7 @@
>  
>                       BASED BASECONVERT
>                       BEFORE BINARY BIT BIT_OF "BIT-OF" BIT_TO_CHAR "BIT-TO-CHAR"
> -                     BLANK COB_BLOCK
> +                     BLANK BLOCK_kw

You want either

        +                       BLANK BLOCK_kw "BLOCK" 
if it works, else
        +                       BLANK BLOCK_kw "Block"

I neglected to mention one knock-on effect of renaming tokens.  When the
parser detects a syntax error, it reports the name of the incorrect
token, so you'd get 

        syntax error at BLOCK_kw
or
        syntax error at "foo", expecting BLOCK_kw

The literal following the token name where it's defined, if present, is
used in the message instead.  See 3.7.2 Token Kind Names in the Bison
manual for details.  

Used this way, upper-case strings (e.g. "TRUE" and "FALSE") interfered
with the interpretation of preprocessor macros of the same name.  I
haven't bothered to find out why, not least because I doubt there's
anything I can do about it. Since it's only a message, the user will
understand "Block" just as well as "BLOCK".  It might even be what he
typed.  

One other word to the wise in this area because it's unusual: cdf.y and
parse.y share some tokens.  If tokens are added/deleted/rearranged in
parse.y, their numeric values will change.  I use a script to update
cdf.y from parse.h (the produced header) to synchronize those
values.  The telltale sign of forgetting to do that is a weird early
testing failure, by the first test to use the CDF.  

In this case there was no problem because you were careful to keep the
tokens in the same order.  :-)  
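
For anyone who wants a quick sanity check before running the tests, here
is a minimal sketch of the idea in Python.  It is not the script I use.
It assumes that parse.h carries the usual Bison "enum yytokentype" block
and that cdf.y declares its shared tokens with explicit numbers on
%token lines (e.g. %token BY 486 "BY", where 486 is a made-up value).
All function names are hypothetical.

```python
import re

# "NAME = 123," entries inside the Bison-generated enum (negative values skipped).
ENUM_ENTRY = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*,?", re.MULTILINE)
# "NAME 123" pairs on a %token line (assumed layout of cdf.y).
TOKEN_PAIR = re.compile(r"([A-Za-z_]\w*)\s+(\d+)")


def header_token_values(header_text):
    """Return {token name: value} from the enum yytokentype block."""
    m = re.search(r"enum\s+yytokentype\s*\{(.*?)\}", header_text, re.DOTALL)
    body = m.group(1) if m else ""
    return {name: int(value) for name, value in ENUM_ENTRY.findall(body)}


def grammar_token_values(grammar_text):
    """Return {token name: value} for explicitly numbered %token lines."""
    values = {}
    for line in grammar_text.splitlines():
        if line.lstrip().startswith("%token"):
            for name, value in TOKEN_PAIR.findall(line):
                values[name] = int(value)
    return values


def stale_tokens(header_text, grammar_text):
    """Return {name: (grammar value, header value)} where the two disagree."""
    header = header_token_values(header_text)
    return {
        name: (value, header[name])
        for name, value in grammar_token_values(grammar_text).items()
        if name in header and header[name] != value
    }
```

Feed it the contents of parse.h and cdf.y; an empty result means the
numbers that both files define agree.  Tokens that appear in only one
of the two files are ignored.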

--jkl
