On Fri, Apr 4, 2025 at 11:50 PM Robert Dubner <rdub...@symas.com> wrote:
>
> Anybody who might have gotten interested should stand down.
>
> As usual, that analysis got me thinking.
>
> I got focused on where var_decl_return_code was being used.  (I was wrong.
> I made the mistake because I had just eliminated two sets of errors caused
> by the optimization actually optimizing away things I need, so I had that
> in the front of my brain.)  Richard told me of something odd at the point
> where var_decl_return was being established.  I finally decided to look at
> that.
>
> Turned out, ultimately, to be a SHORT / USHORT mismatch on the variables
> being given to a MODIFY_EXPR.  Apparently the optimization algorithms can
> be extremely cranky about value types.

Yes.  Unless you implement the GET_ALIAS_SET language hook even more
so than when using C.  But I expect COBOL to be a quite restrictive language
with respect to typing - is there something like pointers so a COBOL program
can access the bit-representation of some variable as a different type?

Richard.

>
> In any event, with that straightened out, everything is working without
> the flag_strict_aliasing modification.
>
> Thanks for asking, and thanks for listening.
>
> > -----Original Message-----
> > From: Robert Dubner <rdub...@symas.com>
> > Sent: Friday, April 4, 2025 16:02
> > To: Sam James <s...@gentoo.org>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: RE: [committed] cobol: Eliminate cobolworx UAT errors when
> > compiling with -Os
> >
> > > -----Original Message-----
> > > From: Sam James <mailto:s...@gentoo.org>
> > > Sent: Friday, April 4, 2025 14:28
> > > To: Robert Dubner <mailto:rdub...@symas.com>
> > > Cc: 'GCC Patches' <mailto:gcc-patches@gcc.gnu.org>
> > > Subject: Re: [committed] cobol: Eliminate cobolworx UAT errors when
> > > compiling with -Os
> > >
> > > Robert Dubner <mailto:rdub...@symas.com> writes:
> > >
> > > > From e70fe5ed46ab129a8b1da961c47d3fb75b11b988 Mon Sep 17 00:00:00
> 2001
> > > > From: Bob Dubner mailto:rdub...@symas.com
> > > > Date: Fri, 4 Apr 2025 13:48:58 -0400
> > > > Subject: [PATCH] cobol: Eliminate cobolworx UAT errors when
> compiling
> > > with
> > > > -Os
> > > >
> > > > Testcases compiled with -Os were failing because static functions
> and
> > > > static
> > > > variables were being optimized away, because of improper data type
> > > casts,
> > > > and
> > > > because strict aliasing (whatever that is) was resulting in some
> loss
> > > > of
> > >
> > > Are you unfamiliar with that from C and C++? See
> > > https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8 (we
> all
> > > have our favourite documents to explain it) but I don't know if COBOL
> is
> > > amenable to the concept of TBAA.
> > >
> > > > data.
> > > > These changes eliminate those known problems.
> > >
> > > I'd suggest that this should be accompanied by some question,
> otherwise
> > > it's going to live there forever and it's not necessarily right
> (though
> > > see above - if COBOL is incompatible with the idea, it might need
> > > something along those lines, though not sure this is the right way of
> > > doing that).
> > >
> > > That is, while you're free to approve your own COBOL patches, you're
> > > also free to CC others and ask them for advice even if not explicit
> > > approval before pushing them if something doesn't seem to be correct
> or
> > > is a hack.
> >
> > I am not any kind of compiler expert, except, possibly in one way: I
> don't
> > trust them.  I have never trusted them.  I don't trust them, and I don't
> > trust the computers they run on.  I regard compilers the same way a
> > lion-tamer regards the big cats they work with every day.  I treat them
> > with firm respect; I expect them to behave the way they've been trained
> --
> > and I never turn my back on them.  So, no, I don't understand TBAA; I
> had
> > to look it up.  I don't understand aliasing.
> >
> > The problem at hand is not a COBOL problem.  It is a Bob problem.
> Richard
> > pointed me at the original root of the thing.  I believe I have
> addressed
> > that.  But the problem has not gone away.  The "flag_strict_aliasing =
> 0;"
> > solution makes the symptoms go away.  I spent a couple of hours messing
> > with this, and I was unable to resolve it.  So I have shrugged and used
> > what I have.
> >
> > You probably have figured out by now that when I need help, it's because
> > 1) I have missed something stupid, 2) I was just plain ignorant, or 3)
> > It's a hard problem.
> >
> > And because it's a hard problem, it's hard to describe.  But you sort of
> > asked, so I will sort of try to explain what's going on.
> >
> > IBM-flavored COBOL has the concept of a RETURN-CODE, a global 16-bit
> > integer that is shared by all PROGRAM-ID modules.  (Each is implemented
> as
> > a C type function.)
> >
> > In GCOBOL, a COBOL variable is actually a structure with a data area,
> > because COBOL variables have lots of metadata associated with them,
> > including the type of storage, the amount of storage, in many cases the
> > output format of the variable, whether or not they are signed, offsets
> > from parent, number of digits, number of decimal places, and more.  To
> > make life easier, we implement RETURN-CODE as a COBOL variable, but its
> > data area is a global 16-bit "short" defined in the libgcobol.so.
> >
> > So:  Here we go:
> >
> > In libgcobol/constants.cc:
> >       short __gg__data_return_code = 0;
> >
> > In libgcobol/charmaps.h:
> > extern short         __gg__data_return_code    ;
> >
> > In gcc/cobol/genapi.cc, I need to be able to generate GENERIC that
> > accesses that variable.
> >
> > So, in gcc/genutil.h:
> >      extern tree var_decl_return_code; // short __gg__data_return_code
> >
> > In gcc/cobol/genutil.cc:
> >      tree var_decl_return_code; // short __gg__data_return_code
> >
> > In gcc/cobol/genapicc:
> >      SET_VAR_DECL(var_decl_return_code, SHORT,
> "__gg__data_return_code");
> >
> > That macro results in
> >      var_decl_return_code = gg_declare_variable(
> >           SHORT,
> >           "__gg__data_return_code",
> >           NULL_TREE, // Initial value
> >           vs_external_reference)
> >
> > SHORT is defined in gcc/cobol/gengen.h
> >      #define SHORT      short_integer_type_node
> >
> > The gg_declare_variable() function is in gengen.cc.  It creates a
> var_decl
> > for a SHORT with the name "__gg__data_return_code", and these
> attributes:
> >
> >       DECL_CONTEXT (var_decl) = gg_trans_unit.trans_unit_decl;
> >       TREE_USED(var_decl)   = 1;
> >       DECL_EXTERNAL (var_decl) = 1;
> >       TREE_PUBLIC(var_decl) = 1;
> >
> > With that all in place, in genapi.cc we turn our attention to the
> > parser_see_stop_run() routine.  That routine uses a static integer named
> > "..pssr_retval":
> >
> > static tree returned_value = gg_define_variable(INT, "..pssr_retval",
> > vs_file_static);
> >
> > That variable gets set thusly:
> >
> >     gg_assign(returned_value, gg_cast(INT, var_decl_return_code));
> >
> > (gg_assign() creates a MODIFY_EXPR, gg_cast does "return
> > fold_convert(type, var);"
> >
> > and that variable gets used like this:
> >
> >   gg_exit(returned_value);
> >
> > gg_exit is found in gengen.cc:
> >
> >      void
> >      gg_exit(tree exit_code)
> >        {
> >        tree the_call =
> >            build_call_expr_loc(location_from_lineno(),
> >                                builtin_decl_explicit (BUILT_IN_EXIT),
> >                                1,
> >                                exit_code);
> >        gg_append_statement(the_call);
> >        }
> >
> > So, we have the stage set. When the prior version of GCOBOL (without the
> > flag_strict_aliasing=0) compiles this program
> >
> >         PROGRAM-ID.      PROG.
> >         PROCEDURE        DIVISION.
> >         MOVE 1 TO RETURN-CODE
> >         STOP RUN.
> >
> > with -O0 or -O1, the executable terminates with 1.
> >
> > With -O2, -O3, or -Os, the executable terminates with 0.
> >
> > It was at this point that I gave up.  As far as I can tell I am creating
> a
> > signed short, I am picking it up, I am casting it to a signed int, and I
> > am passing that to exit().
> >
> > I don't know what else to try.
> >
> > With flag_strict_aliasing=0, it works.  Without it, it doesn't.
> >
> > I dislike needing that flag_strict_aliasing=0; it feels a bit like
> turning
> > my back to a tiger.  But here we are.
> >
> > Remember: You asked. And thank you for asking.  And if anybody can offer
> > any insight as to what's going on here, it will be received with great
> > enthusiasm.
> >
> >
> > >
> > > >
> > > > gcc/cobol
> > > >
> > > >   * cobol1.cc: (cobol_langhook_post_options): Implemented in order
> > > > to set
> > > >   flag_strict_aliasing to zero.
> > > >   * genapi.cc: (set_user_status): Add comment.
> > > >   (parser_intrinsic_subst): Expand SHOW_PARSE information.
> > > >   (psa_global): Change names of return-code and upsi globals,
> > > >   (psa_FldLiteralA): Set DECL_PRESERVE_P for FldLiteralA.
> > > >   * gengen.cc: (show_type): Add POINTER type.
> > > >   (gg_define_function_with_no_parameters): Set DECL_PRESERVE_P for
> > > > COBOL-
> > > >   style nested programs.  (gg_array_of_bytes): Fix bad cast.
> > > >
> > > > libgcobol
> > > >
> > > >   * charmaps.h: Change __gg__data_return_code to 'short' type.
> > > >   * constants.cc: Likewise.
> > > > ---
> > > >  gcc/cobol/cobol1.cc    | 19 +++++++++++++++++++
> > > >  gcc/cobol/genapi.cc    | 19 +++++++++++++++++--
> > > >  gcc/cobol/gengen.cc    | 12 ++++++++++--
> > > >  libgcobol/charmaps.h   |  2 +-
> > > >  libgcobol/constants.cc | 10 +++++-----
> > > >  5 files changed, 52 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/gcc/cobol/cobol1.cc b/gcc/cobol/cobol1.cc
> > > > index 0d07c460d41..d175ab11e3f 100644
> > > > --- a/gcc/cobol/cobol1.cc
> > > > +++ b/gcc/cobol/cobol1.cc
> > > > @@ -646,6 +646,22 @@ cobol_get_sarif_source_language(const char *)
> > > >      return "cobol";
> > > >      }
> > > >
> > > > +bool
> > > > +cobol_langhook_post_options(const char**)
> > > > +  {
> > > > +  // This flag, when set to 0, results in calls to gg_exit working
> > > > properly.
> > >
> > > Comments like this usually have a ??? or similar.
> > >
> > > > +  // I don't know why it is necessary.  There is something going on
> > > with
> > > > the
> > > > +  // definition of  __gg__data_return_code in constants.cc, and
> with
> > > how
> > > > it
> > > > +  // is used through var_decl_return_code in genapi.cc.  Without
> it,
> > > the
> > > > value
> > > > +  // delivered to exit@PLT is zero, and not __gg__data_return_code
> > > > +  // Dubner, 2025-04-04.
> > > > +  flag_strict_aliasing = 0;
> > > > +
> > > > +  /* Returning false means that the backend should be used.  */
> > > > +  return false;
> > > > +  }
> > > > +
> > > > +
> > > >  #undef LANG_HOOKS_BUILTIN_FUNCTION
> > > >  #undef LANG_HOOKS_GETDECLS
> > > >  #undef LANG_HOOKS_GLOBAL_BINDINGS_P
> > > > @@ -660,6 +676,7 @@ cobol_get_sarif_source_language(const char *)
> > > >  ////#undef LANG_HOOKS_TYPE_FOR_SIZE
> > > >  #undef LANG_HOOKS_SET_DECL_ASSEMBLER_NAME
> > > >  #undef LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE
> > > > +#undef LANG_HOOKS_POST_OPTIONS
> > > >
> > > >  // We use GCC in the name, not GNU, as others do,
> > > >  // because "GnuCOBOL" refers to a different GNU project.
> > > > @@ -685,6 +702,8 @@ cobol_get_sarif_source_language(const char *)
> > > >
> > > >  #define LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE
> > > > cobol_get_sarif_source_language
> > > >
> > > > +#define LANG_HOOKS_POST_OPTIONS cobol_langhook_post_options
> > > > +
> > > >  struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
> > > >
> > > >  #include "gt-cobol-cobol1.h"
> > > > diff --git a/gcc/cobol/genapi.cc b/gcc/cobol/genapi.cc
> > > > index a0da6476e2a..fbe0bbc75dc 100644
> > > > --- a/gcc/cobol/genapi.cc
> > > > +++ b/gcc/cobol/genapi.cc
> > > > @@ -8806,6 +8806,10 @@ static
> > > >  void set_user_status(struct cbl_file_t *file)
> > > >    {
> > > >    // This routine sets the user_status, if any, to the
> > > > cblc_file_t::status
> > > > +
> > > > +  // We have to do it this way, because in the case where the
> > > > file->user_status
> > > > +  // is in linkage, the memory addresses can end up pointing to the
> > > wrong
> > > > +  // places
> > > >    if(file->user_status)
> > > >      {
> > > >      cbl_field_t *user_status =
> > > > cbl_field_of(symbol_at(file->user_status));
> > > > @@ -10111,6 +10115,13 @@ parser_intrinsic_subst( cbl_field_t *f,
> > > >    SHOW_PARSE
> > > >      {
> > > >      SHOW_PARSE_HEADER
> > > > +    SHOW_PARSE_FIELD(" TO ", f)
> > > > +    for(size_t i=0; i<argc; i++)
> > > > +      {
> > > > +      SHOW_PARSE_INDENT
> > > > +      SHOW_PARSE_FIELD(" ", argv[i].orig.field)
> > > > +      SHOW_PARSE_FIELD(" ", argv[i].replacement.field)
> > > > +      }
> > > >      SHOW_PARSE_END
> > > >      }
> > > >    TRACE1
> > > > @@ -15908,12 +15919,12 @@ psa_global(cbl_field_t *new_var)
> > > >
> > > >    if( strcmp(new_var->name, "RETURN-CODE") == 0 )
> > > >      {
> > > > -    strcpy(ach, "__gg___11_return_code6");
> > > > +    strcpy(ach, "__gg__return_code");
> > > >      }
> > > >
> > > >    if( strcmp(new_var->name, "UPSI-0") == 0 )
> > > >      {
> > > > -    strcpy(ach, "__gg___6_upsi_04");
> > > > +    strcpy(ach, "__gg__upsi");
> > > >      }
> > > >
> > > >    new_var->var_decl_node =
> gg_declare_variable(cblc_field_type_node,
> > > ach,
> > > > NULL, vs_external_reference);
> > > > @@ -16156,6 +16167,10 @@ psa_FldLiteralA(struct cbl_field_t *field )
> > > >                  field->data.initial,
> > > >                  NULL_TREE,
> > > >                  field->var_decl_node);
> > > > +    TREE_READONLY(field->var_decl_node) = 1;
> > > > +    TREE_USED(field->var_decl_node) = 1;
> > > > +    TREE_STATIC(field->var_decl_node) = 1;
> > > > +    DECL_PRESERVE_P (field->var_decl_node) = 1;
> > > >      nvar += 1;
> > > >      }
> > > >    TRACE1
> > > > diff --git a/gcc/cobol/gengen.cc b/gcc/cobol/gengen.cc
> > > > index ffb64c8993d..e7a4e3c5165 100644
> > > > --- a/gcc/cobol/gengen.cc
> > > > +++ b/gcc/cobol/gengen.cc
> > > > @@ -375,6 +375,10 @@ show_type(tree type)
> > > >    static char ach[1024];
> > > >    switch( TREE_CODE(type) )
> > > >      {
> > > > +    case POINTER_TYPE:
> > > > +      sprintf(ach, "POINTER");
> > > > +      break;
> > > > +
> > > >      case VOID_TYPE:
> > > >        sprintf(ach, "VOID");
> > > >        break;
> > > > @@ -2548,6 +2552,10 @@ gg_define_function_with_no_parameters(tree
> > > > return_type,
> > > >      DECL_CONTEXT (function_decl) = gg_trans_unit.trans_unit_decl;
> > > >      TREE_PUBLIC(function_decl) = 0;
> > > >
> > > > +    // This function is file static, but nobody calls it, so
> without
> > > > +    // intervention -O1+ optimizations will discard it.
> > > > +    DECL_PRESERVE_P (function_decl) = 1;
> > > > +
> > > >      // Append this function to the list of functions and variables
> > > >      // associated with the computation module.
> > > >      gg_append_var_decl(function_decl);
> > > > @@ -3358,8 +3366,8 @@ gg_array_of_size_t( size_t N, size_t *values)
> > > >  tree
> > > >  gg_array_of_bytes( size_t N, unsigned char *values)
> > > >    {
> > > > -  tree retval = gg_define_variable(build_pointer_type(UCHAR));
> > > > -  gg_assign(retval, gg_cast(build_pointer_type(UCHAR), gg_malloc(
> > > > build_int_cst_type(UCHAR, N * sizeof(unsigned char)))));
> > > > +  tree retval = gg_define_variable(UCHAR_P);
> > > > +  gg_assign(retval, gg_cast(UCHAR_P, gg_malloc(
> > > > build_int_cst_type(SIZE_T, N * sizeof(unsigned char)))));
> > > >    for(size_t i=0; i<N; i++)
> > > >      {
> > > >      gg_assign(gg_array_value(retval, i), build_int_cst_type(UCHAR,
> > > > values[i]));
> > > > diff --git a/libgcobol/charmaps.h b/libgcobol/charmaps.h
> > > > index 12968fdf928..6b4e9f5c4b4 100644
> > > > --- a/libgcobol/charmaps.h
> > > > +++ b/libgcobol/charmaps.h
> > > > @@ -297,7 +297,7 @@ extern unsigned char __gg__data_zeros[1]       ;
> > > >  extern unsigned char __gg__data_high_values[1] ;
> > > >  extern unsigned char __gg__data_quotes[1]      ;
> > > >  extern unsigned char __gg__data_upsi_0[2]      ;
> > > > -extern unsigned char __gg__data_return_code[2] ;
> > > > +extern short         __gg__data_return_code    ;
> > > >
> > > >  // These are the various hardcoded tables used for conversions.
> > > >  extern const unsigned short __gg__one_to_one_values[256];
> > > > diff --git a/libgcobol/constants.cc b/libgcobol/constants.cc
> > > > index 026f919cacc..d37c791f1b3 100644
> > > > --- a/libgcobol/constants.cc
> > > > +++ b/libgcobol/constants.cc
> > > > @@ -288,7 +288,7 @@ struct cblc_field_t __gg___14_linage_counter6 =
> {
> > > >
> > > >
> > > >  unsigned char __gg__data_upsi_0[2] = {0,0};
> > > > -struct cblc_field_t __gg___6_upsi_04 = {
> > > > +struct cblc_field_t __gg__upsi = {
> > > >    .data           = __gg__data_upsi_0 ,
> > > >    .capacity       = 2 ,
> > > >    .allocated      = 2 ,
> > > > @@ -307,9 +307,9 @@ struct cblc_field_t __gg___6_upsi_04 = {
> > > >    .dummy          = 0 ,
> > > >    };
> > > >
> > > > -unsigned char __gg__data_return_code[2] = {0,0};
> > > > -struct cblc_field_t __gg___11_return_code6 = {
> > > > -  .data           = __gg__data_return_code ,
> > > > +short __gg__data_return_code = 0;
> > > > +struct cblc_field_t __gg__return_code = {
> > > > +  .data           = (unsigned char *)&__gg__data_return_code ,
> > > >    .capacity       = 2 ,
> > > >    .allocated      = 2 ,
> > > >    .offset         = 0 ,
> > > > @@ -319,7 +319,7 @@ struct cblc_field_t __gg___11_return_code6 = {
> > > >    .parent         = NULL,
> > > >    .occurs_lower   = 0 ,
> > > >    .occurs_upper   = 0 ,
> > > > -  .attr           = 0x0 ,
> > > > +  .attr           = signable_e ,
> > > >    .type           = FldNumericBin5 ,
> > > >    .level          = 0 ,
> > > >    .digits         = 4 ,

Reply via email to