> -----Original Message-----
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: Monday, April 7, 2025 03:19
> To: Robert Dubner <rdub...@symas.com>
> Cc: Sam James <s...@gentoo.org>; GCC Patches <gcc-patches@gcc.gnu.org>;
> Richard Biener <rguent...@suse.de>
> Subject: Re: [committed] cobol: Eliminate cobolworx UAT errors when
> compiling with -Os
>
> On Fri, Apr 4, 2025 at 11:50 PM Robert Dubner <rdub...@symas.com> wrote:
> >
> > Anybody who might have gotten interested should stand down.
> >
> > As usual, that analysis got me thinking.
> >
> > I got focused on where var_decl_return_code was being used. (I was
> wrong.
> > I made the mistake because I had just eliminated two sets of errors
> caused
> > by the optimization actually optimizing away things I need, so I had
> that
> > in the front of my brain.) Richard told me of something odd at the
> point
> > where var_decl_return was being established. I finally decided to look
> at
> > that.
> >
> > Turned out, ultimately, to be a SHORT / USHORT mismatch on the variables
> > being given to a MODIFY_EXPR. Apparently the optimization algorithms
> can
> > be extremely cranky about value types.
>
> Yes. Unless you implement the GET_ALIAS_SET language hook even more
> so than when using C. But I expect COBOL to be a quite restrictive
> language
> with respect to typing - is there something like pointers so a COBOL
> program
> can access the bit-representation of some variable as a different type?
That kind of morphing can be done through pointers; COBOL has "GET ADDRESS
OF", and the resulting pointer can be assigned to an arbitrary variable
that's defined as having its address in changeable memory.
But even more so, COBOL has the frequently-used DEFINES clause, which
enables you do to pretty much anything.
COBOL variables are usually hierarchical, so you can do perverse things like
this:
01 fubar.
02 as-string PIC X(4).
02 as-number REDEFINES as-string BINARY-LONG.
The result is identical to a C union:
union
{
char as_string[4];
uint32_t as_number;
} fubar;
This is just as dangerous as it looks in either language, especially since
the target can be big- or little-endian, with all the nightmares that adds
as the int swaps ends while the string doesn't.
That said: At the present time, as I keep saying, the code I generate is
using GENERIC as a kind of assembly language. All those cobol variables are
structures, and I am manipulating the elements of those structures. So,
when I say I had a SHORT/USHORT mismatch, that was my doing; I created those
variables in order to implement something with those cobol variable
structures -- and I messed up.
I have expanded the type checking in the routine that creates MODIFY_EXPR
trees. If the middle-end can be cranky, I can, too. But I am throwing an
error rather than crashing.
>
> Richard.
>
> >
> > In any event, with that straightened out, everything is working without
> > the flag_strict_aliasing modification.
> >
> > Thanks for asking, and thanks for listening.
> >
> > > -----Original Message-----
> > > From: Robert Dubner <rdub...@symas.com>
> > > Sent: Friday, April 4, 2025 16:02
> > > To: Sam James <s...@gentoo.org>
> > > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > > Subject: RE: [committed] cobol: Eliminate cobolworx UAT errors when
> > > compiling with -Os
> > >
> > > > -----Original Message-----
> > > > From: Sam James <mailto:s...@gentoo.org>
> > > > Sent: Friday, April 4, 2025 14:28
> > > > To: Robert Dubner <mailto:rdub...@symas.com>
> > > > Cc: 'GCC Patches' <mailto:gcc-patches@gcc.gnu.org>
> > > > Subject: Re: [committed] cobol: Eliminate cobolworx UAT errors when
> > > > compiling with -Os
> > > >
> > > > Robert Dubner <mailto:rdub...@symas.com> writes:
> > > >
> > > > > From e70fe5ed46ab129a8b1da961c47d3fb75b11b988 Mon Sep 17 00:00:00
> > 2001
> > > > > From: Bob Dubner mailto:rdub...@symas.com
> > > > > Date: Fri, 4 Apr 2025 13:48:58 -0400
> > > > > Subject: [PATCH] cobol: Eliminate cobolworx UAT errors when
> > compiling
> > > > with
> > > > > -Os
> > > > >
> > > > > Testcases compiled with -Os were failing because static functions
> > and
> > > > > static
> > > > > variables were being optimized away, because of improper data type
> > > > casts,
> > > > > and
> > > > > because strict aliasing (whatever that is) was resulting in some
> > loss
> > > > > of
> > > >
> > > > Are you unfamiliar with that from C and C++? See
> > > > https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8 (we
> > all
> > > > have our favourite documents to explain it) but I don't know if
> COBOL
> > is
> > > > amenable to the concept of TBAA.
> > > >
> > > > > data.
> > > > > These changes eliminate those known problems.
> > > >
> > > > I'd suggest that this should be accompanied by some question,
> > otherwise
> > > > it's going to live there forever and it's not necessarily right
> > (though
> > > > see above - if COBOL is incompatible with the idea, it might need
> > > > something along those lines, though not sure this is the right way
> of
> > > > doing that).
> > > >
> > > > That is, while you're free to approve your own COBOL patches, you're
> > > > also free to CC others and ask them for advice even if not explicit
> > > > approval before pushing them if something doesn't seem to be correct
> > or
> > > > is a hack.
> > >
> > > I am not any kind of compiler expert, except, possibly in one way: I
> > don't
> > > trust them. I have never trusted them. I don't trust them, and I
> don't
> > > trust the computers they run on. I regard compilers the same way a
> > > lion-tamer regards the big cats they work with every day. I treat
> them
> > > with firm respect; I expect them to behave the way they've been
> trained
> > --
> > > and I never turn my back on them. So, no, I don't understand TBAA; I
> > had
> > > to look it up. I don't understand aliasing.
> > >
> > > The problem at hand is not a COBOL problem. It is a Bob problem.
> > Richard
> > > pointed me at the original root of the thing. I believe I have
> > addressed
> > > that. But the problem has not gone away. The "flag_strict_aliasing =
> > 0;"
> > > solution makes the symptoms go away. I spent a couple of hours
> messing
> > > with this, and I was unable to resolve it. So I have shrugged and
> used
> > > what I have.
> > >
> > > You probably have figured out by now that when I need help, it's
> because
> > > 1) I have missed something stupid, 2) I was just plain ignorant, or 3)
> > > It's a hard problem.
> > >
> > > And because it's a hard problem, it's hard to describe. But you sort
> of
> > > asked, so I will sort of try to explain what's going on.
> > >
> > > IBM-flavored COBOL has the concept of a RETURN-CODE, a global 16-bit
> > > integer that is shared by all PROGRAM-ID modules. (Each is
> implemented
> > as
> > > a C type function.)
> > >
> > > In GCOBOL, a COBOL variable is actually a structure with a data area,
> > > because COBOL variables have lots of metadata associated with them,
> > > including the type of storage, the amount of storage, in many cases
> the
> > > output format of the variable, whether or not they are signed, offsets
> > > from parent, number of digits, number of decimal places, and more. To
> > > make life easier, we implement RETURN-CODE as a COBOL variable, but
> its
> > > data area is a global 16-bit "short" defined in the libgcobol.so.
> > >
> > > So: Here we go:
> > >
> > > In libgcobol/constants.cc:
> > > short __gg__data_return_code = 0;
> > >
> > > In libgcobol/charmaps.h:
> > > extern short __gg__data_return_code ;
> > >
> > > In gcc/cobol/genapi.cc, I need to be able to generate GENERIC that
> > > accesses that variable.
> > >
> > > So, in gcc/genutil.h:
> > > extern tree var_decl_return_code; // short __gg__data_return_code
> > >
> > > In gcc/cobol/genutil.cc:
> > > tree var_decl_return_code; // short __gg__data_return_code
> > >
> > > In gcc/cobol/genapicc:
> > > SET_VAR_DECL(var_decl_return_code, SHORT,
> > "__gg__data_return_code");
> > >
> > > That macro results in
> > > var_decl_return_code = gg_declare_variable(
> > > SHORT,
> > > "__gg__data_return_code",
> > > NULL_TREE, // Initial value
> > > vs_external_reference)
> > >
> > > SHORT is defined in gcc/cobol/gengen.h
> > > #define SHORT short_integer_type_node
> > >
> > > The gg_declare_variable() function is in gengen.cc. It creates a
> > var_decl
> > > for a SHORT with the name "__gg__data_return_code", and these
> > attributes:
> > >
> > > DECL_CONTEXT (var_decl) = gg_trans_unit.trans_unit_decl;
> > > TREE_USED(var_decl) = 1;
> > > DECL_EXTERNAL (var_decl) = 1;
> > > TREE_PUBLIC(var_decl) = 1;
> > >
> > > With that all in place, in genapi.cc we turn our attention to the
> > > parser_see_stop_run() routine. That routine uses a static integer
> named
> > > "..pssr_retval":
> > >
> > > static tree returned_value = gg_define_variable(INT, "..pssr_retval",
> > > vs_file_static);
> > >
> > > That variable gets set thusly:
> > >
> > > gg_assign(returned_value, gg_cast(INT, var_decl_return_code));
> > >
> > > (gg_assign() creates a MODIFY_EXPR, gg_cast does "return
> > > fold_convert(type, var);"
> > >
> > > and that variable gets used like this:
> > >
> > > gg_exit(returned_value);
> > >
> > > gg_exit is found in gengen.cc:
> > >
> > > void
> > > gg_exit(tree exit_code)
> > > {
> > > tree the_call =
> > > build_call_expr_loc(location_from_lineno(),
> > > builtin_decl_explicit (BUILT_IN_EXIT),
> > > 1,
> > > exit_code);
> > > gg_append_statement(the_call);
> > > }
> > >
> > > So, we have the stage set. When the prior version of GCOBOL (without
> the
> > > flag_strict_aliasing=0) compiles this program
> > >
> > > PROGRAM-ID. PROG.
> > > PROCEDURE DIVISION.
> > > MOVE 1 TO RETURN-CODE
> > > STOP RUN.
> > >
> > > with -O0 or -O1, the executable terminates with 1.
> > >
> > > With -O2, -O3, or -Os, the executable terminates with 0.
> > >
> > > It was at this point that I gave up. As far as I can tell I am
> creating
> > a
> > > signed short, I am picking it up, I am casting it to a signed int, and
> I
> > > am passing that to exit().
> > >
> > > I don't know what else to try.
> > >
> > > With flag_strict_aliasing=0, it works. Without it, it doesn't.
> > >
> > > I dislike needing that flag_strict_aliasing=0; it feels a bit like
> > turning
> > > my back to a tiger. But here we are.
> > >
> > > Remember: You asked. And thank you for asking. And if anybody can
> offer
> > > any insight as to what's going on here, it will be received with great
> > > enthusiasm.
> > >
> > >
> > > >
> > > > >
> > > > > gcc/cobol
> > > > >
> > > > > * cobol1.cc: (cobol_langhook_post_options): Implemented in order
> > > > > to set
> > > > > flag_strict_aliasing to zero.
> > > > > * genapi.cc: (set_user_status): Add comment.
> > > > > (parser_intrinsic_subst): Expand SHOW_PARSE information.
> > > > > (psa_global): Change names of return-code and upsi globals,
> > > > > (psa_FldLiteralA): Set DECL_PRESERVE_P for FldLiteralA.
> > > > > * gengen.cc: (show_type): Add POINTER type.
> > > > > (gg_define_function_with_no_parameters): Set DECL_PRESERVE_P for
> > > > > COBOL-
> > > > > style nested programs. (gg_array_of_bytes): Fix bad cast.
> > > > >
> > > > > libgcobol
> > > > >
> > > > > * charmaps.h: Change __gg__data_return_code to 'short' type.
> > > > > * constants.cc: Likewise.
> > > > > ---
> > > > > gcc/cobol/cobol1.cc | 19 +++++++++++++++++++
> > > > > gcc/cobol/genapi.cc | 19 +++++++++++++++++--
> > > > > gcc/cobol/gengen.cc | 12 ++++++++++--
> > > > > libgcobol/charmaps.h | 2 +-
> > > > > libgcobol/constants.cc | 10 +++++-----
> > > > > 5 files changed, 52 insertions(+), 10 deletions(-)
> > > > >
> > > > > diff --git a/gcc/cobol/cobol1.cc b/gcc/cobol/cobol1.cc
> > > > > index 0d07c460d41..d175ab11e3f 100644
> > > > > --- a/gcc/cobol/cobol1.cc
> > > > > +++ b/gcc/cobol/cobol1.cc
> > > > > @@ -646,6 +646,22 @@ cobol_get_sarif_source_language(const char *)
> > > > > return "cobol";
> > > > > }
> > > > >
> > > > > +bool
> > > > > +cobol_langhook_post_options(const char**)
> > > > > + {
> > > > > + // This flag, when set to 0, results in calls to gg_exit
> working
> > > > > properly.
> > > >
> > > > Comments like this usually have a ??? or similar.
> > > >
> > > > > + // I don't know why it is necessary. There is something going
> on
> > > > with
> > > > > the
> > > > > + // definition of __gg__data_return_code in constants.cc, and
> > with
> > > > how
> > > > > it
> > > > > + // is used through var_decl_return_code in genapi.cc. Without
> > it,
> > > > the
> > > > > value
> > > > > + // delivered to exit@PLT is zero, and not
> __gg__data_return_code
> > > > > + // Dubner, 2025-04-04.
> > > > > + flag_strict_aliasing = 0;
> > > > > +
> > > > > + /* Returning false means that the backend should be used. */
> > > > > + return false;
> > > > > + }
> > > > > +
> > > > > +
> > > > > #undef LANG_HOOKS_BUILTIN_FUNCTION
> > > > > #undef LANG_HOOKS_GETDECLS
> > > > > #undef LANG_HOOKS_GLOBAL_BINDINGS_P
> > > > > @@ -660,6 +676,7 @@ cobol_get_sarif_source_language(const char *)
> > > > > ////#undef LANG_HOOKS_TYPE_FOR_SIZE
> > > > > #undef LANG_HOOKS_SET_DECL_ASSEMBLER_NAME
> > > > > #undef LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE
> > > > > +#undef LANG_HOOKS_POST_OPTIONS
> > > > >
> > > > > // We use GCC in the name, not GNU, as others do,
> > > > > // because "GnuCOBOL" refers to a different GNU project.
> > > > > @@ -685,6 +702,8 @@ cobol_get_sarif_source_language(const char *)
> > > > >
> > > > > #define LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE
> > > > > cobol_get_sarif_source_language
> > > > >
> > > > > +#define LANG_HOOKS_POST_OPTIONS cobol_langhook_post_options
> > > > > +
> > > > > struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
> > > > >
> > > > > #include "gt-cobol-cobol1.h"
> > > > > diff --git a/gcc/cobol/genapi.cc b/gcc/cobol/genapi.cc
> > > > > index a0da6476e2a..fbe0bbc75dc 100644
> > > > > --- a/gcc/cobol/genapi.cc
> > > > > +++ b/gcc/cobol/genapi.cc
> > > > > @@ -8806,6 +8806,10 @@ static
> > > > > void set_user_status(struct cbl_file_t *file)
> > > > > {
> > > > > // This routine sets the user_status, if any, to the
> > > > > cblc_file_t::status
> > > > > +
> > > > > + // We have to do it this way, because in the case where the
> > > > > file->user_status
> > > > > + // is in linkage, the memory addresses can end up pointing to
> the
> > > > wrong
> > > > > + // places
> > > > > if(file->user_status)
> > > > > {
> > > > > cbl_field_t *user_status =
> > > > > cbl_field_of(symbol_at(file->user_status));
> > > > > @@ -10111,6 +10115,13 @@ parser_intrinsic_subst( cbl_field_t *f,
> > > > > SHOW_PARSE
> > > > > {
> > > > > SHOW_PARSE_HEADER
> > > > > + SHOW_PARSE_FIELD(" TO ", f)
> > > > > + for(size_t i=0; i<argc; i++)
> > > > > + {
> > > > > + SHOW_PARSE_INDENT
> > > > > + SHOW_PARSE_FIELD(" ", argv[i].orig.field)
> > > > > + SHOW_PARSE_FIELD(" ", argv[i].replacement.field)
> > > > > + }
> > > > > SHOW_PARSE_END
> > > > > }
> > > > > TRACE1
> > > > > @@ -15908,12 +15919,12 @@ psa_global(cbl_field_t *new_var)
> > > > >
> > > > > if( strcmp(new_var->name, "RETURN-CODE") == 0 )
> > > > > {
> > > > > - strcpy(ach, "__gg___11_return_code6");
> > > > > + strcpy(ach, "__gg__return_code");
> > > > > }
> > > > >
> > > > > if( strcmp(new_var->name, "UPSI-0") == 0 )
> > > > > {
> > > > > - strcpy(ach, "__gg___6_upsi_04");
> > > > > + strcpy(ach, "__gg__upsi");
> > > > > }
> > > > >
> > > > > new_var->var_decl_node =
> > gg_declare_variable(cblc_field_type_node,
> > > > ach,
> > > > > NULL, vs_external_reference);
> > > > > @@ -16156,6 +16167,10 @@ psa_FldLiteralA(struct cbl_field_t *field
> )
> > > > > field->data.initial,
> > > > > NULL_TREE,
> > > > > field->var_decl_node);
> > > > > + TREE_READONLY(field->var_decl_node) = 1;
> > > > > + TREE_USED(field->var_decl_node) = 1;
> > > > > + TREE_STATIC(field->var_decl_node) = 1;
> > > > > + DECL_PRESERVE_P (field->var_decl_node) = 1;
> > > > > nvar += 1;
> > > > > }
> > > > > TRACE1
> > > > > diff --git a/gcc/cobol/gengen.cc b/gcc/cobol/gengen.cc
> > > > > index ffb64c8993d..e7a4e3c5165 100644
> > > > > --- a/gcc/cobol/gengen.cc
> > > > > +++ b/gcc/cobol/gengen.cc
> > > > > @@ -375,6 +375,10 @@ show_type(tree type)
> > > > > static char ach[1024];
> > > > > switch( TREE_CODE(type) )
> > > > > {
> > > > > + case POINTER_TYPE:
> > > > > + sprintf(ach, "POINTER");
> > > > > + break;
> > > > > +
> > > > > case VOID_TYPE:
> > > > > sprintf(ach, "VOID");
> > > > > break;
> > > > > @@ -2548,6 +2552,10 @@ gg_define_function_with_no_parameters(tree
> > > > > return_type,
> > > > > DECL_CONTEXT (function_decl) = gg_trans_unit.trans_unit_decl;
> > > > > TREE_PUBLIC(function_decl) = 0;
> > > > >
> > > > > + // This function is file static, but nobody calls it, so
> > without
> > > > > + // intervention -O1+ optimizations will discard it.
> > > > > + DECL_PRESERVE_P (function_decl) = 1;
> > > > > +
> > > > > // Append this function to the list of functions and
> variables
> > > > > // associated with the computation module.
> > > > > gg_append_var_decl(function_decl);
> > > > > @@ -3358,8 +3366,8 @@ gg_array_of_size_t( size_t N, size_t
> *values)
> > > > > tree
> > > > > gg_array_of_bytes( size_t N, unsigned char *values)
> > > > > {
> > > > > - tree retval = gg_define_variable(build_pointer_type(UCHAR));
> > > > > - gg_assign(retval, gg_cast(build_pointer_type(UCHAR), gg_malloc(
> > > > > build_int_cst_type(UCHAR, N * sizeof(unsigned char)))));
> > > > > + tree retval = gg_define_variable(UCHAR_P);
> > > > > + gg_assign(retval, gg_cast(UCHAR_P, gg_malloc(
> > > > > build_int_cst_type(SIZE_T, N * sizeof(unsigned char)))));
> > > > > for(size_t i=0; i<N; i++)
> > > > > {
> > > > > gg_assign(gg_array_value(retval, i),
> build_int_cst_type(UCHAR,
> > > > > values[i]));
> > > > > diff --git a/libgcobol/charmaps.h b/libgcobol/charmaps.h
> > > > > index 12968fdf928..6b4e9f5c4b4 100644
> > > > > --- a/libgcobol/charmaps.h
> > > > > +++ b/libgcobol/charmaps.h
> > > > > @@ -297,7 +297,7 @@ extern unsigned char __gg__data_zeros[1]
> ;
> > > > > extern unsigned char __gg__data_high_values[1] ;
> > > > > extern unsigned char __gg__data_quotes[1] ;
> > > > > extern unsigned char __gg__data_upsi_0[2] ;
> > > > > -extern unsigned char __gg__data_return_code[2] ;
> > > > > +extern short __gg__data_return_code ;
> > > > >
> > > > > // These are the various hardcoded tables used for conversions.
> > > > > extern const unsigned short __gg__one_to_one_values[256];
> > > > > diff --git a/libgcobol/constants.cc b/libgcobol/constants.cc
> > > > > index 026f919cacc..d37c791f1b3 100644
> > > > > --- a/libgcobol/constants.cc
> > > > > +++ b/libgcobol/constants.cc
> > > > > @@ -288,7 +288,7 @@ struct cblc_field_t __gg___14_linage_counter6
> =
> > {
> > > > >
> > > > >
> > > > > unsigned char __gg__data_upsi_0[2] = {0,0};
> > > > > -struct cblc_field_t __gg___6_upsi_04 = {
> > > > > +struct cblc_field_t __gg__upsi = {
> > > > > .data = __gg__data_upsi_0 ,
> > > > > .capacity = 2 ,
> > > > > .allocated = 2 ,
> > > > > @@ -307,9 +307,9 @@ struct cblc_field_t __gg___6_upsi_04 = {
> > > > > .dummy = 0 ,
> > > > > };
> > > > >
> > > > > -unsigned char __gg__data_return_code[2] = {0,0};
> > > > > -struct cblc_field_t __gg___11_return_code6 = {
> > > > > - .data = __gg__data_return_code ,
> > > > > +short __gg__data_return_code = 0;
> > > > > +struct cblc_field_t __gg__return_code = {
> > > > > + .data = (unsigned char *)&__gg__data_return_code ,
> > > > > .capacity = 2 ,
> > > > > .allocated = 2 ,
> > > > > .offset = 0 ,
> > > > > @@ -319,7 +319,7 @@ struct cblc_field_t __gg___11_return_code6 = {
> > > > > .parent = NULL,
> > > > > .occurs_lower = 0 ,
> > > > > .occurs_upper = 0 ,
> > > > > - .attr = 0x0 ,
> > > > > + .attr = signable_e ,
> > > > > .type = FldNumericBin5 ,
> > > > > .level = 0 ,
> > > > > .digits = 4 ,