On Mon, 23 Dec 2024, Robert Dubner wrote:

> Richard, a bunch of things you address are in my bailwick.
> 
> When Jim and I set out to create a COBOL front end, I knew *NOTHING*
> about, well, anything vis-à-vis GCC.  I barely knew how it worked.  Some
> things I had to figure out even before I knew how to figure anything outl
> notably, creating functions.  Creating a function was practically the
> first thing I had to do, and I spent a bunch of time trapping through GCC
> compiling simple functions to devine how it might be done.  By definition,
> I simply did not know what I was doing, and learned as I went.

I guess that's expected - we always hope people doing new frontends
have spare time left to fill gaps in documentation with knowledge
they gained ;)  Or maybe write a blog post about how to do a new
GCC frontend (there are multiple such for backends).  But I know time
is scarce.

> So, if there are peculiarities, that's probably why.  And I am going to
> eagerly go through this message, and others, to apply the knowledge you
> are conveying; I am in a far better position to understand what you are
> saying than I would have been three years ago.



> 
> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Sunday, December 22, 2024 07:18
> > To: jklow...@symas.com
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] COBOL 3/8 gen: GENERIC interface
> >
> >
> > (sorry for breaking threading, but quoting the whole mail made my MUA
> > unbearably slow)
> >
> > From 64bcb34e12371f61a8958645e1668e0ac2704391gen.patch 4 Oct 2024
> 12:01:22
> > -0400
> > From: "James K. Lowden" <jklow...@symas.com>
> > Date: Thu 12 Dec 2024 06:28:07 PM EST
> > Subject: [PATCH]  Add 'cobol' to 4 files
> >
> > gcc/cobol/ChangeLog
> >         * genapi.cc: New file.
> >         * gengen.cc: New file.
> >         * genmath.cc: New file.
> >         * genutil.cc: New file.
> >
> >
> > I'm skimming over this part of the frontend, taking notes on general
> > things I notice, partly quoting parts of the patch
> >
> > +#include <err.h>
> > +#include <stdbool.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <ctype.h>
> > +#include <vector>
> > +#include <string>
> > +#include <unordered_map>
> > +#include <unordered_set>
> > +#include <chrono>
> > +#include <time.h>
> > +#include <math.h>
> > +
> > +#include "config.h"
> > +#include "system.h"
> >
> > all system headers need to be included by system.h, C++ includes should
> be
> > included by defining INCLUDE_UNORDERED_MAP and INCLUDE_UNORDERED_SET,
> etc.
> > - includes definitely have to come after including config.h
> 
> I now see that many of the front ends seem to have their own lang-system.h
> that wraps the gcc/system.h.  Under the assumption that the recently-added
> GO and RUST are doing that in an acceptable fashion, I will model that
> behavior.

Sounds good to me.

> >
> > +//#define XXX do{gg_printf("LINE %d\n", build_int_cst_type(INT,
> > __LINE__), NULL_TREE);}while(0);
> > +#define XXX
> > ...
> > +bool bSHOW_PARSE = getenv("SHOW_PARSE");
> >
> > there seems to be debugging wired in - at least the getenv call above
> > should probably be be behind a more global COBOL_DEBUG conditional?
> > There's a langhook for frontend initialization - that's a convenient
> place
> > to conditionally (re-)initialize bSHOW_PARSE (when you default it to
> > false).
> >
> > +  // One strongly suspects, one does, that creating the constructor one
> > + // character at a time isn't the best possible way of accomplishing
> > + the  // desired goal. But I lost interest in trying to find a better
> > way.
> > +  TYPE_NAME(array_of_characters) = get_identifier("cblc_string");  tree
> > + constr = make_node(CONSTRUCTOR);
> > +  TREE_TYPE(constr) = array_of_characters;
> >
> > indeed, you should be able to use build_string (strlen(var_contents)+1,
> > var_contents) and use that as DECL_INTIIAL (it builds a STRING_CST
> node).
> 
> That worked, thank you.
> 
> >
> > +tree
> > +gg_fprintf(tree fd, int nargs, const char *format_string, ...)
> > +  {
> > ...
> > +  tree stmt = build_call_array_loc (location_from_lineno(),
> > +                                    INT,
> > +                                    function,
> > +                                    argc,
> > +                                    args);
> >
> > instead of constructing args[] you should be able to directly call
> > build_call_valist () with the va_list.
> >
> 
> In this case, I don't see how to use build_call_valist ().
> 
> gg_fprintf() takes a format_string and 'nargs' following arguments. It
> then builds a call to
> 
> sprintf (buffer, format_string, <those nargs variables);
> 
> I don't see how to pass 'format string' to build_call_valist ().

Ah, true ... it isn't possible to expand an existing va_list.
I suppose exporting build_call_1 or simply open-coding what it does
would allow to elide the local array though (it's mainly the
artificial limit that had bugged me)

> > You seem to, in function_decl_from_name or gg_get_function_address,
> build
> > function decls for externs, mainly from the C library.  Elsewhere we are
> > using what GCC calls "built-ins" for this, see gcc/builtins.def and
> > tree.h:builtin_decl_explicit.  Unfortunately most builtins and the used
> > types are to be initialized by frontends, luckily some are from the
> > middle-end, see build_common_builtin_nodes - that includes
> BUILT_IN_MEMCPY
> > and friends.  For optimization purposes (that GCC knows that "memcpy" is
> > memcpy) I suggest you see to properly declare
> > BUILT_IN_* declarations - the Fortran frontend might be a good recipie
> to
> > copy from (see fortran/f95-lang.cc around where it includes
> > mathbuiltins.def).
> 
> I will learn about builtins.
> 
> 
> >
> > +    if( !field->var_decl_node )
> > +      {
> > +      fprintf(  stderr,
> > +                "%s (type: %s) improperly has a NULL var_decl_node\n",
> > +                field->name,
> > +                cbl_field_type_str(field->type));
> > +      fprintf(  stderr,
> > +                "Probable cause: it was referenced without being
> > defined.\n");
> > +      exit(1);
> >
> > With GCC diagnostics you'd use internal_error (...) which then
> terminates
> > GCC.
> >
> > +static void
> > +assembler_label(const char *label, bool global = false)
> > +  {
> > +  // label has to be a valid label for the assembler
> > +
> > +  // use the global switch for COBOL ENTRY
> > +
> > +  const char global_text[] = ": .global ";
> >
> > there's target macros for how assemblers work, like ASM_OUTPUT_LABEL,
> but
> > it seems this function is about sections?  I'm not sure how
> > gg_insert_into_assembler "works", but it seems to emit a ASM_EXPR into
> the
> > IL - in the section_label case in the toplevel?  For portability I'd
> > strongly advise against the use of global assembler, you can always tell
> > the middle-end what you desire here, for sections you can use
> > set_decl_section_name for example.
> >
> > I'm not sure for what else you use ASM_EXPRs.
> 
> Hoo, boy.  We have a namespace collision.  The COBOL language -- which, I
> remind you, got there first -- uses "SECTION" and "PARAGRAPH" to mean very
> specific things that are part of the language specification. So, my use of
> the word "section" has nothing to do with the sections in an ELF
> executable.

Oh, I see.  I know nothing of COBOL.

> COBOL sections and paragraphs (a section is a group of paragraphs) are
> conceptually similar to C functions. Given a paragraph named FOO, you can
> PERFORM FOO and the group of sentences (yes, a paragraph is made up of
> sentences; I remind you that COBOL was originally designed to be readable
> by non-programmers) are executed and control then returns to the statement
> after the PERFORM.
> 
> For various reasons, execution into, through, and back from sections and
> paragraphs must be implemented with GOTO statements, and cannot be
> implemented with calls.

Uh, that's awkward (if not only for the fact that the big functions
you end up will be slow to compile).

> I nonetheless attempted, at one point, to implement PERFORM via calls, and
> the ".global" you noticed is a vestige of that effort.  The routine it was
> used in has a boolean variable 'global' that defaults to false, and the
> routine was never called with that parameter set to true.  It is
> unnecessary and has been deleted.
> 
> gg_insert_into_assembler() does indeed use ASM_EXPR. I sometimes use it to
> generate #-delimited comments into the generated assembly language so that
> I can see what's going on.
> 
> But I also use it to generate labeled locations in the executables.
> 
> I am also developing a GDB-COBOL version of GDB, one that understands the
> executables GCOBOL is generating.  We need the GDB NEXT instruction to
> execute through a PROC that is the subject of a PERFORM PROC statement.
> And I need to be able to set a breakpoint with "(gdb)break PROC".
> 
> I have not yet figured out how to use GCC to put information into the
> .debug_info section of a DWARF executable, and I have not yet figured out
> how, in GDB, to extract such information.  So, for now, I am creating
> labels with meta-information in them.

Hmm, I see.  I think that not using functions and function calls
and thus not having a "frame" will make this quite difficult.  Can you
elaborate on why having PARAGRAPH mapped to a function does not work?
I guess what I'd try is to have SECTION map to a function and have
PARAGRAPH map to nested functions within the SECTION - that way
the PARAGRAPHs have access to the variables at SECTION scope as they
are at the point of PERFORM (aka function calls).  In GNU C you can
do this:

void foo_section (int p)
{
  int a[10];
  int res;
  int i;

  i = 5;

  void bar_paragraph ()
  {
    res = a[i];
  }

  bar_paragraph ();
  if (res == 7)
    return;

  res += 5;

  void foobar_paragraph ()
  {
    a[1] = res;
  }

  foobar_paragraph ();
  bar_paragraph ();
}

I _think_ that this might map to how COBOL works?  gdb can deal with
nested functions.  You can also have goto inbetween the section
and paragraph in case the language allows this - GCC supports
"nonlocal" goto for this case.

> So, for example, for a program 'prog' with section 'mainsec' and paragraph
> 'graff', I use ASM_EXPR to create the label
> "_para.graff.mainsec.prog.0.79:"  Thus, when a user enters "break graff"
> or "break graff of mainsec", my code in GDB knows what to search for in
> order to know where to set the breakpoint.
> 
> I apologize for the lengthy description here.  Sometimes, however, it is
> unavoidable.  Sometimes, it is true, I do weird things out of ignorance.
> But sometimes I am doing weird things because COBOL is, well, (and
> certainly in the context of compilers and debuggers designed for C/C++),
> weird.
> 
> >
> > I noticed that you use build1 and friends sometimes without a location,
> > elsewhere there's location_from_lineno ().  Parts of GCC use the global
> > 'input_location' which you could set when sv_current_line_number
> changes?
> > GCC locations also track file (for C #include - not sure if relevant for
> > Cobol) and column.
> 
> I will look into 'input_location'.
> 
> >
> > +// These are globally useful constants
> > +tree char_nodes[256];
> > +
> > +tree integer_minusone_node;
> > +tree integer_two_node;
> > +tree integer_eight_node;
> > +tree size_t_zero_node;
> > +tree int128_zero_node;
> > +tree int128_five_node;
> > +tree int128_ten_node;
> >
> > you might sooner or later run into garbage collector issues, either
> > because you don't collect (so garbage piles up) or because you get
> things
> > collected that are still used (because it's not marked).  The above look
> > like they want to be marked GTY(()) (maybe they are in a header file).
> 
> They are not marked GTY(()).  I don't know what in tarnation GTY means,
> and I have never been able to figure it out.  And I don't know how, or
> when, to do garbage collection.
> 
> I am grateful for any hint.  (I have been aware, as I have been working,
> that I have been ignoring things that are probably important.  But I
> figured I could track down problems as they occurred, and if no problems
> showed up, well, as the old story goes, when it ain't rainin' the roof
> don't leak nohow.  I couldn't wait to understand GCC completely before
> writing code.)

OK, I guess you can defer this in case the frontend global state is
never looked at after the middle-end does its first garbage collection.
Usually such access might happen when the middle-end invokes
language-hooks - I didn't spot the patch those are introduced for
GNU Cobol yet.  You'd have to consider all frontend global state
that gets GC allocated (basically all 'tree' nodes) to be garbage
collected unless the roots (like those global vars above) are marked
to be as roots for the garbage collectors mark phase.  Grepping for
GTY in a frontend might turn up examples to copy from (also
Makefile bits to make gengtype generate additional headers and
including those).

> >
> > +    tree distance = build2( MULT_EXPR,
> > +                            SIZE_T,
> > +                            gg_cast(sizetype, offset),
> > +                            TYPE_SIZE_UNIT(element_type));
> >
> > if offset is a constant then the above builts the multiplication
> > expression, if you use fold_build2 (...) it would be simplified down to
> a
> > constant early.  With optimization it's going to be optimized anyway,
> but
> > as the multiplication isn't written by the user it might be tempting to
> > simplify it in the frontend.
> 
> Brand new to me; I will look into it.
> 
> >
> > +static tree
> > +gg_get_larger_type(tree A, tree B)
> > +  {
> > +  tree larger = TREE_TYPE(B);
> > +  if( TYPE_SIZE(TREE_TYPE(A)) > TYPE_SIZE(TREE_TYPE(B)) )
> > +    {
> > +    larger = TREE_TYPE(A);
> >
> > that doesn't work - TYPE_SIZE is a pointer to a tree.  You can use
> > tree_int_cst_compare (TYPE_SIZE(TREE_TYPE(A)),
> > TYPE_SIZE(TREE_TYPE(B))) == 1 instead.
> >
> > Maybe you are looking at arithmetic precision instead, in which case
> > TYPE_PRECISION (TREE_TYPE (A)) > TYPE_PRECISION (TREE_TYPE (B)) would be
> > appropriate (for example GCC supports int:3 and int:5 but they'd have
> both
> > TYPE_SIZE of 8 but precision of 3 and 5).
> >
> > Maybe Cobol doesn't have signed and unsigned types, iff it does it looks
> > like gg_get_larger_type might care for TYPE_UNSIGNED as well somehow.
> 
> Oi.  Here come many words.  I don't like answering with too many words,
> but sometimes answers are complicated and many words are required.
> 
> As we have implemented it, COBOL variables do not have "types" as GCC uses
> the term.  A COBOL variable is a 112-byte cblc_field_t structure.  It
> contains an enum that specifies the COBOL type (alphanumeric,
> floating-point, packed-decimal, numeric display (where numbers are stored
> as character strings of digits, e.g. "123"), binary, among other things).
> The structure also specifies the number of bytes, the number of digits,
> the number of digits to the right of a fixed-point decimal point.  There
> is a 64-bit integer value full of attribute flags: whether or not the
> value is signable, big-endian, little-endian.

Aha, so COBOL isn't statically typed?

> A true loony -- I am loony only when it's really useful -- would embark
> upon a quixotic religious quest to expand tree.def to encompass all that
> COBOL crap, so that we would have DEFTREECODE(COBOL_PACKED_DECIMAL_TYPE,
> "cobol_packed_decimal_type", tcc_type, 0) and go through the effort of
> having that TYPE be defined with the appropriate attributes, and then
> everything I've done would get absorbed into the middle-end which would
> spit out the assembly language to implement the language.
> 
> Anybody who would want to do it that way should be thrown into a pit.
> Anybody who would want to let them do it that way should be thrown in,
> too.
> 
> In general, most of the work in the executable is being done in the
> libgcobol.so code, written in C++.  When somebody wants to add a
> six-character value represented as a string of EBCDIC numerals with two
> digits to the right of a virtual decimal point, to a big-endian binary
> value that has four significant digits with and implied multiplier of
> 1000, and assign the result to a 12-digit packed decimal number with three
> digits to the right of an implied decimal point, there is an __gg__add()
> routine in the libgcobol.so that does all that.
> 
> Welcome to COBOL.  Everything I just described there is four lines of
> code:
> 77 A PIC  999V99  USAGE DISPLAY.
> 77 B PIC 9999PPP  USAGE BINARY
> 77 C PIC 9(9)V999 USAGE PACKED-DECIMAL.
> ADD A TO B GIVING C
> 
> So, a lot of what I do in the generated GENERIC is pass pointers to those
> cblc_field_t structures around.

OK, I see.  So in theory it would be nice for the compiler to "optimize"
those into CPU native "types"?  It might be interesting to see whether
using QUAL_UNION_TYPE would be a good interal representation - GCCs
types can have TYPE_SIZE being an expression dependent on fields of
the type, like the qualifier of a union.  While I know much the details
mostly escape me here - Eric Botcazou of Ada Core would be likely the
best go-to target for brain-storming.

> But!  There is a lot of work that I don't want to do in the library for
> performance reasons.  For example, if somebody defines a four-byte
> little-endian binary value that gets used as a loop counter, I don't want
> to be calling a library routine to decrement it, and another library
> routine to check if it's zero.

Right!

> So, I am generating GENERIC in a way that I regard as writing assembly
> language to handle some of the internals, like subtracting a numeric
> literal one from a four-byte little-endian binary.  I also hand-code stuff
> like calculating offsets into tables from the table subscripts.  I didn't
> want to put that into library code because critical inner loops often have
> table subscripts and I don't want the overhead.
> 
> The upshot is that some benchmark routines, that are heavily inner-loop
> dependent, run two to three times faster when compiled with GCOBOL than
> when compiled with some others.
> 
> With that all said: Much of the code you are referencing here isn't used
> directly in implementing user COBOL source.  It is rather used by me in
> generating the "assembly language" to do things like array offset
> calculations.
> 
> In any case, and given that gg_get_larger_type() is used by me in my code
> on variables I define, rather than on user code on variables the user
> defines, I believe it is doing what I need it to do.  For example, when I
> need to multiply tree A by tree B, I need them to both be of the same type
> in order to keep the TRUNC_DIV_EXPR from complaining that they aren't the
> same type.  So, I use gg_get_larger_type(A, B) to return the type of the
> one with the larger TYPE_SIZE, and then I cast both A and B to that type.
> 
> At least, that's what I think I am doing.  In my pile of index cards is
> the one that says, "Eliminate gg_get_larger_type(), and then fix all the
> resulting errors by defining the variables more sensibly in the first
> place."

Thanks for explaining.  I wonder if we can populate a minimalistic
dejagnu testsuite to have some coverage and COBOL examples people
could cut&paste and experiment with to both see the IL generated
and what optimizers can (or can not) do with it.

> >
> > +tree
> > +gg_build_logical_expression(tree operand_a,
> > +                            enum logop_t op,
> > +                            tree operand_b)
> > +  {
> > +  tree logical_expression = NULL_TREE;
> > +  tree_code logical_op;
> > +  switch(op)
> > +    {
> > +    case and_op:
> > +      logical_op = TRUTH_ANDIF_EXPR;
> > ...
> > +    case xor_op:
> > +      logical_op = TRUTH_XOR_EXPR;
> >
> > note TRUTH_ANDIF_EXPR and TRUTH_XOR_EXPR behave differently with respect
> > to evaluation of side-effects of the second operand.  There's no
> > TRUTH_XORIF_EXPR, iff XORIF semantics are desired you'd have to lower
> that
> > explicitly (or ask for the middle-end to adopt a TRUTH_XORIF_EXPR).
> >
> > +__int128
> > +get_power_of_ten(int n)
> > +  {
> >
> > I think it was already mentioned - you eventually want to use wide_int
> (or
> > a fixed_wide_int_storage specialization), or for truly arbitrary
> precision
> > integers use gmp (the Fortran frontend does this).  The use of __int128
> > restricts you to a subset of hosts the Cobol compiler can run on.
> 
> I will look into wide_int.  I have been reluctant to use gmp simply
> because of the increased complexity, a belief that, over time, the number
> of phone calls from users is directly proportional to the number of
> prerequisite libraries, and because it wasn't that many months ago that
> adding 1 to 2 and getting 3 was justification for breaking out the
> champagne.  If gmp gets us more architectures without too much of a hit on
> performance, we'll switch over.
> 
> >
> >
> > That's it for a skim through this large patch.  I'm curious whether you
> > tried configuring with --enable-checking=yes (the default on trunk) and
> if
> > you tried to use LTO with Cobol yet?
> > I also wonder about the DWARF generated with -g
> 
> In my debug builds, I have "--enable-checking", which I see is the same as
> "enable-checking=yes". In our release builds we have,
> "--enable-checking=release".

Sounds good then.

> I have not tried to use LTO.
> 
> The DWARF is working nicely with the companion GDB-COBOL on code compiled
> with '-ggdb -O0".  The GDB folks are, at best, dimly aware that I am
> rummaging through their code.  I will use the occasion of a patch into GCC
> as the starting gun for begging their forgiveness and enlisting their aid
> incorporating my stuff.
> 
> You can use ordinary GDB to set breakpoints on lines and then step through
> code created by GCOBOL. "print" won't work, however.

Well, sounds like a start at least ;)  'print' will need something
like a type (or a way to retrieve it for not statically typed languages),
and also of course the set of variables and lexical scopes - not sure
what's missing.

> In response to some other comments from Joseph Meyers, I will report here
> that I did some major surgery on the structure of the code.  I have rigged
> things up so that libgcobol compiles on its own; it has no prerequisites
> in gcc/cobol.  The gcc/cobol code does, of course, rely on libgcobol/ .h
> files.  There are also two code modules that are compiled into both
> libgcobol and gcc/cobol. (There are some tables for generating messages,
> and a bunch of code for dealing with EBCDIC versus ASCII that's used both
> at compile time and at run time.) Those source code modules now live in
> libgcobol.  The Make-lang.in file now makes copies of those two files and
> puts them into gcc/cobol.  So, libgcobol/charmaps.cc gets copied as
> gcc/cobol/charmaps-copy.cc; line 1, a //comment line, gets replaced with
> "// DO NOT EDIT THIS FILE.  It was copied from the libgcobol directory."
> 
> What this does to Jim's patching process, I don't know.  But I don't know
> how to respond to input from you experts without changing the code. The
> patching process will just have to catch up, I guess.
> 
> Thank you ever so much for your comments and advice.  It feels a little
> like coming out from the cold.

You're welcome - and sorry for the slow reply to your reply, holiday
season ...

Richard.

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to