RE: [PATCH] COBOL 3/8 gen: GENERIC interface

Richard Biener Thu, 09 Jan 2025 23:42:59 -0800

On Thu, 9 Jan 2025, Robert Dubner wrote:

> I am going to trim back some of the older stuff.
> 
> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Tuesday, January 7, 2025 08:32
> > To: Robert Dubner <rdub...@symas.com>
> > Cc: jklow...@symas.com; Joseph Myers <josmy...@redhat.com>; gcc-
> > patc...@gcc.gnu.org
> > Subject: RE: [PATCH] COBOL 3/8 gen: GENERIC interface
> >
> > On Mon, 23 Dec 2024, Robert Dubner wrote:
> >
> > > Richard, a bunch of things you address are in my bailiwick.
> > >
> > > When Jim and I set out to create a COBOL front end, I knew *NOTHING*
> > > about, well, anything vis-à-vis GCC.  I barely knew how it worked.
> >
> > I guess that's expected - we always hope people doing new frontends have
> > spare time left to fill gaps in documentation with knowledge they gained
> > ;)  Or maybe write a blog post about how to do a new GCC frontend (there
> > are multiple such for backends).  But I know time is scarce.
> >
> 
> I don't think it's just time.  There are so many layers.  By analogy: I am
> imagining a simple machine that levitates a steel ball bearing with an
> electromagnet above and an optical sensor: sensor sees the bearing is too
> low, so more current is sent to the coil, which raises the bearing and so
> on.  Somebody wants to know how to build such a thing.  It's simple.  You
> just need to know how to build a sensor, and how to wind an electromagnet,
> and build an amplifier, and you need to know physics and feedback control
> theory, which means you need to know calculus, which means you need to
> know trigonometry, which means you need to know algebra....  I could
> publish "Popular Mechanics" plans for such a gadget that a kid could
> build, but they wouldn't know how to do it themselves.
> 
> The front end seems to be a lot like that.  It took me weeks to figure out
> the relationship between tree.h and tree.def and GENERIC tags and the
> build_ routines and...  After I built my GENERIC dumper, I spent many days
> drawing the directed cyclic graphs of functions, starting with "void
> foo(void){}", to figure out what the middle end expected from me.  And I
> built up from there.  I have routines that do the hard work, so much so
> that I rarely work with individual GENERIC tags any more.  (I personally
> call individual trees "tags" when they are in isolation, and "nodes" when
> they are part of a tree, because otherwise the word "tree" gets so
> overworked it becomes meaningless).  I have macro-like routines that I
> have created to do the work.  I suspect every front end does, too.
> 
> And the thought of trying to document that in a way that's more meaningful
> than a do-it-yourself "Popular Mechanics" project plan "GCC Front Ends For
> Dummies" is exhausting.  How far down do you go?  ("First, find a deposit
> of iron ore, and a seam of coal.  Then, use a pile of rocks to build a
> forge...")
> 
> I'll give it some thought, though.  I would have found an "hello, world"
> front end incredibly useful.
> 
> > > COBOL sections and paragraphs (a section is a group of paragraphs) are
> > > conceptually similar to C functions. Given a paragraph named FOO, you
> > > can PERFORM FOO and the group of sentences (yes, a paragraph is made
> > > up of sentences; I remind you that COBOL was originally designed to be
> > > readable by non-programmers) are executed and control then returns to
> > > the statement after the PERFORM.
> > >
> > > For various reasons, execution into, through, and back from sections
> > > and paragraphs must be implemented with GOTO statements, and cannot be
> > > implemented with calls.
> >
> > Uh, that's awkward (if not only for the fact that the big functions you
> > end up will be slow to compile).
> 
> I am not sure I was clear.  Those GOTO statements are implemented in the
> run-time executable, so the executable contents of a paragraph are laid
> down only once in the generated executable.  I am not jumping all over
> creation in the front end.  (<shudder> That's a horrid thought, isn't it?)
> 
> >
> > > I nonetheless attempted, at one point, to implement PERFORM via calls,
> > > and the ".global" you noticed is a vestige of that effort.  The
> > > routine it was used in has a boolean variable 'global' that defaults
> > > to false, and the routine was never called with that parameter set to
> > > true.  It is unnecessary and has been deleted.
> > >
> > > gg_insert_into_assembler() does indeed use ASM_EXPR. I sometimes use
> > > it to generate #-delimited comments into the generated assembly
> > > language so that I can see what's going on.
> > >
> > > But I also use it to generate labeled locations in the executables.
> > >
> > > I am also developing a GDB-COBOL version of GDB, one that understands
> > > the executables GCOBOL is generating.  We need the GDB NEXT
> > > instruction to execute through a PROC that is the subject of a PERFORM
> > > PROC statement.
> > > And I need to be able to set a breakpoint with "(gdb)break PROC".
> > >
> > > I have not yet figured out how to use GCC to put information into the
> > > .debug_info section of a DWARF executable, and I have not yet figured
> > > out how, in GDB, to extract such information.  So, for now, I am
> > > creating labels with meta-information in them.
> >
> > Hmm, I see.  I think that not using functions and function calls and
> > thus not having a "frame" will make this quite difficult.  Can you
> > elaborate on why having PARAGRAPH mapped to a function does not work?
> > I guess what I'd try is to have SECTION map to a function and have
> > PARAGRAPH map to nested functions within the SECTION - that way the
> > PARAGRAPHs have access to the variables at SECTION scope as they are at
> > the point of PERFORM (aka function calls).  In GNU C you can do this:
> >
> > void foo_section (int p)
> > {
> >   int a[10];
> >   int res;
> >   int i;
> >
> >   i = 5;
> >
> >   void bar_paragraph ()
> >   {
> >     res = a[i];
> >   }
> >
> >   bar_paragraph ();
> >   if (res == 7)
> >     return;
> >
> >   res += 5;
> >
> >   void foobar_paragraph ()
> >   {
> >     a[1] = res;
> >   }
> >
> >   foobar_paragraph ();
> >   bar_paragraph ();
> > }
> >
> > I _think_ that this might map to how COBOL works?  gdb can deal with
> > nested functions.  You can also have goto in between the section and
> > paragraph in case the language allows this - GCC supports "nonlocal"
> > goto for this case.
> >
> 
> Richard, your questions are good.  You are recapitulating my own research
> into these questions.  I looked into implementing paragraphs and sections
> as nested functions, and as standalone static functions.  I even
> implemented them as static functions at one point, but I quickly got
> tangled up in the realities of COBOL.
> 
> Some COBOL 101 is needed to understand why I came to the conclusion that
> PERFORM-as-CALL can't be done.
> 
> This is a working COBOL program:
> 
> 01        identification      division.
> 02        program-id.         prog.
> 03        data                division.
> 04        working-storage     section.
> 05        77 msg pic X(64).
> 06        procedure           division.
> 07        move "First time through" to msg
> 08        perform para-foo
> 09        move "Second time through" to msg.
> 10        para-foo.
> 11        display msg.
> 12        end-para-foo.
> 13        display "That's all, folks!"
> 14        goback.
> 15        end program         prog.
> 
> The output:
> 
>       First time through
>       Second time through
>       That's all, folks!
> 
> The "perform para-foo" at line 08 transfers control to line 11, which
> displays "msg".
> 
> The new paragraph at line 12 causes execution to return back to line 09,
> which updates "msg".
> 
> Execution then falls through to paragraph para-foo at line 10, so line 11
> gets executed again, displaying the updated "msg".
> 
> Execution then falls through to end-para-foo, but this time execution
> simply proceeds, since there is no "perform" statement involved.


Ick ;)  OK, I see how this is indeed not trivially mapped to functions.
OTOH the handling of end-para-foo in Cobol is odd in that it has to
know whether execution was from a PERFORM (it has to jump back) or
from fallthru.  That's probably also a bit awkward to implement with
the goto case, probably requiring some "flag".
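
To make the "flag" idea concrete, here is a minimal sketch in portable C.
It is not what gcobol generates; the label constants and the return_to
slot are names made up for illustration.  The whole program is one
function, paragraphs are jump targets, and the end of para-foo checks
whether a PERFORM is pending before deciding where to go next:

```c
#include <string.h>

/* Hypothetical "labels": each constant stands for a jump target in the
   flattened program body.  None of these names come from gcobol.  */
enum { L_START, L_AFTER_PERFORM, L_PARA_FOO, L_END_PARA_FOO, L_GOBACK };

/* Emulates the para-foo program above.  Everything DISPLAY would print
   is appended to OUT, which the caller must make large enough.  */
void
run_para_foo (char *out)
{
  const char *msg = "";
  int pc = L_START;
  int return_to = -1;           /* -1 means: no PERFORM is pending */

  out[0] = '\0';
  while (pc != L_GOBACK)
    switch (pc)
      {
      case L_START:
        msg = "First time through";
        return_to = L_AFTER_PERFORM;    /* perform para-foo */
        pc = L_PARA_FOO;
        break;

      case L_AFTER_PERFORM:
        msg = "Second time through";
        pc = L_PARA_FOO;                /* plain fallthru, nothing pending */
        break;

      case L_PARA_FOO:
        strcat (out, msg);
        strcat (out, "\n");
        /* End of para-foo: jump back only if a PERFORM armed the slot.  */
        if (return_to >= 0)
          {
            pc = return_to;
            return_to = -1;
          }
        else
          pc = L_END_PARA_FOO;
        break;

      case L_END_PARA_FOO:
        strcat (out, "That's all, folks!\n");
        pc = L_GOBACK;
        break;
      }
}
```

Calling run_para_foo with a buffer of, say, 128 bytes should leave the
three lines of output shown above in it.  A single slot is only enough
for one pending PERFORM of one paragraph, of course.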

Btw, is recursion allowed?

  f = 1
  fact.
  f = f * n
  n = n - 1
  if n != 0 perform fact
  end-fact.

or something like this to compute n!

A way to make the fallthru work would be to duplicate the statements
at the fallthru location (aka inline them at parsing) and only have
PERFORM invoke the function.

> Worse:
> 
> You can have a bunch of paragraphs in a row: para1, para2, ...para10.
> 
> You can execute "perform para3 through para6".  It's legal, in para4, to
> put in a "GO TO some-other-paragraph", where that other paragraph can be
> anywhere in the enclosing program-id/function.  If, eventually, control
> gets transferred back to para6, the "return" code at the end of para6 gets
> executed to return execution to the point after the "perform".  But it's
> not required.  Consider, if your brain can handle it, calling a function
> that doesn't return.

Interesting.  I do wonder if there's a way to manipulate the "return 
stack" or how it was thought such a thing would be implemented?  Kind
of like with a "goto with parameter", aka pass a return label along to
the goto target?
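
What I have in mind is roughly the following, again only a sketch in
portable C under my own assumptions (the paragraph constants, the struct
pending stack and run_perform_thru are invented names, not anything from
the patch).  A PERFORM pushes the paragraph that ends its range together
with the label to return to; the end of every paragraph checks whether it
is the exit paragraph on top of the stack:

```c
#include <stddef.h>

/* Hypothetical jump targets.  P0_AFTER is the statement after the
   PERFORM inside the first paragraph.  */
enum { P0, P0_AFTER, P1, P2, P3, P4, DONE };

/* One pending PERFORM: where its range ends and where to resume.  */
struct pending { int exit_para; int return_to; };

/* Emulates "perform para2 through para3" issued from the first
   paragraph, followed by plain fallthru through para1 .. para4.
   Writes one character per executed step into TRACE.  */
void
run_perform_thru (char *trace)
{
  struct pending stack[8];
  int depth = 0;
  int pc = P0;
  size_t n = 0;

  while (pc != DONE)
    {
      int para = pc;            /* paragraph whose end we reach below */
      int next = pc + 1;        /* default: fall through */

      switch (pc)
        {
        case P0:
          trace[n++] = 'a';
          stack[depth].exit_para = P3;        /* range ends after para3 */
          stack[depth].return_to = P0_AFTER;  /* the "return label" */
          depth++;
          pc = P2;
          continue;             /* a jump, not an end of paragraph */
        case P0_AFTER: trace[n++] = 'b'; break;
        case P1: trace[n++] = '1'; break;
        case P2: trace[n++] = '2'; break;
        case P3: trace[n++] = '3'; break;
        case P4: trace[n++] = '4'; break;
        }

      /* End of paragraph: return only if this is the pending exit.  */
      if (depth > 0 && stack[depth - 1].exit_para == para)
        next = stack[--depth].return_to;
      pc = next;
    }
  trace[n] = '\0';
}
```

The second pass through para3 finds the stack empty and simply falls
through.  A GO TO out of the performed range would leave the entry armed
until control happens to reach the end of para3 again, which I take to be
the "function that doesn't return" case.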

> Oh, yeah, right.  I forgot to mention that in addition to being the target
> of a "perform", and being able to be executed by falling-through into it,
> a paragraph can also be the target of a COBOL GO TO statement.
> 
> Even though I can dimly visualize creating convoluted logic for
> accomplishing some of that, I simply have no idea how to go about
> implementing jumping from the middle of one function into the middle of
> another function.

It only works as long as there's a common containing function, so ...

> I do however, have all that working using jumps.

... a COBOL TU is parsed into a single function then?  Thus it just
has a "main"?  I do understand there's nothing like a multi-file
COBOL program (guess there's only a single stack of punch cards to
feed the machine - heh)

> [snipped discussion of GTY(())]
> 
> > > As we have implemented it, COBOL variables do not have "types" as GCC
> > > uses the term.  A COBOL variable is a 112-byte cblc_field_t structure.
> > > It contains an enum that specifies the COBOL type (alphanumeric,
> > > floating-point, packed-decimal, numeric display (where numbers are
> > > stored as character strings of digits, e.g. "123"), binary, among
> > > other things).
> > > The structure also specifies the number of bytes, the number of
> > > digits, the number of digits to the right of a fixed-point decimal
> > > point.  There is a 64-bit integer value full of attribute flags:
> > > whether or not the value is signable, big-endian, little-endian.
> >
> > Aha, so COBOL isn't statically typed?
> 
> Actually, I think COBOL is possibly the ultimate in "statically typed".
> 
> More COBOL 101:
> 
> These data definitions:
> 
> 77 var1 PICTURE 99V999 USAGE BINARY VALUE 12.345.
> 77 var1 PICTURE 99V999 USAGE DISPLAY VALUE 12.345.
> 77 var1 PICTURE 99V999 USAGE PACKED-DECIMAL VALUE 12.345.
> 77 var1 PICTURE 99V999 USAGE COMP-5 VALUE 12.345.
> 77 var1 PICTURE 99V999 USAGE FLOAT-LONG VALUE 12.345.
> 77 var1 PICTURE 99.999 VALUE 12.345.
> 
> All specify the same value -- 12.345 -- but in different ways.
> Respectively:
> 
> A four-byte big-endian unsigned binary value
> The character string "12345"
> A three-byte big-endian packed decimal value with a final 0xF nybble
> indicating an unsigned value
> A four-byte little-endian unsigned binary value
> An IEEE 754 binary64 value
> The character string "12.345".  (Take a deep breath and look up COBOL
> NUMERIC EDITED values.  If you dare.)
> 
> Those definitions are fixed and unchanging and known to the compiler.
> 
> The reason I carry all the metadata at runtime is because of the need for
> a debugger.  I emphasize again: I understand that I should take all of
> that metadata and put it into the .debug_info section.  But I don't yet
> know how to put custom information into .debug_info, nor do I know how to
> pull such information out of .debug_info using code in GDB.  All in good
> time.
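
As a concrete reading of two of the encodings listed above, here is a
small sketch in C.  It assumes the value 12.345 with PICTURE 99V999 is
held as the scaled integer 12345; the helper names are made up and are
not libgcobol's:

```c
#include <stddef.h>
#include <stdint.h>

/* Stores VALUE as a four-byte big-endian binary number.  */
void
store_be32 (uint32_t value, uint8_t out[4])
{
  out[0] = (uint8_t) (value >> 24);
  out[1] = (uint8_t) (value >> 16);
  out[2] = (uint8_t) (value >> 8);
  out[3] = (uint8_t) value;
}

/* Packs the low NDIGITS decimal digits of VALUE two per byte, most
   significant first, with a final 0xF nybble marking the value as
   unsigned.  Returns the number of bytes written to OUT.  */
size_t
pack_unsigned (uint32_t value, int ndigits, uint8_t *out)
{
  size_t nbytes = (size_t) ndigits / 2 + 1;
  unsigned low = 0xF;           /* sign nybble goes in the last byte */

  for (size_t i = nbytes; i-- > 0; )
    {
      unsigned high = value % 10;
      value /= 10;
      out[i] = (uint8_t) ((high << 4) | low);
      low = value % 10;         /* next digit becomes the low nybble */
      value /= 10;
    }
  return nbytes;
}
```

With these, store_be32 (12345, buf) gives the bytes 00 00 30 39 and
pack_unsigned (12345, 5, buf) gives the three bytes 12 34 5F.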

Yeah.  It's of help to read the DWARF specification (it's not too big),
and in the end think of DWARF as a way to handle C with extensions for
other languages.  So my approach would be to try mapping COBOL type
concepts to C type concepts - iff that's not feasible then extensions
to DWARF for the COBOL type system are in order.  At least DWARF
knows DW_LANG_Cobol85, so it might be not a completely lost cause ;)
What kind of debug info do other compilers generate here?

> I am developing the companion gdb-cobol (
> https://gitlab.cobolworx.com/COBOLworx/gdb-cobol ), and I already have a
> number of variations of the print and ptype commands that have been
> adjusted for making sense in COBOL.
> 
> But, at the present time, for that to work I need all the metadata created
> by the COBOL data definitions.

Note there are a bunch of debug language hooks in GCC already, but none
take advantage of the fact that we now have only DWARF as debug
representation, so they generate "meta-data" which then dwarf2out.cc
interprets (like the get_array_descr_info hook implemented for Fortran).
That said, the COBOL frontend should be able to output/amend the DWARF DIE
for a type directly, like for example add a DW_AT_GNU_COBOL_type_data
attribute with the data you put to the runtime for the debugger placed
in a DWARF type attribute instead (as temporary(?) extension).

> >
> > > A true loony -- I am loony only when it's really useful -- would
> > > embark upon a quixotic religious quest to expand tree.def to encompass
> > > all that COBOL crap, so that we would have
> > > DEFTREECODE(COBOL_PACKED_DECIMAL_TYPE,
> > > "cobol_packed_decimal_type", tcc_type, 0) and go through the effort of
> > > having that TYPE be defined with the appropriate attributes, and then
> > > everything I've done would get absorbed into the middle-end which
> > > would spit out the assembly language to implement the language.
> > >
> > > Anybody who would want to do it that way should be thrown into a pit.
> > > Anybody who would want to let them do it that way should be thrown in,
> > > too.
> > >
> > > In general, most of the work in the executable is being done in the
> > > libgcobol.so code, written in C++.  When somebody wants to add a
> > > six-character value represented as a string of EBCDIC numerals with
> > > two digits to the right of a virtual decimal point, to a big-endian
> > > binary value that has four significant digits with an implied
> > > multiplier of 1000, and assign the result to a 12-digit packed decimal
> > > number with three digits to the right of an implied decimal point,
> > > there is an __gg__add() routine in the libgcobol.so that does all
> > > that.
> > >
> > > Welcome to COBOL.  Everything I just described there is four lines of
> > > code:
> > > 77 A PIC  999V99  USAGE DISPLAY.
> > > 77 B PIC 9999PPP  USAGE BINARY.
> > > 77 C PIC 9(9)V999 USAGE PACKED-DECIMAL.
> > > ADD A TO B GIVING C
> > >
> > > So, a lot of what I do in the generated GENERIC is pass pointers to
> > > those cblc_field_t structures around.
> >
> > OK, I see.  So in theory it would be nice for the compiler to "optimize"
> > those into CPU native "types"?  It might be interesting to see whether
> > using QUAL_UNION_TYPE would be a good internal representation - GCC's
> > types can have TYPE_SIZE being an expression dependent on fields of the
> > type, like the qualifier of a union.  While I know that much, the
> > details mostly escape me here - Eric Botcazou of AdaCore would likely
> > be the best go-to target for brain-storming.
> 
> Thanks for the references.  I am actually content with how things are, for
> the most part.  I am trying to do simple stuff myself with GENERIC that I
> generate.  More complex stuff I do with calls into libgcobol.so, where I
> am using C/C++ to implement the complexities.  "Simple" means that I have
> a belief that the overhead of a call is undesirable, whereas "more
> complex" means that I think the overhead of the call/return is overwhelmed
> by the rest of the calculation.
> 
> I understand that this undercuts the ability of the compiler to optimize.
> I don't believe this to be a significant problem.
> 
> >
> > > But!  There is a lot of work that I don't want to do in the library
> > > for performance reasons.  For example, if somebody defines a four-byte
> > > little-endian binary value that gets used as a loop counter, I don't
> > > want to be calling a library routine to decrement it, and another
> > > library routine to check if it's zero.
> >
> > Right!
> >
> > > So, I am generating GENERIC in a way that I regard as writing assembly
> > > language to handle some of the internals, like subtracting a numeric
> > > literal one from a four-byte little-endian binary.  I also hand-code
> > > stuff like calculating offsets into tables from the table subscripts.
> > > I didn't want to put that into library code because critical inner
> > > loops often have table subscripts and I don't want the overhead.
> > >
> > > The upshot is that some benchmark routines, that are heavily
> > > inner-loop dependent, run two to three times faster when compiled with
> > > GCOBOL than when compiled with some others.
> > >
> > > With that all said: Much of the code you are referencing here isn't
> > > used directly in implementing user COBOL source.  It is rather used by
> > > me in generating the "assembly language" to do things like array
> > > offset calculations.
> > >
> > > In any case, and given that gg_get_larger_type() is used by me in my
> > > code on variables I define, rather than on user code on variables the
> > > user defines, I believe it is doing what I need it to do.  For
> > > example, when I need to multiply tree A by tree B, I need them to both
> > > be of the same type in order to keep the TRUNC_DIV_EXPR from
> > > complaining that they aren't the same type.  So, I use
> > > gg_get_larger_type(A, B) to return the type of the one with the larger
> > > TYPE_SIZE, and then I cast both A and B to that type.
> > >
> > > At least, that's what I think I am doing.  In my pile of index cards
> > > is the one that says, "Eliminate gg_get_larger_type(), and then fix
> > > all the resulting errors by defining the variables more sensibly in
> > > the first place."
> >
> > Thanks for explaining.  I wonder if we can populate a minimalistic
> > dejagnu testsuite to have some coverage and COBOL examples people could
> > cut&paste and experiment with to both see the IL generated and what
> > optimizers can (or can not) do with it.
> 
> https://gitlab.cobolworx.com/COBOLworx/gcc-cobol "cobol-stage" branch that
> Jim has been using as a base for creating patches has the
> ./gcc/cobol/tests directory (which is a make-based test suite), the
> ./gcc/cobol/nist directory (also a make-based test suite), and
> ./gcc/cobol/UAT (which is an autom4te-based test suite).
> 
> The first contains 125 cobol programs; the second contains 831.  The UAT
> test suite has .at files containing 1038 COBOL modules implementing 674
> separate tests.
> 
> At the present time, Jim and I are in a standoff. Did you ever have a
> roommate and get locked in a passive-aggressive battle where you are both
> adding dirty dishes to the kitchen sink, each waiting for the other to
> acknowledge that they are the cause of most of the pile and start washing?
> That's where we are with dejagnu.  We both know we're going to have to
> convert.  Neither of us is ready to start filling the sink with hot water.
> 
> [snipped]
> 
> > > Thank you ever so much for your comments and advice.  It feels a
> > > little like coming out from the cold.
> >
> > You're welcome - and sorry for the slow reply to your reply, holiday
> > season ...
> >
> > Richard.
> 
> Thank you, again.  I am enjoying this conversation; I hope it's helpful
> for understanding what we are doing.  And we are taking suggestions to
> heart.  Things are seemingly slow right now because Jim is converting
> error and warning messages to the diagnostic formats.  It's a little
> tedious.

That one at least will pay off to users!

Thanks again, I never thought I would even try to understand how
COBOL works ;)

Richard.

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
