RE: [PATCH] COBOL 3/8 gen: GENERIC interface

Robert Dubner Mon, 20 Jan 2025 10:13:24 -0800

> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Friday, January 10, 2025 02:43
> To: Robert Dubner <rdub...@symas.com>
> Cc: jklow...@symas.com; Joseph Myers <josmy...@redhat.com>; gcc-patc...@gcc.gnu.org
> Subject: RE: [PATCH] COBOL 3/8 gen: GENERIC interface
> 
[massive snip.  Snip?  Nay: amputation!]
> > > I _think_ that this might map to how COBOL works?  gdb can deal with
> > > nested functions.  You can also have goto in between the section and
> > > paragraph in case the language allows this - GCC supports "nonlocal"
> > > goto for this case.
> > >
> >
> > Richard, your questions are good.  You are recapitulating my own
> > research into these questions.  I looked into implementing paragraphs
> > and sections as nested functions, and as standalone static functions.
> > I even implemented them as static functions at one point, but I
> > quickly got tangled up in the realities of COBOL.
> >
> > Some COBOL 101 is needed to understand why I came to the conclusion
> > that PERFORM-as-CALL can't be done.
> >
> > This is a working COBOL program:
> >
> > 01        identification      division.
> > 02        program-id.         prog.
> > 03        data                division.
> > 04        working-storage     section.
> > 05        77 msg pic X(64).
> > 06        procedure           division.
> > 07        move "First time through" to msg
> > 08        perform para-foo
> > 09        move "Second time through" to msg.
> > 10        para-foo.
> > 11        display msg.
> > 12        end-para-foo.
> > 13        display "That's all, folks!"
> > 14        goback.
> > 15        end program         prog.
> >
> > The output:
> >
> >     First time through
> >     Second time through
> >     That's all, folks!
> >
> > The "perform para-foo" at line 08 transfers control to line 11, which
> > displays "msg".
> >
> > The new paragraph at line 12 marks the end of para-foo, so execution
> > returns to line 09, which updates "msg".
> >
> > Execution then falls through to paragraph para-foo at line 10, so line
> > 11 gets executed again, displaying the updated "msg".
> >
> > Execution then falls through to end-para-foo, but this time execution
> > simply proceeds, since there is no "perform" statement involved.
> 
> Ick ;)  OK, I see how this is indeed not trivially mapped to functions.
> OTOH the handling of end-para-foo in Cobol is odd in that it has to know
> whether execution was from a PERFORM (it has to jump back) or from
> fallthru.  That's probably also a bit awkward to implement with the goto
> case, probably requiring some "flag".
> 
> Btw, is recursion allowed?
> 
>   f = 1
>   fact.
>   f = f * n
>   n = n - 1
>   if n != 0 perform fact
>   end-fact.
> 
> or something like this to compute n!
> 
> A way to make the fallthru work would be to duplicate the statements at
> the fallthru location (aka inline them at parsing) and only have PERFORM
> invoke the function.

You have a talent for targeting the heart of the matter.

It's not so much that recursion is allowed.  It's more that it is not
disallowed.

For the record:  I have implemented the COBOL "program-id" as a func_decl
that gets called using call_expr.  Jim and I are operating under the
philosophy that from the standpoint of external linkage, a COBOL
"program-id" is identical to a C "function".  We have taken pains to
ensure that a C program can call a COBOL program-id using C syntax, and a
COBOL program can call functions in C programs using COBOL syntax.   

Sections and paragraphs are purely local to their parent
program-id/function.  (Yes, COBOL draws a distinction between "function"
and "program-id", just as the FORTRAN I learned, back when FORTRAN was
spelled with all capital letters, drew a distinction between "subroutine"
and "function".  This is what happens when you let electrical engineers
design software.)

And so, it is very clear to me that COBOL, originally developed in a day
when a bit was a vacuum tube the size of your thumb, and before anybody
was willing to "waste" expensive memory on a stack, even if they knew what
a stack was, implemented "PERFORM <paragraph>" through the use of
self-modifying code.

If I am right, every paragraph effectively ended with a NOP.  When a
PERFORM was executed, the "PERFORM" code would replace that NOP with a
"JMP <return address>".  At the end of the paragraph, execution would hit
that "JMP <return address>", which would return control to a place that
replaced the "JMP <return address>" with the NOP again.

I am pretty sure I am essentially correct.  Evidence for my theory is
found by reading between the lines of even the latest COBOL specification:

""""
12.9.28.4 2) The results of executing the following sequence of PERFORM
statements are undefined and no exception condition is set to exist when
the sequence is executed:

   a) a PERFORM statement is executed and has not yet terminated, then
   b) within the range of that PERFORM statement another PERFORM statement
      is executed, then
   c) the execution of the second PERFORM statement passes through the
      exit of the first PERFORM statement.

NOTE 1 On some implementations it causes stack overflows, on some it
causes returns to unlikely places, and on other implementations other
actions can occur. Therefore, the results are unpredictable and are
unlikely to be portable.
""""

Said another way, according to the COBOL standard, your recursive example
results in "undefined" behavior.

GnuCOBOL chose to implement that undefined behavior so that your code
example works.  I chose to do the same.  I almost didn't; the "All hope
abandon, ye who enter here" mode would have produced more efficient code.
But a modern programmer expects recursion to work, and we decided not to
fight that battle. 

I don't know how GnuCOBOL does it, but at each PERFORM I push a
signature/return-location pair onto a std::vector.  At the end of each
paragraph, I check that vector to see if the signature of the currently
ending paragraph is at the back of the vector.  If so, I pop the stack and
jump to the return-location; otherwise I just fall through.

In that way, your example would work when compiled with GnuCOBOL or
GCOBOL.  I don't know what happens with other compilers.

> 
> > Worse:
> >
> > You can have a bunch of paragraphs in a row: para1, para2, ...para10.
> >
> > You can execute "perform para3 through para6".  It's legal, in para4,
> > to put in a "GO TO some-other-paragraph", where that other paragraph
> > can be anywhere in the enclosing program-id/function.  If, eventually,
> > control gets transferred back to para6, the "return" code at the end
> > of para6 gets executed to return execution to the point after the
> > "perform".  But it's not required.  Consider, if your brain can handle
> > it, calling a function that doesn't return.
> 
> Interesting.  I do wonder if there's a way to manipulate the "return
> stack" or how it was thought such a thing would be implemented?  Kind of
> like with a "goto with parameter", aka pass a return label along to the
> goto target?
> 
> > Oh, yeah, right.  I forgot to mention that in addition to being the
> > target of a "perform", and being able to be executed by
> > falling-through into it, a paragraph can also be the target of a COBOL
> > GO TO statement.
> >
> > Even though I can dimly visualize creating convoluted logic for
> > accomplishing some of that, I simply have no idea how to go about
> > implementing jumping from the middle of one function into the middle
> > of another function.
> 
> It only works as long as there's a common containing function, so ...
> 
> > I do, however, have all that working using jumps.
> 
> ... a COBOL TU is parsed into a single function then?  Thus it just has a
> "main"?  I do understand there's nothing like a multi-file COBOL program
> (guess there's only a single stack of punch cards to feed the machine -
> heh)

<laughter> My pop, back in the 1620 -- or was it the 1401? -- days, was
one of those engineers who kept a copy of the binary for the Fortran
compiler on punch cards in his shirt pocket. So, you'd feed the compiler
binary into the machine, then your source code files, then the binary for
your program would get punched out onto more punched cards, and then you'd
feed them back into the machine. Typically there wasn't room in memory for
both the compiler binary and your program's binary, hence the
punch-and-reload process.  But, be assured:  We have moved on.

I hope I made this clear up above:  As with a C executable, the code for a
COBOL executable can span many text units.  Each text unit can have
multiple program-id statements, each producing its own func_decl, and those
are global.  Program-id statements can also be nested; I implement the
nested ones as static local functions, and the parser keeps track of who
is allowed to call whom.  (I made a stab at true nested functions, but I
never figured out the correct series of calls into the GCC code to
implement them, and ultimately it didn't matter, so I gave up.)  Each
func_decl has multiple sections and paragraphs, which are all local to the
program-id, and which have been implemented with jmp instructions as
described above.

The upshot is that with regard to external structure, GCOBOL is just like
C.  Many .cbl files can be compiled into many .o files defining many
external global functions, and those can be linked together with .o files
compiled from C/C++ code.

As a side note: Modern C requires that there be a global main() entry
point.  COBOL has no such requirement; when you compile one or more text
units, by default execution is expected to start at the first program-id
encountered during compilation.  We have extended GCC with '-main' and
'-nomain' options giving the user fine control over which program-id in
which file is the beginning of execution. By default, however, I create a
main() entry point function that calls the first program-id in the source
code file.


> >
> > The reason I carry all the metadata at runtime is because of the need
> > for a debugger.  I emphasize again: I understand that I should take
> > all of that metadata and put it into the .debug_info section.  But I
> > don't yet know how to put custom information into .debug_info, nor do
> > I know how to pull such information out of .debug_info using code in
> > GDB.  All in good time.
> 
> Yeah.  It's of help to read the DWARF specification (it's not too big),
> and in the end think of DWARF as a way to handle C with extensions for
> other languages.  So my approach would be to try mapping COBOL type
> concepts to C type concepts - iff that's not feasible then extensions to
> DWARF for the COBOL type system are in order.  At least DWARF knows
> DW_LANG_Cobol85, so it might not be a completely lost cause ;) What kind
> of debug info do other compilers generate here?
> 
> > I am developing the companion gdb-cobol (
> > https://gitlab.cobolworx.com/COBOLworx/gdb-cobol ), and I already have
> > a number of variations of the print and ptype commands that have been
> > adjusted for making sense in COBOL.
> >
> > But, at the present time, for that to work I need all the metadata
> > created by the COBOL data definitions.
> 
> Note there are a bunch of debug language-hooks in GCC already, but none
> take advantage of the fact that we now have only DWARF as debug
> representation, so they generate "meta-data" which then dwarf2out.cc
> interprets (like the get_array_descr_info hook implemented for Fortran).
> That said, the COBOL frontend should be able to output/amend the DWARF DIE
> for a type directly, like for example add a DW_AT_GNU_COBOL_type_data
> attribute with the data you put to the runtime for the debugger placed in
> a DWARF type attribute instead (as temporary(?) extension).
> 

There will come a day when I dig into this.  I once worked extensively
with the DWARF specifications; some years ago my first attempt at building
a Python extension to GDB to handle GnuCOBOL executables required -- Lord
help me -- actually modifying the DWARF debug information in the ELF
executable before handing it to GDB.  That has since been discarded; the
GnuCOBOL developers started providing me with the necessary information in
other forms, so I no longer had to modify the DWARF.  (You will, I trust,
gently bang your head on your keyboard when I tell you that much of the
necessary information is going into comments in the C code that GnuCOBOL
generates, and which I later scan.)

But right now the focus is Getting It Working.  Once we get GCOBOL and the
companion GDB-COBOL more or less stabilized, I'll start the effort of
moving the debug information to where it belongs. I am, be assured, making
a careful note of the guidance you just gave me.

> 
> Thanks again, I never thought I would even try to understand how COBOL
> works ;)

Thank you!  

> 
> Richard.
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)