I am going to trim back some of the older stuff.

> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Tuesday, January 7, 2025 08:32
> To: Robert Dubner <rdub...@symas.com>
> Cc: jklow...@symas.com; Joseph Myers <josmy...@redhat.com>;
> gcc-patc...@gcc.gnu.org
> Subject: RE: [PATCH] COBOL 3/8 gen: GENERIC interface
>
> On Mon, 23 Dec 2024, Robert Dubner wrote:
>
> > Richard, a bunch of things you address are in my bailiwick.
> >
> > When Jim and I set out to create a COBOL front end, I knew *NOTHING*
> > about, well, anything vis-à-vis GCC.  I barely knew how it worked.
>
> I guess that's expected - we always hope people doing new frontends have
> spare time left to fill gaps in documentation with knowledge they gained
> ;)  Or maybe write a blog post about how to do a new GCC frontend (there
> are multiple such for backends).  But I know time is scarce.
>

I don't think it's just time.  There are so many layers.  By analogy: I am
imagining a simple machine that levitates a steel ball bearing with an
electromagnet above and an optical sensor: sensor sees the bearing is too
low, so more current is sent to the coil, which raises the bearing and so
on.  Somebody wants to know how to build such a thing.  It's simple.  You
just need to know how to build a sensor, and how to wind an electromagnet,
and build an amplifier, and you need to know physics and feedback control
theory, which means you need to know calculus, which means you need to
know trigonometry, which means you need to know algebra....  I could
publish "Popular Mechanics" plans for such a gadget that a kid could
build, but they wouldn't know how to do it themselves.

The front end seems to be a lot like that.  It took me weeks to figure out
the relationship between tree.h and tree.def and GENERIC tags and the
build_ routines and...  After I built my GENERIC dumper, I spent many days
drawing the directed cyclic graphs of functions, starting with "void
foo(void){}", to figure out what the middle end expected from me.  And I
built up from there.  I have routines that do the hard work, so much so
that I rarely work with individual GENERIC tags any more.  (I personally
call individual trees "tags" when they are in isolation, and "nodes" when
they are part of a tree, because otherwise the word "tree" gets so
overworked it becomes meaningless).  I have macro-like routines that I
have created to do the work.  I suspect every front end does, too.

And the thought of trying to document that in a way that's more meaningful
than a do-it-yourself "Popular Mechanics" project plan "GCC Front Ends For
Dummies" is exhausting.  How far down do you go?  ("First, find a deposit
of iron ore, and a seam of coal.  Then, use a pile of rocks to build a
forge...")

I'll give it some thought, though.  I would have found a "hello, world"
front end incredibly useful.

> > COBOL sections and paragraphs (a section is a group of paragraphs) are
> > conceptually similar to C functions. Given a paragraph named FOO, you
> > can PERFORM FOO and the group of sentences (yes, a paragraph is made
> > up of sentences; I remind you that COBOL was originally designed to be
> > readable by non-programmers) are executed and control then returns to
> > the statement after the PERFORM.
> >
> > For various reasons, execution into, through, and back from sections
> > and paragraphs must be implemented with GOTO statements, and cannot be
> > implemented with calls.
>
> Uh, that's awkward (if only for the fact that the big functions you
> end up with will be slow to compile).

I am not sure I was clear.  Those GOTO statements are implemented in the
run-time executable, so the executable contents of a paragraph are laid
down only once in the generated executable.  I am not jumping all over
creation in the front end.  (<shudder> That's a horrid thought, isn't it?)

>
> > I nonetheless attempted, at one point, to implement PERFORM via calls,
> > and the ".global" you noticed is a vestige of that effort.  The
> > routine it was used in has a boolean variable 'global' that defaults
> > to false, and the routine was never called with that parameter set to
> > true.  It is unnecessary and has been deleted.
> >
> > gg_insert_into_assembler() does indeed use ASM_EXPR. I sometimes use
> > it to generate #-delimited comments into the generated assembly
> > language so that I can see what's going on.
> >
> > But I also use it to generate labeled locations in the executables.
> >
> > I am also developing a GDB-COBOL version of GDB, one that understands
> > the executables GCOBOL is generating.  We need the GDB NEXT
> > instruction to execute through a PROC that is the subject of a PERFORM
> > PROC statement.
> > And I need to be able to set a breakpoint with "(gdb)break PROC".
> >
> > I have not yet figured out how to use GCC to put information into the
> > .debug_info section of a DWARF executable, and I have not yet figured
> > out how, in GDB, to extract such information.  So, for now, I am
> > creating labels with meta-information in them.
>
> Hmm, I see.  I think that not using functions and function calls and thus
> not having a "frame" will make this quite difficult.  Can you elaborate on
> why having PARAGRAPH mapped to a function does not work?
> I guess what I'd try is to have SECTION map to a function and have
> PARAGRAPH map to nested functions within the SECTION - that way the
> PARAGRAPHs have access to the variables at SECTION scope as they are at
> the point of PERFORM (aka function calls).  In GNU C you can do this:
>
> void foo_section (int p)
> {
>   int a[10];
>   int res;
>   int i;
>
>   i = 5;
>
>   void bar_paragraph ()
>   {
>     res = a[i];
>   }
>
>   bar_paragraph ();
>   if (res == 7)
>     return;
>
>   res += 5;
>
>   void foobar_paragraph ()
>   {
>     a[1] = res;
>   }
>
>   foobar_paragraph ();
>   bar_paragraph ();
> }
>
> I _think_ that this might map to how COBOL works?  gdb can deal with
> nested functions.  You can also have goto in between the section and
> paragraph in case the language allows this - GCC supports "nonlocal" goto
> for this case.
>

Richard, your questions are good.  You are recapitulating my own research
into these questions.  I looked into implementing paragraphs and sections
as nested functions, and as standalone static functions.  I even
implemented them as static functions at one point, but I quickly got
tangled up in the realities of COBOL.

Some COBOL 101 is needed to understand why I came to the conclusion that
PERFORM-as-CALL can't be done.

This is a working COBOL program:

01        identification      division.
02        program-id.         prog.
03        data                division.
04        working-storage     section.
05        77 msg pic X(64).
06        procedure           division.
07        move "First time through" to msg
08        perform para-foo
09        move "Second time through" to msg.
10        para-foo.
11        display msg.
12        end-para-foo.
13        display "That's all, folks!"
14        goback.
15        end program         prog.

The output:

        First time through
        Second time through
        That's all, folks!

The "perform para-foo" at line 08 transfers control to line 11, which
displays "msg".

The new paragraph at line 12 causes execution to return back to line 09,
which updates "msg".

Execution then falls through to paragraph para-foo at line 10, so line 11
gets executed again, displaying the updated "msg".

Execution then falls through to end-para-foo, but this time execution
simply proceeds, since there is no "perform" statement involved.

Worse:

You can have a bunch of paragraphs in a row: para1, para2, ...para10.

You can execute "perform para3 through para6".  It's legal, in para4, to
put in a "GO TO some-other-paragraph", where that other paragraph can be
anywhere in the enclosing program-id/function.  If, eventually, control
gets transferred back to para6, the "return" code at the end of para6 gets
executed to return execution to the point after the "perform".  But it's
not required.  Consider, if your brain can handle it, calling a function
that doesn't return.

Oh, yeah, right.  I forgot to mention that in addition to being the target
of a "perform", and being able to be executed by falling-through into it,
a paragraph can also be the target of a COBOL GO TO statement.

Even though I can dimly visualize creating convoluted logic for
accomplishing some of that, I simply have no idea how to go about
implementing jumping from the middle of one function into the middle of
another function.

I do, however, have all that working using jumps.
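
To make the jump-based approach concrete, here is a rough C sketch of the
sample program above.  It is an illustration only, not the GENERIC that
GCOBOL actually emits: the names run_prog, display, and para_foo_exit are
hypothetical.  The idea it shows is that each paragraph ends with a small
test of an "exit" variable; PERFORM arms that variable and jumps in, while
fall-through and GO TO leave it unarmed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustration only: all names here are hypothetical, not GCOBOL's. */

static char out[128];                 /* collects what DISPLAY would print */

static void display(const char *s)
{
    assert(strlen(out) + strlen(s) + 2 <= sizeof out);
    strcat(out, s);
    strcat(out, "\n");
}

/* The whole PROCEDURE DIVISION is one function; paragraphs are labels. */
static const char *run_prog(void)
{
    const char *msg;
    int para_foo_exit = 0;            /* 0: fall through; 1: return to the PERFORM */

    out[0] = '\0';

    msg = "First time through";       /* line 07 */
    para_foo_exit = 1;                /* line 08: PERFORM arms the exit... */
    goto para_foo;                    /* ...and jumps to the paragraph */
after_perform:
    para_foo_exit = 0;                /* disarm the exit again */
    msg = "Second time through";      /* line 09, then fall through */

para_foo:                             /* line 10 */
    display(msg);                     /* line 11 */
    if (para_foo_exit == 1)           /* end of para-foo: was it PERFORMed? */
        goto after_perform;

    /* end-para-foo: line 12 */
    display("That's all, folks!");    /* line 13 */
    return out;                       /* line 14: GOBACK */
}
```

Calling run_prog() collects the same three lines as the COBOL program.  A
GO TO out of the paragraph would simply never reach the exit test, which is
the "function that doesn't return" case.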

[snipped discussion of GTY(())]

> > As we have implemented it, COBOL variables do not have "types" as GCC
> > uses the term.  A COBOL variable is a 112-byte cblc_field_t structure.
> > It contains an enum that specifies the COBOL type (alphanumeric,
> > floating-point, packed-decimal, numeric display (where numbers are
> > stored as character strings of digits, e.g. "123"), binary, among other
> > things).
> > The structure also specifies the number of bytes, the number of
> > digits, the number of digits to the right of a fixed-point decimal
> > point.  There is a 64-bit integer value full of attribute flags:
> > whether or not the value is signable, big-endian, little-endian.
>
> Aha, so COBOL isn't statically typed?

Actually, I think COBOL is possibly the ultimate in "statically typed".

More COBOL 101:

These data definitions:

77 var1 PICTURE 99V999 USAGE BINARY VALUE 12.345.
77 var1 PICTURE 99V999 USAGE DISPLAY VALUE 12.345.
77 var1 PICTURE 99V999 USAGE PACKED-DECIMAL VALUE 12.345.
77 var1 PICTURE 99V999 USAGE COMP-5 VALUE 12.345.
77 var1 PICTURE 99V999 USAGE FLOAT-LONG VALUE 12.345.
77 var1 PICTURE 99.999 VALUE 12.345.

All specify the same value -- 12.345 -- but in different ways.
Respectively:

A four-byte big-endian unsigned binary value
The character string "12345"
A three-byte big-endian packed decimal value with a final 0xF nybble
    indicating an unsigned value
A four-byte little-endian unsigned binary value
An IEEE 754 binary64 value
The character string "12.345".  (Take a deep breath and look up COBOL
    NUMERIC EDITED values.  If you dare.)

Those definitions are fixed and unchanging and known to the compiler.
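
For readers who want to see the bytes, here is a small C sketch of the
first four layouts for the scaled integer 12345 (the decimal point is
implied, so it is never stored).  The helper names are hypothetical and
this is not libgcobol code; the display form uses the host's execution
character set, which I am assuming is ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from libgcobol. */

/* USAGE BINARY: most significant byte first. */
static void put_binary_be(uint32_t value, unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[len - 1 - i] = (unsigned char)(value & 0xFF);
        value >>= 8;
    }
}

/* USAGE COMP-5 on a little-endian host: least significant byte first. */
static void put_binary_le(uint32_t value, unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (unsigned char)(value & 0xFF);
        value >>= 8;
    }
}

/* USAGE DISPLAY: one character per digit, most significant first. */
static void put_display(uint32_t value, unsigned char *buf, size_t digits)
{
    for (size_t i = 0; i < digits; i++) {
        buf[digits - 1 - i] = (unsigned char)('0' + value % 10);
        value /= 10;
    }
}

/* USAGE PACKED-DECIMAL: two digits per byte, with the final nybble
   holding the sign (0xF means unsigned).  An odd digit count is
   assumed so that digits plus sign fill whole bytes. */
static void put_packed(uint32_t value, unsigned char *buf, size_t digits)
{
    assert(digits % 2 == 1);
    size_t len = (digits + 1) / 2;
    unsigned low = 0xF;                       /* sign nybble goes last */
    for (size_t i = len; i-- > 0; ) {
        buf[i] = (unsigned char)(((value % 10) << 4) | low);
        value /= 10;
        low = value % 10;                     /* low nybble of next byte */
        value /= 10;
    }
}
```

With 12345 as input, the four helpers produce 00 00 30 39, 39 30 00 00,
the characters "12345", and 12 34 5F respectively.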

The reason I carry all the metadata at runtime is because of the need for
a debugger.  I emphasize again: I understand that I should take all of
that metadata and put it into the .debug_info section.  But I don't yet
know how to put custom information into .debug_info, nor do I know how to
pull such information out of .debug_info using code in GDB.  All in good
time.

I am developing the companion gdb-cobol (
https://gitlab.cobolworx.com/COBOLworx/gdb-cobol ), and I already have a
number of variations of the print and ptype commands that have been
adjusted for making sense in COBOL.

But, at the present time, for that to work I need all the metadata created
by the COBOL data definitions.
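
As a sketch of why that metadata is enough for a debugger-style "print",
here is a cut-down, hypothetical descriptor and a decoder driven by it.
The real cblc_field_t is 112 bytes and carries much more; the struct
field_desc below, its member names, and the ATTR_BIG_ENDIAN flag are
assumptions made for illustration, and only unsigned values are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for cblc_field_t. */
enum field_type { FLD_NUMERIC_DISPLAY, FLD_NUMERIC_BINARY, FLD_PACKED_DECIMAL };

#define ATTR_BIG_ENDIAN 1u

struct field_desc {
    enum field_type type;
    const unsigned char *data;  /* where the value lives */
    size_t capacity;            /* number of bytes */
    int digits;                 /* total digits */
    int rdigits;                /* digits right of the implied point */
    uint64_t attr;              /* attribute flags */
};

/* Decode any supported field into one scaled integer (12.345 -> 12345). */
static int64_t field_to_scaled_int(const struct field_desc *f)
{
    int64_t v = 0;
    assert(f != NULL && f->data != NULL);
    switch (f->type) {
    case FLD_NUMERIC_DISPLAY:
        for (size_t i = 0; i < f->capacity; i++)
            v = v * 10 + (f->data[i] - '0');
        break;
    case FLD_NUMERIC_BINARY:
        for (size_t i = 0; i < f->capacity; i++) {
            size_t k = (f->attr & ATTR_BIG_ENDIAN) ? i : f->capacity - 1 - i;
            v = (v << 8) | f->data[k];
        }
        break;
    case FLD_PACKED_DECIMAL:
        for (size_t i = 0; i < f->capacity; i++) {
            v = v * 10 + (f->data[i] >> 4);
            if (i + 1 < f->capacity)        /* last low nybble is the sign */
                v = v * 10 + (f->data[i] & 0x0F);
        }
        break;
    }
    return v;
}

/* Render the value with its implied decimal point made visible. */
static void field_format(const struct field_desc *f, char *buf, size_t buflen)
{
    int64_t v = field_to_scaled_int(f);
    int64_t scale = 1;
    for (int i = 0; i < f->rdigits; i++)
        scale *= 10;
    if (f->rdigits > 0)
        snprintf(buf, buflen, "%lld.%0*lld", (long long)(v / scale),
                 f->rdigits, (long long)(v % scale));
    else
        snprintf(buf, buflen, "%lld", (long long)v);
}
```

Three descriptors that point at the display, big-endian binary, and packed
encodings of 12.345 all decode to the same scaled integer, which is the
kind of uniform view a "print" command needs.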

>
> > A true loony -- I am loony only when it's really useful -- would
> > embark upon a quixotic religious quest to expand tree.def to encompass
> > all that COBOL crap, so that we would have
> > DEFTREECODE(COBOL_PACKED_DECIMAL_TYPE,
> > "cobol_packed_decimal_type", tcc_type, 0) and go through the effort of
> > having that TYPE be defined with the appropriate attributes, and then
> > everything I've done would get absorbed into the middle-end which
> > would spit out the assembly language to implement the language.
> >
> > Anybody who would want to do it that way should be thrown into a pit.
> > Anybody who would want to let them do it that way should be thrown in,
> > too.
> >
> > In general, most of the work in the executable is being done in the
> > libgcobol.so code, written in C++.  When somebody wants to add a
> > six-character value represented as a string of EBCDIC numerals with
> > two digits to the right of a virtual decimal point, to a big-endian
> > binary value that has four significant digits with an implied
> > multiplier of 1000, and assign the result to a 12-digit packed decimal
> > number with three digits to the right of an implied decimal point,
> > there is an __gg__add() routine in the libgcobol.so that does all that.
> >
> > Welcome to COBOL.  Everything I just described there is four lines of
> > code:
> > 77 A PIC  999V99  USAGE DISPLAY.
> > 77 B PIC 9999PPP  USAGE BINARY.
> > 77 C PIC 9(9)V999 USAGE PACKED-DECIMAL.
> > ADD A TO B GIVING C
> >
> > So, a lot of what I do in the generated GENERIC is pass pointers to
> > those cblc_field_t structures around.
>
> OK, I see.  So in theory it would be nice for the compiler to "optimize"
> those into CPU native "types"?  It might be interesting to see whether
> using QUAL_UNION_TYPE would be a good internal representation - GCC's types
> can have TYPE_SIZE being an expression dependent on fields of the type,
> like the qualifier of a union.  While I know that much, the details mostly
> escape me here - Eric Botcazou of AdaCore would likely be the best go-to
> target for brain-storming.

Thanks for the references.  I am actually content with how things are, for
the most part.  I am trying to do simple stuff myself with GENERIC that I
generate.  More complex stuff I do with calls into libgcobol.so, where I
am using C/C++ to implement the complexities.  "Simple" means that I have
a belief that the overhead of a call is undesirable, whereas "more
complex" means that I think the overhead of the call/return is overwhelmed
by the rest of the calculation.

I understand that this undercuts the ability of the compiler to optimize.
I don't believe this to be a significant problem.
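
As an example of what falls in the "simple" category, a table-offset
calculation is just a few multiplies and adds, so a call would cost more
than the work itself.  The C sketch below shows the arithmetic only; it is
not the GENERIC that GCOBOL builds, and the function name and parameter
layout are hypothetical.  COBOL subscripts are 1-based, hence the
subtraction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: byte offset of one element of a nested table.
   subscripts[k] is the 1-based COBOL subscript at nesting level k;
   strides[k] is the size in bytes of one element at that level. */
static size_t table_offset(const size_t *subscripts, const size_t *strides,
                           size_t depth)
{
    size_t offset = 0;
    for (size_t k = 0; k < depth; k++) {
        assert(subscripts[k] >= 1);           /* COBOL subscripts start at 1 */
        offset += (subscripts[k] - 1) * strides[k];
    }
    return offset;
}
```

For a table of rows that each hold five 4-byte items, the strides are 20
and 4, and element (3, 2) sits at byte offset 2*20 + 1*4 = 44.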

>
> > But!  There is a lot of work that I don't want to do in the library
> > for performance reasons.  For example, if somebody defines a four-byte
> > little-endian binary value that gets used as a loop counter, I don't
> > want to be calling a library routine to decrement it, and another
> > library routine to check if it's zero.
>
> Right!
>
> > So, I am generating GENERIC in a way that I regard as writing assembly
> > language to handle some of the internals, like subtracting a numeric
> > literal one from a four-byte little-endian binary.  I also hand-code
> > stuff like calculating offsets into tables from the table subscripts.
> > I didn't want to put that into library code because critical inner
> > loops often have table subscripts and I don't want the overhead.
> >
> > The upshot is that some benchmark routines, that are heavily
> > inner-loop dependent, run two to three times faster when compiled with
> > GCOBOL than when compiled with some others.
> >
> > With that all said: Much of the code you are referencing here isn't
> > used directly in implementing user COBOL source.  It is rather used by
> > me in generating the "assembly language" to do things like array
> > offset calculations.
> >
> > In any case, and given that gg_get_larger_type() is used by me in my
> > code on variables I define, rather than on user code on variables the
> > user defines, I believe it is doing what I need it to do.  For
> > example, when I need to multiply tree A by tree B, I need them to both
> > be of the same type in order to keep the TRUNC_DIV_EXPR from
> > complaining that they aren't the same type.  So, I use
> > gg_get_larger_type(A, B) to return the type of the one with the larger
> > TYPE_SIZE, and then I cast both A and B to that type.
> >
> > At least, that's what I think I am doing.  In my pile of index cards
> > is the one that says, "Eliminate gg_get_larger_type(), and then fix
> > all the resulting errors by defining the variables more sensibly in
> > the first place."
>
> Thanks for explaining.  I wonder if we can populate a minimalistic dejagnu
> testsuite to have some coverage and COBOL examples people could cut&paste
> and experiment with to both see the IL generated and what optimizers can
> (or can not) do with it.

The "cobol-stage" branch of https://gitlab.cobolworx.com/COBOLworx/gcc-cobol,
which Jim has been using as a base for creating patches, has the
./gcc/cobol/tests directory (which is a make-based test suite), the
./gcc/cobol/nist directory (also a make-based test suite), and
./gcc/cobol/UAT (which is an autom4te-based test suite).

The first contains 125 COBOL programs; the second contains 831.  The UAT
test suite has .at files containing 1038 COBOL modules implementing 674
separate tests.

At the present time, Jim and I are in a standoff. Did you ever have a
roommate and get locked in a passive-aggressive battle where you are both
adding dirty dishes to the kitchen sink, each waiting for the other to
acknowledge that they are the cause of most of the pile and start washing?
That's where we are with dejagnu.  We both know we're going to have to
convert.  Neither of us is ready to start filling the sink with hot water.

[snipped]

> > Thank you ever so much for your comments and advice.  It feels a
> > little like coming out from the cold.
>
> You're welcome - and sorry for the slow reply to your reply, holiday
> season ...
>
> Richard.

Thank you, again.  I am enjoying this conversation; I hope it's helpful
for understanding what we are doing.  And we are taking suggestions to
heart.  Things are seemingly slow right now because Jim is converting
error and warning messages to the diagnostic formats.  It's a little
tedious.

>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Bob Dubner
