On Thu, 9 Jan 2025, Robert Dubner wrote:

> I am going to trim back some of the older stuff.
>
> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Tuesday, January 7, 2025 08:32
> > To: Robert Dubner <rdub...@symas.com>
> > Cc: jklow...@symas.com; Joseph Myers <josmy...@redhat.com>;
> > gcc-patc...@gcc.gnu.org
> > Subject: RE: [PATCH] COBOL 3/8 gen: GENERIC interface
> >
> > On Mon, 23 Dec 2024, Robert Dubner wrote:
> >
> > > Richard, a bunch of things you address are in my bailiwick.
> > >
> > > When Jim and I set out to create a COBOL front end, I knew *NOTHING*
> > > about, well, anything vis-à-vis GCC.  I barely knew how it worked.
> >
> > I guess that's expected - we always hope people doing new frontends have
> > spare time left to fill gaps in documentation with knowledge they gained
> > ;)  Or maybe write a blog post about how to do a new GCC frontend (there
> > are multiple such for backends).  But I know time is scarce.
> >
>
> I don't think it's just time.  There are so many layers.  By analogy: I am
> imagining a simple machine that levitates a steel ball bearing with an
> electromagnet above and an optical sensor: sensor sees the bearing is too
> low, so more current is sent to the coil, which raises the bearing and so
> on.  Somebody wants to know how to build such a thing.  It's simple.  You
> just need to know how to build a sensor, and how to wind an electromagnet,
> and build an amplifier, and you need to know physics and feedback control
> theory, which means you need to know calculus, which means you need to
> know trigonometry, which means you need to know algebra....  I could
> publish "Popular Mechanics" plans for such a gadget that a kid could
> build, but they wouldn't know how to do it themselves.
>
> The front end seems to be a lot like that.  It took me weeks to figure out
> the relationship between tree.h and tree.def and GENERIC tags and the
> build_ routines and...
> After I built my GENERIC dumper, I spent many days
> drawing the directed cyclic graphs of functions, starting with "void
> foo(void){}", to figure out what the middle end expected from me.  And I
> built up from there.  I have routines that do the hard work, so much so
> that I rarely work with individual GENERIC tags any more.  (I personally
> call individual trees "tags" when they are in isolation, and "nodes" when
> they are part of a tree, because otherwise the word "tree" gets so
> overworked it becomes meaningless).  I have macro-like routines that I
> have created to do the work.  I suspect every front end does, too.
>
> And the thought of trying to document that in a way that's more meaningful
> than a do-it-yourself "Popular Mechanics" project plan "GCC Front Ends For
> Dummies" is exhausting.  How far down do you go?  ("First, find a deposit
> of iron ore, and a seam of coal.  Then, use a pile of rocks to build a
> forge...")
>
> I'll give it some thought, though.  I would have found a "hello, world"
> front end incredibly useful.
>
> > > COBOL sections and paragraphs (a section is a group of paragraphs) are
> > > conceptually similar to C functions.  Given a paragraph named FOO, you
> > > can PERFORM FOO and the group of sentences (yes, a paragraph is made
> > > up of sentences; I remind you that COBOL was originally designed to be
> > > readable by non-programmers) are executed and control then returns to
> > > the statement after the PERFORM.
> > >
> > > For various reasons, execution into, through, and back from sections
> > > and paragraphs must be implemented with GOTO statements, and cannot be
> > > implemented with calls.
> >
> > Uh, that's awkward (if not only for the fact that the big functions you
> > end up will be slow to compile).
>
> I am not sure I was clear.  Those GOTO statements are implemented in the
> run-time executable, so the executable contents of a paragraph are laid
> down only once in the generated executable.
> I am not jumping all over
> creation in the front end.  (<shudder> That's a horrid thought, isn't it?)
>
> >
> > > I nonetheless attempted, at one point, to implement PERFORM via calls,
> > > and the ".global" you noticed is a vestige of that effort.  The
> > > routine it was used in has a boolean variable 'global' that defaults
> > > to false, and the routine was never called with that parameter set to
> > > true.  It is unnecessary and has been deleted.
> > >
> > > gg_insert_into_assembler() does indeed use ASM_EXPR.  I sometimes use
> > > it to generate #-delimited comments into the generated assembly
> > > language so that I can see what's going on.
> > >
> > > But I also use it to generate labeled locations in the executables.
> > >
> > > I am also developing a GDB-COBOL version of GDB, one that understands
> > > the executables GCOBOL is generating.  We need the GDB NEXT
> > > instruction to execute through a PROC that is the subject of a PERFORM
> > > PROC statement.
> > > And I need to be able to set a breakpoint with "(gdb)break PROC".
> > >
> > > I have not yet figured out how to use GCC to put information into the
> > > .debug_info section of a DWARF executable, and I have not yet figured
> > > out how, in GDB, to extract such information.  So, for now, I am
> > > creating labels with meta-information in them.
> >
> > Hmm, I see.  I think that not using functions and function calls and
> > thus not having a "frame" will make this quite difficult.  Can you
> > elaborate on why having PARAGRAPH mapped to a function does not work?
> > I guess what I'd try is to have SECTION map to a function and have
> > PARAGRAPH map to nested functions within the SECTION - that way the
> > PARAGRAPHs have access to the variables at SECTION scope as they are at
> > the point of PERFORM (aka function calls).
> > In GNU C you can do this:
> >
> > void foo_section (int p)
> > {
> >   int a[10];
> >   int res;
> >   int i;
> >
> >   i = 5;
> >
> >   void bar_paragraph ()
> >   {
> >     res = a[i];
> >   }
> >
> >   bar_paragraph ();
> >   if (res == 7)
> >     return;
> >
> >   res += 5;
> >
> >   void foobar_paragraph ()
> >   {
> >     a[1] = res;
> >   }
> >
> >   foobar_paragraph ();
> >   bar_paragraph ();
> > }
> >
> > I _think_ that this might map to how COBOL works?  gdb can deal with
> > nested functions.  You can also have goto in between the section and
> > paragraph in case the language allows this - GCC supports "nonlocal"
> > goto for this case.
> >
>
> Richard, your questions are good.  You are recapitulating my own research
> into these questions.  I looked into implementing paragraphs and sections
> as nested functions, and as standalone static functions.  I even
> implemented them as static functions at one point, but I quickly got
> tangled up in the realities of COBOL.
>
> Some COBOL 101 is needed to understand why I came to the conclusion that
> PERFORM-as-CALL can't be done.
>
> This is a working COBOL program:
>
> 01  identification division.
> 02  program-id. prog.
> 03  data division.
> 04  working-storage section.
> 05  77 msg pic X(64).
> 06  procedure division.
> 07      move "First time through" to msg
> 08      perform para-foo
> 09      move "Second time through" to msg.
> 10  para-foo.
> 11      display msg.
> 12  end-para-foo.
> 13      display "That's all, folks!"
> 14      goback.
> 15  end program prog.
>
> The output:
>
> First time through
> Second time through
> That's all, folks!
>
> The "perform para-foo" at line 08 transfers control to line 11, which
> displays "msg".
>
> The new paragraph at line 12 causes execution to return back to line 09,
> which updates "msg".
>
> Execution then falls through to paragraph para-foo at line 10, so line 11
> gets executed again, displaying the updated "msg".
>
> Execution then falls through to end-para-foo, but this time execution
> simply proceeds, since there is no "perform" statement involved.
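
The control flow just described can be modeled in plain C.  What follows
is only an illustrative sketch, not the code GCOBOL generates: the whole
program is assumed to be one function, each paragraph is a label, and
PERFORM is modeled as arming a return marker before a goto.  The names
run_prog, display, para_foo_return and the RETURN_* constants are made up
for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical return markers: where should the end of para-foo jump to? */
enum { RETURN_NOWHERE, RETURN_AFTER_LINE_08 };

static char output[256];   /* collects everything DISPLAY would print */

static void display(const char *text)
{
    assert(strlen(output) + strlen(text) + 2 <= sizeof output);
    strcat(output, text);
    strcat(output, "\n");
}

/* Models the 15-line sample program as a single function. */
const char *run_prog(void)
{
    const char *msg;
    int para_foo_return = RETURN_NOWHERE;   /* armed only by PERFORM */

    output[0] = '\0';

    msg = "First time through";             /* line 07 */
    para_foo_return = RETURN_AFTER_LINE_08; /* line 08: PERFORM para-foo */
    goto para_foo;

after_line_08:
    msg = "Second time through";            /* line 09 */
    /* no jump here: execution falls through into para-foo */

para_foo:                                   /* line 10 */
    display(msg);                           /* line 11 */

    /* line 12: end of para-foo, jump back only if a PERFORM is pending */
    if (para_foo_return == RETURN_AFTER_LINE_08) {
        para_foo_return = RETURN_NOWHERE;   /* disarm before returning */
        goto after_line_08;
    }

    display("That's all, folks!");          /* line 13 */
    return output;                          /* line 14: GOBACK */
}
```

Printing the string returned by run_prog() gives the three output lines
quoted above.  The marker is disarmed before jumping back, which is what
lets the second pass through para-foo fall through instead of returning.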

Ick ;)  OK, I see how this is indeed not trivially mapped to functions.
OTOH the handling of end-para-foo in Cobol is odd in that it has to
know whether execution was from a PERFORM (it has to jump back) or
from fallthru.  That's probably also a bit awkward to implement with
the goto case, probably requiring some "flag".

Btw, is recursion allowed?

     f = 1
   fact.
     f = f * n
     n = n - 1
     if n != 0
       perform fact
   end-fact.

or something like this to compute n!

A way to make the fallthru work would be to duplicate the statements
at the fallthru location (aka inline them at parsing) and only have
PERFORM invoke the function.

> Worse:
>
> You can have a bunch of paragraphs in a row: para1, para2, ...para10.
>
> You can execute "perform para3 through para6".  It's legal, in para4, to
> put in a "GO TO some-other-paragraph", where that other paragraph can be
> anywhere in the enclosing program-id/function.  If, eventually, control
> gets transferred back to para6, the "return" code at the end of para6 gets
> executed to return execution to the point after the "perform".  But it's
> not required.  Consider, if your brain can handle it, calling a function
> that doesn't return.

Interesting.  I do wonder if there's a way to manipulate the
"return stack" or how it was thought such a thing would be
implemented?  Kind of like with a "goto with parameter", aka
pass a return label along to the goto target?

> Oh, yeah, right.  I forgot to mention that in addition to being the target
> of a "perform", and being able to be executed by falling-through into it,
> a paragraph can also be the target of a COBOL GO TO statement.
>
> Even though I can dimly visualize creating convoluted logic for
> accomplishing some of that, I simply have no idea how to go about
> implementing jumping from the middle of one function into the middle of
> another function.

It only works as long as there's a common containing function, so ...

> I do however, have all that working using jumps.

...
a COBOL TU is parsed into a single function then?  Thus it just has a
"main"?  I do understand there's nothing like a multi-file COBOL program
(guess there's only a single stack of punch cards to feed the machine -
heh)

> [snipped discussion of GTY(())]
>
> > > As we have implemented it, COBOL variables do not have "types" as GCC
> > > uses the term.  A COBOL variable is a 112-byte cblc_field_t structure.
> > > It contains an enum that specifies the COBOL type (alphanumeric,
> > > floating-point, packed-decimal, numeric display (where numbers are
> > > stored as character strings of digits, e.g. "123"), binary, among
> > > other things).
> > > The structure also specifies the number of bytes, the number of
> > > digits, the number of digits to the right of a fixed-point decimal
> > > point.  There is a 64-bit integer value full of attribute flags:
> > > whether or not the value is signable, big-endian, little-endian.
> >
> > Aha, so COBOL isn't statically typed?
>
> Actually, I think COBOL is possibly the ultimate in "statically typed".
>
> More COBOL 101:
>
> These data definitions:
>
> 77 var1 PICTURE 99V999 USAGE BINARY         VALUE 12.345.
> 77 var1 PICTURE 99V999 USAGE DISPLAY        VALUE 12.345.
> 77 var1 PICTURE 99V999 USAGE PACKED-DECIMAL VALUE 12.345.
> 77 var1 PICTURE 99V999 USAGE COMP-5         VALUE 12.345.
> 77 var1 PICTURE 99V999 USAGE FLOAT-LONG     VALUE 12.345.
> 77 var1 PICTURE 99.999                      VALUE 12.345.
>
> All specify the same value -- 12.345 -- but in different ways.
> Respectively:
>
> A four-byte big-endian unsigned binary value
> The character string "12345"
> A three-byte big-endian packed decimal value with a final 0xF nybble
> indicating an unsigned value
> A four-byte little-endian unsigned binary value
> An IEEE 754 binary64 value
> The character string "12.345".  (Take a deep breath and look up COBOL
> NUMERIC EDITED values.  If you dare.)
>
> Those definitions are fixed and unchanging and known to the compiler.
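
To make the byte layouts in that list concrete, here is a small C sketch
that encodes the scaled integer 12345 (12.345 with the decimal point
implied by the V) in four of the six forms.  It is an illustration under
stated assumptions, not libgcobol code: binary_hex, packed_hex,
display_text and the shared buffer are hypothetical helpers, and the
FLOAT-LONG and numeric-edited cases are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static char text[40];   /* shared result buffer: sketch only, not reentrant */

/* Render n bytes as uppercase hex digits. */
static const char *hex(const unsigned char *bytes, size_t n)
{
    assert(2 * n < sizeof text);
    for (size_t i = 0; i < n; i++)
        sprintf(text + 2 * i, "%02X", (unsigned)bytes[i]);
    text[2 * n] = '\0';
    return text;
}

/* Four-byte unsigned binary: big-endian (BINARY) or little-endian (COMP-5). */
const char *binary_hex(uint32_t scaled, int big_endian)
{
    unsigned char b[4];
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        b[i] = (unsigned char)(scaled >> shift);
    }
    return hex(b, 4);
}

/* Unsigned packed decimal: one digit per nybble, final nybble 0xF. */
const char *packed_hex(uint32_t scaled, int digits)
{
    unsigned char b[8] = {0};
    size_t nbytes = (size_t)digits / 2 + 1;
    assert(digits >= 1 && nbytes <= sizeof b);
    b[nbytes - 1] = 0x0F;                    /* sign nybble: unsigned */
    for (int pos = 1; pos <= digits; pos++) {   /* pos counts nybbles from the right */
        unsigned digit = scaled % 10;
        scaled /= 10;
        unsigned char *byte = &b[nbytes - 1 - (size_t)pos / 2];
        *byte |= (unsigned char)(pos % 2 ? digit << 4 : digit);
    }
    return hex(b, nbytes);
}

/* USAGE DISPLAY: the digits as characters, decimal point not stored. */
const char *display_text(uint32_t scaled, int digits)
{
    assert(digits >= 1 && digits <= 10);
    sprintf(text, "%0*u", digits, (unsigned)scaled);
    return text;
}
```

With these helpers, binary_hex(12345, 1) yields "00003039",
binary_hex(12345, 0) yields "39300000", packed_hex(12345, 5) yields
"12345F", and display_text(12345, 5) yields "12345".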
>
> The reason I carry all the metadata at runtime is because of the need for
> a debugger.  I emphasize again: I understand that I should take all of
> that metadata and put it into the .debug_info section.  But I don't yet
> know how to put custom information into .debug_info, nor do I know how to
> pull such information out of .debug_info using code in GDB.  All in good
> time.

Yeah.  It's of help to read the DWARF specification (it's not too big),
and in the end think of DWARF as a way to handle C with extensions
for other languages.  So my approach would be to try mapping COBOL
type concepts to C type concepts - if that's not feasible then
extensions to DWARF for the COBOL type system are in order.  At least
DWARF knows DW_LANG_Cobol85, so it might not be a completely lost cause ;)
What kind of debug info do other compilers generate here?

> I am developing the companion gdb-cobol (
> https://gitlab.cobolworx.com/COBOLworx/gdb-cobol ), and I already have a
> number of variations of the print and ptype commands that have been
> adjusted for making sense in COBOL.
>
> But, at the present time, for that to work I need all the metadata created
> by the COBOL data definitions.

Note there are a bunch of debug language hooks in GCC already, but
none take advantage of the fact that we now have only DWARF as debug
representation, so they generate "meta-data" which then dwarf2out.cc
interprets (like the get_array_descr_info hook implemented for Fortran).
That said, the COBOL frontend should be able to output/amend the
DWARF DIE for a type directly, like for example add a
DW_AT_GNU_COBOL_type_data attribute with the data you put to the
runtime for the debugger placed in a DWARF type attribute instead
(as temporary(?) extension).

> >
> > > A true loony -- I am loony only when it's really useful -- would
> > > embark upon a quixotic religious quest to expand tree.def to encompass
> > > all that COBOL crap, so that we would have
> > > DEFTREECODE(COBOL_PACKED_DECIMAL_TYPE,
> > > "cobol_packed_decimal_type", tcc_type, 0) and go through the effort of
> > > having that TYPE be defined with the appropriate attributes, and then
> > > everything I've done would get absorbed into the middle-end which
> > > would spit out the assembly language to implement the language.
> > >
> > > Anybody who would want to do it that way should be thrown into a pit.
> > > Anybody who would want to let them do it that way should be thrown in,
> > > too.
> > >
> > > In general, most of the work in the executable is being done in the
> > > libgcobol.so code, written in C++.  When somebody wants to add a
> > > six-character value represented as a string of EBCDIC numerals with
> > > two digits to the right of a virtual decimal point, to a big-endian
> > > binary value that has four significant digits with an implied
> > > multiplier of 1000, and assign the result to a 12-digit packed decimal
> > > number with three digits to the right of an implied decimal point,
> > > there is an __gg__add() routine in the libgcobol.so that does all
> > > that.
> > >
> > > Welcome to COBOL.  Everything I just described there is four lines of
> > > code:
> > > 77 A PIC 999V99   USAGE DISPLAY.
> > > 77 B PIC 9999PPP  USAGE BINARY.
> > > 77 C PIC 9(9)V999 USAGE PACKED-DECIMAL.
> > > ADD A TO B GIVING C
> > >
> > > So, a lot of what I do in the generated GENERIC is pass pointers to
> > > those cblc_field_t structures around.
> >
> > OK, I see.  So in theory it would be nice for the compiler to "optimize"
> > those into CPU native "types"?
> > It might be interesting to see whether
> > using QUAL_UNION_TYPE would be a good internal representation - GCC's
> > types can have TYPE_SIZE being an expression dependent on fields of the
> > type, like the qualifier of a union.  While I know much, the details
> > mostly escape me here - Eric Botcazou of AdaCore would be likely the
> > best go-to target for brain-storming.
>
> Thanks for the references.  I am actually content with how things are, for
> the most part.  I am trying to do simple stuff myself with GENERIC that I
> generate.  More complex stuff I do with calls into libgcobol.so, where I
> am using C/C++ to implement the complexities.  "Simple" means that I have
> a belief that the overhead of a call is undesirable, whereas "more
> complex" means that I think the overhead of the call/return is overwhelmed
> by the rest of the calculation.
>
> I understand that this undercuts the ability of the compiler to optimize.
> I don't believe this to be a significant problem.
>
> >
> > > But!  There is a lot of work that I don't want to do in the library
> > > for performance reasons.  For example, if somebody defines a four-byte
> > > little-endian binary value that gets used as a loop counter, I don't
> > > want to be calling a library routine to decrement it, and another
> > > library routine to check if it's zero.
> >
> > Right!
> >
> > > So, I am generating GENERIC in a way that I regard as writing assembly
> > > language to handle some of the internals, like subtracting a numeric
> > > literal one from a four-byte little-endian binary.  I also hand-code
> > > stuff like calculating offsets into tables from the table subscripts.
> > > I didn't want to put that into library code because critical inner
> > > loops often have table subscripts and I don't want the overhead.
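
As a rough picture of the subscript arithmetic mentioned there, the sketch
below turns 1-based COBOL subscripts into a byte offset.  It assumes a
simple rectangular table whose elements are all elem_size bytes, with the
last subscript varying fastest; table_offset and its parameters are
hypothetical and are not taken from the GCOBOL sources.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of the element selected by subs[0..ndims-1].
   extents[i] is the OCCURS count of dimension i; subscripts start at 1. */
size_t table_offset(const int *subs, const int *extents, int ndims,
                    size_t elem_size)
{
    size_t index = 0;
    for (int i = 0; i < ndims; i++) {
        assert(subs[i] >= 1 && subs[i] <= extents[i]);
        /* Horner-style accumulation of the zero-based linear index */
        index = index * (size_t)extents[i] + (size_t)(subs[i] - 1);
    }
    return index * elem_size;
}
```

For a 3-by-4 table of four-byte elements, subscripts (1, 1) map to offset
0 and subscripts (2, 3) map to offset 24.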
> > >
> > > The upshot is that some benchmark routines, that are heavily
> > > inner-loop dependent, run two to three times faster when compiled with
> > > GCOBOL than when compiled with some others.
> > >
> > > With that all said: Much of the code you are referencing here isn't
> > > used directly in implementing user COBOL source.  It is rather used by
> > > me in generating the "assembly language" to do things like array
> > > offset calculations.
> > >
> > > In any case, and given that gg_get_larger_type() is used by me in my
> > > code on variables I define, rather than on user code on variables the
> > > user defines, I believe it is doing what I need it to do.  For
> > > example, when I need to multiply tree A by tree B, I need them to both
> > > be of the same type in order to keep the TRUNC_DIV_EXPR from
> > > complaining that they aren't the same type.  So, I use
> > > gg_get_larger_type(A, B) to return the type of the one with the larger
> > > TYPE_SIZE, and then I cast both A and B to that type.
> > >
> > > At least, that's what I think I am doing.  In my pile of index cards
> > > is the one that says, "Eliminate gg_get_larger_type(), and then fix
> > > all the resulting errors by defining the variables more sensibly in
> > > the first place."
> >
> > Thanks for explaining.  I wonder if we can populate a minimalistic
> > dejagnu testsuite to have some coverage and COBOL examples people could
> > cut&paste and experiment with to both see the IL generated and what
> > optimizers can (or can not) do with it.
>
> https://gitlab.cobolworx.com/COBOLworx/gcc-cobol "cobol-stage" branch that
> Jim has been using as a base for creating patches has the
> ./gcc/cobol/tests directory (which is a make-based test suite), the
> ./gcc/cobol/nist directory (also a make-based test suite), and
> ./gcc/cobol/UAT (which is an autom4te-based test suite).
>
> The first contains 125 cobol programs; the second contains 831.
> The UAT
> test suite has .at files containing 1038 COBOL modules implementing 674
> separate tests.
>
> At the present time, Jim and I are in a standoff.  Did you ever have a
> roommate and get locked in a passive-aggressive battle where you are both
> adding dirty dishes to the kitchen sink, each waiting for the other to
> acknowledge that they are the cause of most of the pile and start washing?
> That's where we are with dejagnu.  We both know we're going to have to
> convert.  Neither of us is ready to start filling the sink with hot water.
>
> [snipped]
>
> > > Thank you ever so much for your comments and advice.  It feels a
> > > little like coming out from the cold.
> >
> > You're welcome - and sorry for the slow reply to your reply, holiday
> > season ...
> >
> > Richard.
>
> Thank you, again.  I am enjoying this conversation; I hope it's helpful
> for understanding what we are doing.  And we are taking suggestions to
> heart.  Things are seemingly slow right now because Jim is converting
> error and warning messages to the diagnostic formats.  It's a little
> tedious.

That one at least will pay off for users!

Thanks again, I never thought I would even try to understand how
COBOL works ;)

Richard.

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)