Richard, a bunch of things you address are in my bailwick. When Jim and I set out to create a COBOL front end, I knew *NOTHING* about, well, anything vis-à-vis GCC. I barely knew how it worked. Some things I had to figure out even before I knew how to figure anything outl notably, creating functions. Creating a function was practically the first thing I had to do, and I spent a bunch of time trapping through GCC compiling simple functions to devine how it might be done. By definition, I simply did not know what I was doing, and learned as I went.
So, if there are peculiarities, that's probably why. And I am going to eagerly go through this message, and others, to apply the knowledge you are conveying; I am in a far better position to understand what you are saying than I would have been three years ago. > -----Original Message----- > From: Richard Biener <rguent...@suse.de> > Sent: Sunday, December 22, 2024 07:18 > To: jklow...@symas.com > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] COBOL 3/8 gen: GENERIC interface > > > (sorry for breaking threading, but quoting the whole mail made my MUA > unbearably slow) > > From 64bcb34e12371f61a8958645e1668e0ac2704391gen.patch 4 Oct 2024 12:01:22 > -0400 > From: "James K. Lowden" <jklow...@symas.com> > Date: Thu 12 Dec 2024 06:28:07 PM EST > Subject: [PATCH] Add 'cobol' to 4 files > > gcc/cobol/ChangeLog > * genapi.cc: New file. > * gengen.cc: New file. > * genmath.cc: New file. > * genutil.cc: New file. > > > I'm skimming over this part of the frontend, taking notes on general > things I notice, partly quoting parts of the patch > > +#include <err.h> > +#include <stdbool.h> > +#include <stdio.h> > +#include <stdlib.h> > +#include <ctype.h> > +#include <vector> > +#include <string> > +#include <unordered_map> > +#include <unordered_set> > +#include <chrono> > +#include <time.h> > +#include <math.h> > + > +#include "config.h" > +#include "system.h" > > all system headers need to be included by system.h, C++ includes should be > included by defining INCLUDE_UNORDERED_MAP and INCLUDE_UNORDERED_SET, etc. > - includes definitely have to come after including config.h I now see that many of the front ends seem to have their own lang-system.h that wraps the gcc/system.h. Under the assumption that the recently-added GO and RUST are doing that in an acceptable fashion, I will model that behavior. > > +//#define XXX do{gg_printf("LINE %d\n", build_int_cst_type(INT, > __LINE__), NULL_TREE);}while(0); > +#define XXX > ... > +bool bSHOW_PARSE = getenv("SHOW_PARSE"); > > there seems to be debugging wired in - at least the getenv call above > should probably be be behind a more global COBOL_DEBUG conditional? > There's a langhook for frontend initialization - that's a convenient place > to conditionally (re-)initialize bSHOW_PARSE (when you default it to > false). > > + // One strongly suspects, one does, that creating the constructor one > + // character at a time isn't the best possible way of accomplishing > + the // desired goal. But I lost interest in trying to find a better > way. > + TYPE_NAME(array_of_characters) = get_identifier("cblc_string"); tree > + constr = make_node(CONSTRUCTOR); > + TREE_TYPE(constr) = array_of_characters; > > indeed, you should be able to use build_string (strlen(var_contents)+1, > var_contents) and use that as DECL_INTIIAL (it builds a STRING_CST node). That worked, thank you. > > +tree > +gg_fprintf(tree fd, int nargs, const char *format_string, ...) > + { > ... > + tree stmt = build_call_array_loc (location_from_lineno(), > + INT, > + function, > + argc, > + args); > > instead of constructing args[] you should be able to directly call > build_call_valist () with the va_list. > In this case, I don't see how to use build_call_valist (). gg_fprintf() takes a format_string and 'nargs' following arguments. It then builds a call to sprintf (buffer, format_string, <those nargs variables); I don't see how to pass 'format string' to build_call_valist (). > You seem to, in function_decl_from_name or gg_get_function_address, build > function decls for externs, mainly from the C library. Elsewhere we are > using what GCC calls "built-ins" for this, see gcc/builtins.def and > tree.h:builtin_decl_explicit. Unfortunately most builtins and the used > types are to be initialized by frontends, luckily some are from the > middle-end, see build_common_builtin_nodes - that includes BUILT_IN_MEMCPY > and friends. For optimization purposes (that GCC knows that "memcpy" is > memcpy) I suggest you see to properly declare > BUILT_IN_* declarations - the Fortran frontend might be a good recipie to > copy from (see fortran/f95-lang.cc around where it includes > mathbuiltins.def). I will learn about builtins. > > + if( !field->var_decl_node ) > + { > + fprintf( stderr, > + "%s (type: %s) improperly has a NULL var_decl_node\n", > + field->name, > + cbl_field_type_str(field->type)); > + fprintf( stderr, > + "Probable cause: it was referenced without being > defined.\n"); > + exit(1); > > With GCC diagnostics you'd use internal_error (...) which then terminates > GCC. > > +static void > +assembler_label(const char *label, bool global = false) > + { > + // label has to be a valid label for the assembler > + > + // use the global switch for COBOL ENTRY > + > + const char global_text[] = ": .global "; > > there's target macros for how assemblers work, like ASM_OUTPUT_LABEL, but > it seems this function is about sections? I'm not sure how > gg_insert_into_assembler "works", but it seems to emit a ASM_EXPR into the > IL - in the section_label case in the toplevel? For portability I'd > strongly advise against the use of global assembler, you can always tell > the middle-end what you desire here, for sections you can use > set_decl_section_name for example. > > I'm not sure for what else you use ASM_EXPRs. Hoo, boy. We have a namespace collision. The COBOL language -- which, I remind you, got there first -- uses "SECTION" and "PARAGRAPH" to mean very specific things that are part of the language specification. So, my use of the word "section" has nothing to do with the sections in an ELF executable. COBOL sections and paragraphs (a section is a group of paragraphs) are conceptually similar to C functions. Given a paragraph named FOO, you can PERFORM FOO and the group of sentences (yes, a paragraph is made up of sentences; I remind you that COBOL was originally designed to be readable by non-programmers) are executed and control then returns to the statement after the PERFORM. For various reasons, execution into, through, and back from sections and paragraphs must be implemented with GOTO statements, and cannot be implemented with calls. I nonetheless attempted, at one point, to implement PERFORM via calls, and the ".global" you noticed is a vestige of that effort. The routine it was used in has a boolean variable 'global' that defaults to false, and the routine was never called with that parameter set to true. It is unnecessary and has been deleted. gg_insert_into_assembler() does indeed use ASM_EXPR. I sometimes use it to generate #-delimited comments into the generated assembly language so that I can see what's going on. But I also use it to generate labeled locations in the executables. I am also developing a GDB-COBOL version of GDB, one that understands the executables GCOBOL is generating. We need the GDB NEXT instruction to execute through a PROC that is the subject of a PERFORM PROC statement. And I need to be able to set a breakpoint with "(gdb)break PROC". I have not yet figured out how to use GCC to put information into the .debug_info section of a DWARF executable, and I have not yet figured out how, in GDB, to extract such information. So, for now, I am creating labels with meta-information in them. So, for example, for a program 'prog' with section 'mainsec' and paragraph 'graff', I use ASM_EXPR to create the label "_para.graff.mainsec.prog.0.79:" Thus, when a user enters "break graff" or "break graff of mainsec", my code in GDB knows what to search for in order to know where to set the breakpoint. I apologize for the lengthy description here. Sometimes, however, it is unavoidable. Sometimes, it is true, I do weird things out of ignorance. But sometimes I am doing weird things because COBOL is, well, (and certainly in the context of compilers and debuggers designed for C/C++), weird. > > I noticed that you use build1 and friends sometimes without a location, > elsewhere there's location_from_lineno (). Parts of GCC use the global > 'input_location' which you could set when sv_current_line_number changes? > GCC locations also track file (for C #include - not sure if relevant for > Cobol) and column. I will look into 'input_location'. > > +// These are globally useful constants > +tree char_nodes[256]; > + > +tree integer_minusone_node; > +tree integer_two_node; > +tree integer_eight_node; > +tree size_t_zero_node; > +tree int128_zero_node; > +tree int128_five_node; > +tree int128_ten_node; > > you might sooner or later run into garbage collector issues, either > because you don't collect (so garbage piles up) or because you get things > collected that are still used (because it's not marked). The above look > like they want to be marked GTY(()) (maybe they are in a header file). They are not marked GTY(()). I don't know what in tarnation GTY means, and I have never been able to figure it out. And I don't know how, or when, to do garbage collection. I am grateful for any hint. (I have been aware, as I have been working, that I have been ignoring things that are probably important. But I figured I could track down problems as they occurred, and if no problems showed up, well, as the old story goes, when it ain't rainin' the roof don't leak nohow. I couldn't wait to understand GCC completely before writing code.) > > + tree distance = build2( MULT_EXPR, > + SIZE_T, > + gg_cast(sizetype, offset), > + TYPE_SIZE_UNIT(element_type)); > > if offset is a constant then the above builts the multiplication > expression, if you use fold_build2 (...) it would be simplified down to a > constant early. With optimization it's going to be optimized anyway, but > as the multiplication isn't written by the user it might be tempting to > simplify it in the frontend. Brand new to me; I will look into it. > > +static tree > +gg_get_larger_type(tree A, tree B) > + { > + tree larger = TREE_TYPE(B); > + if( TYPE_SIZE(TREE_TYPE(A)) > TYPE_SIZE(TREE_TYPE(B)) ) > + { > + larger = TREE_TYPE(A); > > that doesn't work - TYPE_SIZE is a pointer to a tree. You can use > tree_int_cst_compare (TYPE_SIZE(TREE_TYPE(A)), > TYPE_SIZE(TREE_TYPE(B))) == 1 instead. > > Maybe you are looking at arithmetic precision instead, in which case > TYPE_PRECISION (TREE_TYPE (A)) > TYPE_PRECISION (TREE_TYPE (B)) would be > appropriate (for example GCC supports int:3 and int:5 but they'd have both > TYPE_SIZE of 8 but precision of 3 and 5). > > Maybe Cobol doesn't have signed and unsigned types, iff it does it looks > like gg_get_larger_type might care for TYPE_UNSIGNED as well somehow. Oi. Here come many words. I don't like answering with too many words, but sometimes answers are complicated and many words are required. As we have implemented it, COBOL variables do not have "types" as GCC uses the term. A COBOL variable is a 112-byte cblc_field_t structure. It contains an enum that specifies the COBOL type (alphanumeric, floating-point, packed-decimal, numeric display (where numbers are stored as character strings of digits, e.g. "123"), binary, among other things). The structure also specifies the number of bytes, the number of digits, the number of digits to the right of a fixed-point decimal point. There is a 64-bit integer value full of attribute flags: whether or not the value is signable, big-endian, little-endian. A true loony -- I am loony only when it's really useful -- would embark upon a quixotic religious quest to expand tree.def to encompass all that COBOL crap, so that we would have DEFTREECODE(COBOL_PACKED_DECIMAL_TYPE, "cobol_packed_decimal_type", tcc_type, 0) and go through the effort of having that TYPE be defined with the appropriate attributes, and then everything I've done would get absorbed into the middle-end which would spit out the assembly language to implement the language. Anybody who would want to do it that way should be thrown into a pit. Anybody who would want to let them do it that way should be thrown in, too. In general, most of the work in the executable is being done in the libgcobol.so code, written in C++. When somebody wants to add a six-character value represented as a string of EBCDIC numerals with two digits to the right of a virtual decimal point, to a big-endian binary value that has four significant digits with and implied multiplier of 1000, and assign the result to a 12-digit packed decimal number with three digits to the right of an implied decimal point, there is an __gg__add() routine in the libgcobol.so that does all that. Welcome to COBOL. Everything I just described there is four lines of code: 77 A PIC 999V99 USAGE DISPLAY. 77 B PIC 9999PPP USAGE BINARY 77 C PIC 9(9)V999 USAGE PACKED-DECIMAL. ADD A TO B GIVING C So, a lot of what I do in the generated GENERIC is pass pointers to those cblc_field_t structures around. But! There is a lot of work that I don't want to do in the library for performance reasons. For example, if somebody defines a four-byte little-endian binary value that gets used as a loop counter, I don't want to be calling a library routine to decrement it, and another library routine to check if it's zero. So, I am generating GENERIC in a way that I regard as writing assembly language to handle some of the internals, like subtracting a numeric literal one from a four-byte little-endian binary. I also hand-code stuff like calculating offsets into tables from the table subscripts. I didn't want to put that into library code because critical inner loops often have table subscripts and I don't want the overhead. The upshot is that some benchmark routines, that are heavily inner-loop dependent, run two to three times faster when compiled with GCOBOL than when compiled with some others. With that all said: Much of the code you are referencing here isn't used directly in implementing user COBOL source. It is rather used by me in generating the "assembly language" to do things like array offset calculations. In any case, and given that gg_get_larger_type() is used by me in my code on variables I define, rather than on user code on variables the user defines, I believe it is doing what I need it to do. For example, when I need to multiply tree A by tree B, I need them to both be of the same type in order to keep the TRUNC_DIV_EXPR from complaining that they aren't the same type. So, I use gg_get_larger_type(A, B) to return the type of the one with the larger TYPE_SIZE, and then I cast both A and B to that type. At least, that's what I think I am doing. In my pile of index cards is the one that says, "Eliminate gg_get_larger_type(), and then fix all the resulting errors by defining the variables more sensibly in the first place." > > +tree > +gg_build_logical_expression(tree operand_a, > + enum logop_t op, > + tree operand_b) > + { > + tree logical_expression = NULL_TREE; > + tree_code logical_op; > + switch(op) > + { > + case and_op: > + logical_op = TRUTH_ANDIF_EXPR; > ... > + case xor_op: > + logical_op = TRUTH_XOR_EXPR; > > note TRUTH_ANDIF_EXPR and TRUTH_XOR_EXPR behave differently with respect > to evaluation of side-effects of the second operand. There's no > TRUTH_XORIF_EXPR, iff XORIF semantics are desired you'd have to lower that > explicitly (or ask for the middle-end to adopt a TRUTH_XORIF_EXPR). > > +__int128 > +get_power_of_ten(int n) > + { > > I think it was already mentioned - you eventually want to use wide_int (or > a fixed_wide_int_storage specialization), or for truly arbitrary precision > integers use gmp (the Fortran frontend does this). The use of __int128 > restricts you to a subset of hosts the Cobol compiler can run on. I will look into wide_int. I have been reluctant to use gmp simply because of the increased complexity, a belief that, over time, the number of phone calls from users is directly proportional to the number of prerequisite libraries, and because it wasn't that many months ago that adding 1 to 2 and getting 3 was justification for breaking out the champagne. If gmp gets us more architectures without too much of a hit on performance, we'll switch over. > > > That's it for a skim through this large patch. I'm curious whether you > tried configuring with --enable-checking=yes (the default on trunk) and if > you tried to use LTO with Cobol yet? > I also wonder about the DWARF generated with -g In my debug builds, I have "--enable-checking", which I see is the same as "enable-checking=yes". In our release builds we have, "--enable-checking=release". I have not tried to use LTO. The DWARF is working nicely with the companion GDB-COBOL on code compiled with '-ggdb -O0". The GDB folks are, at best, dimly aware that I am rummaging through their code. I will use the occasion of a patch into GCC as the starting gun for begging their forgiveness and enlisting their aid incorporating my stuff. You can use ordinary GDB to set breakpoints on lines and then step through code created by GCOBOL. "print" won't work, however. In response to some other comments from Joseph Meyers, I will report here that I did some major surgery on the structure of the code. I have rigged things up so that libgcobol compiles on its own; it has no prerequisites in gcc/cobol. The gcc/cobol code does, of course, rely on libgcobol/ .h files. There are also two code modules that are compiled into both libgcobol and gcc/cobol. (There are some tables for generating messages, and a bunch of code for dealing with EBCDIC versus ASCII that's used both at compile time and at run time.) Those source code modules now live in libgcobol. The Make-lang.in file now makes copies of those two files and puts them into gcc/cobol. So, libgcobol/charmaps.cc gets copied as gcc/cobol/charmaps-copy.cc; line 1, a //comment line, gets replaced with "// DO NOT EDIT THIS FILE. It was copied from the libgcobol directory." What this does to Jim's patching process, I don't know. But I don't know how to respond to input from you experts without changing the code. The patching process will just have to catch up, I guess. Thank you ever so much for your comments and advice. It feels a little like coming out from the cold. Bob Dubner rdub...@symas.com > > Thanks, > Richard.