(sorry for breaking threading, but quoting the whole mail made my MUA unbearably slow)
>From 64bcb34e12371f61a8958645e1668e0ac2704391gen.patch 4 Oct 2024 12:01:22 -0400 From: "James K. Lowden" <jklow...@symas.com> Date: Thu 12 Dec 2024 06:28:07 PM EST Subject: [PATCH] Add 'cobol' to 4 files gcc/cobol/ChangeLog * genapi.cc: New file. * gengen.cc: New file. * genmath.cc: New file. * genutil.cc: New file. I'm skimming over this part of the frontend, taking notes on general things I notice, partly quoting parts of the patch +#include <err.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> +#include <ctype.h> +#include <vector> +#include <string> +#include <unordered_map> +#include <unordered_set> +#include <chrono> +#include <time.h> +#include <math.h> + +#include "config.h" +#include "system.h" all system headers need to be included by system.h, C++ includes should be included by defining INCLUDE_UNORDERED_MAP and INCLUDE_UNORDERED_SET, etc. - includes definitely have to come after including config.h +//#define XXX do{gg_printf("LINE %d\n", build_int_cst_type(INT, __LINE__), NULL_TREE);}while(0); +#define XXX ... +bool bSHOW_PARSE = getenv("SHOW_PARSE"); there seems to be debugging wired in - at least the getenv call above should probably be be behind a more global COBOL_DEBUG conditional? There's a langhook for frontend initialization - that's a convenient place to conditionally (re-)initialize bSHOW_PARSE (when you default it to false). + // One strongly suspects, one does, that creating the constructor one + // character at a time isn't the best possible way of accomplishing the + // desired goal. But I lost interest in trying to find a better way. + TYPE_NAME(array_of_characters) = get_identifier("cblc_string"); + tree constr = make_node(CONSTRUCTOR); + TREE_TYPE(constr) = array_of_characters; indeed, you should be able to use build_string (strlen(var_contents)+1, var_contents) and use that as DECL_INTIIAL (it builds a STRING_CST node). +tree +gg_fprintf(tree fd, int nargs, const char *format_string, ...) + { ... + tree stmt = build_call_array_loc (location_from_lineno(), + INT, + function, + argc, + args); instead of constructing args[] you should be able to directly call build_call_valist () with the va_list. You seem to, in function_decl_from_name or gg_get_function_address, build function decls for externs, mainly from the C library. Elsewhere we are using what GCC calls "built-ins" for this, see gcc/builtins.def and tree.h:builtin_decl_explicit. Unfortunately most builtins and the used types are to be initialized by frontends, luckily some are from the middle-end, see build_common_builtin_nodes - that includes BUILT_IN_MEMCPY and friends. For optimization purposes (that GCC knows that "memcpy" is memcpy) I suggest you see to properly declare BUILT_IN_* declarations - the Fortran frontend might be a good recipie to copy from (see fortran/f95-lang.cc around where it includes mathbuiltins.def). + if( !field->var_decl_node ) + { + fprintf( stderr, + "%s (type: %s) improperly has a NULL var_decl_node\n", + field->name, + cbl_field_type_str(field->type)); + fprintf( stderr, + "Probable cause: it was referenced without being defined.\n"); + exit(1); With GCC diagnostics you'd use internal_error (...) which then terminates GCC. +static void +assembler_label(const char *label, bool global = false) + { + // label has to be a valid label for the assembler + + // use the global switch for COBOL ENTRY + + const char global_text[] = ": .global "; there's target macros for how assemblers work, like ASM_OUTPUT_LABEL, but it seems this function is about sections? I'm not sure how gg_insert_into_assembler "works", but it seems to emit a ASM_EXPR into the IL - in the section_label case in the toplevel? For portability I'd strongly advise against the use of global assembler, you can always tell the middle-end what you desire here, for sections you can use set_decl_section_name for example. I'm not sure for what else you use ASM_EXPRs. I noticed that you use build1 and friends sometimes without a location, elsewhere there's location_from_lineno (). Parts of GCC use the global 'input_location' which you could set when sv_current_line_number changes? GCC locations also track file (for C #include - not sure if relevant for Cobol) and column. +// These are globally useful constants +tree char_nodes[256]; + +tree integer_minusone_node; +tree integer_two_node; +tree integer_eight_node; +tree size_t_zero_node; +tree int128_zero_node; +tree int128_five_node; +tree int128_ten_node; you might sooner or later run into garbage collector issues, either because you don't collect (so garbage piles up) or because you get things collected that are still used (because it's not marked). The above look like they want to be marked GTY(()) (maybe they are in a header file). + tree distance = build2( MULT_EXPR, + SIZE_T, + gg_cast(sizetype, offset), + TYPE_SIZE_UNIT(element_type)); if offset is a constant then the above builts the multiplication expression, if you use fold_build2 (...) it would be simplified down to a constant early. With optimization it's going to be optimized anyway, but as the multiplication isn't written by the user it might be tempting to simplify it in the frontend. +static tree +gg_get_larger_type(tree A, tree B) + { + tree larger = TREE_TYPE(B); + if( TYPE_SIZE(TREE_TYPE(A)) > TYPE_SIZE(TREE_TYPE(B)) ) + { + larger = TREE_TYPE(A); that doesn't work - TYPE_SIZE is a pointer to a tree. You can use tree_int_cst_compare (TYPE_SIZE(TREE_TYPE(A)), TYPE_SIZE(TREE_TYPE(B))) == 1 instead. Maybe you are looking at arithmetic precision instead, in which case TYPE_PRECISION (TREE_TYPE (A)) > TYPE_PRECISION (TREE_TYPE (B)) would be appropriate (for example GCC supports int:3 and int:5 but they'd have both TYPE_SIZE of 8 but precision of 3 and 5). Maybe Cobol doesn't have signed and unsigned types, iff it does it looks like gg_get_larger_type might care for TYPE_UNSIGNED as well somehow. +tree +gg_build_logical_expression(tree operand_a, + enum logop_t op, + tree operand_b) + { + tree logical_expression = NULL_TREE; + tree_code logical_op; + switch(op) + { + case and_op: + logical_op = TRUTH_ANDIF_EXPR; ... + case xor_op: + logical_op = TRUTH_XOR_EXPR; note TRUTH_ANDIF_EXPR and TRUTH_XOR_EXPR behave differently with respect to evaluation of side-effects of the second operand. There's no TRUTH_XORIF_EXPR, iff XORIF semantics are desired you'd have to lower that explicitly (or ask for the middle-end to adopt a TRUTH_XORIF_EXPR). +__int128 +get_power_of_ten(int n) + { I think it was already mentioned - you eventually want to use wide_int (or a fixed_wide_int_storage specialization), or for truly arbitrary precision integers use gmp (the Fortran frontend does this). The use of __int128 restricts you to a subset of hosts the Cobol compiler can run on. That's it for a skim through this large patch. I'm curious whether you tried configuring with --enable-checking=yes (the default on trunk) and if you tried to use LTO with Cobol yet? I also wonder about the DWARF generated with -g Thanks, Richard.