(sorry for breaking threading, but quoting the whole mail made my MUA
unbearably slow)

>From 64bcb34e12371f61a8958645e1668e0ac2704391gen.patch 4 Oct 2024 12:01:22 
-0400
From: "James K. Lowden" <jklow...@symas.com>
Date: Thu 12 Dec 2024 06:28:07 PM EST
Subject: [PATCH]  Add 'cobol' to 4 files

gcc/cobol/ChangeLog
        * genapi.cc: New file.
        * gengen.cc: New file.
        * genmath.cc: New file.
        * genutil.cc: New file.


I'm skimming over this part of the frontend, taking notes on general
things I notice, partly quoting parts of the patch

+#include <err.h> 
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <ctype.h>
+#include <vector>
+#include <string>
+#include <unordered_map>
+#include <unordered_set>
+#include <chrono>
+#include <time.h>
+#include <math.h>
+
+#include "config.h"
+#include "system.h"

all system headers need to be included by system.h, C++ includes should
be included by defining INCLUDE_UNORDERED_MAP and INCLUDE_UNORDERED_SET, 
etc. - includes definitely have to come after including config.h

+//#define XXX do{gg_printf("LINE %d\n", build_int_cst_type(INT, 
__LINE__), NULL_TREE);}while(0);
+#define XXX
...
+bool bSHOW_PARSE = getenv("SHOW_PARSE"); 

there seems to be debugging wired in - at least the getenv call above 
should probably be be behind a more global COBOL_DEBUG conditional?
There's a langhook for frontend initialization - that's a convenient
place to conditionally (re-)initialize bSHOW_PARSE (when you default
it to false).

+  // One strongly suspects, one does, that creating the constructor one
+  // character at a time isn't the best possible way of accomplishing the
+  // desired goal. But I lost interest in trying to find a better way.
+  TYPE_NAME(array_of_characters) = get_identifier("cblc_string");
+  tree constr = make_node(CONSTRUCTOR);
+  TREE_TYPE(constr) = array_of_characters;

indeed, you should be able to use build_string (strlen(var_contents)+1, 
var_contents) and use that as DECL_INTIIAL (it builds a STRING_CST node).

+tree
+gg_fprintf(tree fd, int nargs, const char *format_string, ...)
+  { 
...
+  tree stmt = build_call_array_loc (location_from_lineno(),
+                                    INT,
+                                    function,
+                                    argc,
+                                    args);

instead of constructing args[] you should be able to directly
call build_call_valist () with the va_list.

You seem to, in function_decl_from_name or gg_get_function_address,
build function decls for externs, mainly from the C library.  Elsewhere
we are using what GCC calls "built-ins" for this, see gcc/builtins.def
and tree.h:builtin_decl_explicit.  Unfortunately most builtins and
the used types are to be initialized by frontends, luckily some
are from the middle-end, see build_common_builtin_nodes - that includes
BUILT_IN_MEMCPY and friends.  For optimization purposes (that GCC knows
that "memcpy" is memcpy) I suggest you see to properly declare
BUILT_IN_* declarations - the Fortran frontend might be a good
recipie to copy from (see fortran/f95-lang.cc around where it
includes mathbuiltins.def).

+    if( !field->var_decl_node )
+      {
+      fprintf(  stderr,
+                "%s (type: %s) improperly has a NULL var_decl_node\n",
+                field->name,
+                cbl_field_type_str(field->type));
+      fprintf(  stderr,
+                "Probable cause: it was referenced without being 
defined.\n");
+      exit(1);

With GCC diagnostics you'd use internal_error (...) which then
terminates GCC.

+static void
+assembler_label(const char *label, bool global = false)
+  {
+  // label has to be a valid label for the assembler
+  
+  // use the global switch for COBOL ENTRY
+  
+  const char global_text[] = ": .global ";

there's target macros for how assemblers work, like ASM_OUTPUT_LABEL,
but it seems this function is about sections?  I'm not sure how
gg_insert_into_assembler "works", but it seems to emit a ASM_EXPR
into the IL - in the section_label case in the toplevel?  For
portability I'd strongly advise against the use of global assembler,
you can always tell the middle-end what you desire here, for
sections you can use set_decl_section_name for example.

I'm not sure for what else you use ASM_EXPRs.

I noticed that you use build1 and friends sometimes without a
location, elsewhere there's location_from_lineno ().  Parts of
GCC use the global 'input_location' which you could set when
sv_current_line_number changes?  GCC locations also track file
(for C #include - not sure if relevant for Cobol) and column.

+// These are globally useful constants
+tree char_nodes[256];
+
+tree integer_minusone_node;
+tree integer_two_node;
+tree integer_eight_node;
+tree size_t_zero_node;
+tree int128_zero_node;
+tree int128_five_node;
+tree int128_ten_node;

you might sooner or later run into garbage collector issues, either
because you don't collect (so garbage piles up) or because you get
things collected that are still used (because it's not marked).  The
above look like they want to be marked GTY(()) (maybe they are in
a header file).

+    tree distance = build2( MULT_EXPR,
+                            SIZE_T,
+                            gg_cast(sizetype, offset),
+                            TYPE_SIZE_UNIT(element_type));

if offset is a constant then the above builts the multiplication 
expression, if you use fold_build2 (...) it would be simplified
down to a constant early.  With optimization it's going to be
optimized anyway, but as the multiplication isn't written by
the user it might be tempting to simplify it in the frontend.

+static tree
+gg_get_larger_type(tree A, tree B)
+  {
+  tree larger = TREE_TYPE(B);
+  if( TYPE_SIZE(TREE_TYPE(A)) > TYPE_SIZE(TREE_TYPE(B)) )
+    {
+    larger = TREE_TYPE(A);

that doesn't work - TYPE_SIZE is a pointer to a tree.  You can
use tree_int_cst_compare (TYPE_SIZE(TREE_TYPE(A)), 
TYPE_SIZE(TREE_TYPE(B))) == 1 instead.

Maybe you are looking at arithmetic precision instead, in which
case TYPE_PRECISION (TREE_TYPE (A)) > TYPE_PRECISION (TREE_TYPE (B))
would be appropriate (for example GCC supports int:3 and int:5 but
they'd have both TYPE_SIZE of 8 but precision of 3 and 5).

Maybe Cobol doesn't have signed and unsigned types, iff it does
it looks like gg_get_larger_type might care for TYPE_UNSIGNED as well
somehow.

+tree
+gg_build_logical_expression(tree operand_a,
+                            enum logop_t op,
+                            tree operand_b)
+  {
+  tree logical_expression = NULL_TREE;
+  tree_code logical_op;
+  switch(op)
+    {
+    case and_op:
+      logical_op = TRUTH_ANDIF_EXPR;
...
+    case xor_op:
+      logical_op = TRUTH_XOR_EXPR;

note TRUTH_ANDIF_EXPR and TRUTH_XOR_EXPR behave differently with respect
to evaluation of side-effects of the second operand.  There's no
TRUTH_XORIF_EXPR, iff XORIF semantics are desired you'd have to lower
that explicitly (or ask for the middle-end to adopt a TRUTH_XORIF_EXPR).

+__int128
+get_power_of_ten(int n)
+  {

I think it was already mentioned - you eventually want to use
wide_int (or a fixed_wide_int_storage specialization), or for
truly arbitrary precision integers use gmp (the Fortran frontend
does this).  The use of __int128 restricts you to a subset of
hosts the Cobol compiler can run on.


That's it for a skim through this large patch.  I'm curious
whether you tried configuring with --enable-checking=yes (the
default on trunk) and if you tried to use LTO with Cobol yet?
I also wonder about the DWARF generated with -g

Thanks,
Richard.

Reply via email to