On Wed, 2024-09-11 at 20:49 -0500, tcpreimesber...@gmail.com wrote:
> From: Thor C Preimesberger <tcpreimesber...@gmail.com>
> 
> This patch allows the compiler to dump GENERIC trees as JSON objects.
> 
> The dump flag -fdump-tree-original-json dumps each fndecl node in the
> C frontend's gimplifier as a JSON object and traverses related nodes 
> in an analagous manner as to raw-dumping.

Thanks for posting this patch.

Are you able to upload somewhere some examples of what the dumps look
like?

Some high level thoughts:

* the patch uses "dummy" throughout as a variable name.  To me the name
"dummy" suggests something unimportant that we had to give a name to,
or something that we'd prefer didn't exist but had to create.  However
in most(all?) cases "dummy" seems to refer to the json object being
created or having properties added to it, and thus the most interesting
thing in the function.  I suspect that renaming "dummy" to "js_obj" or
"json_obj" throughout would be an improvement in readability in terms
of capturing the intent of the code (assuming that all of them are
indeed json objects).

* I think the code is leaking memory for all of the json values created
- there are lots of uses of "naked new" in this code, but I don't see
any uses of "delete".  For example, in 

> +void
> +dump_node_json (const_tree t, dump_flags_t flags, FILE *stream)
> +{
> +  struct dump_info di;
> +  dump_queue_p dq;
> +  dump_queue_p next_dq;
> +  pretty_printer pp;
> +  /* Initialize the dump-information structure.  */
> +  di.stream = stream;
> +  di.index = 0;
> +  di.column = 0;
> +  di.queue = 0;
> +  di.queue_end = 0;
> +  di.free_list = 0;
> +  di.flags = flags;
> +  di.node = t;
> +  di.nodes = splay_tree_new (splay_tree_compare_pointers, 0,
> +                          splay_tree_delete_pointers);
> +  di.json_dump = new json::array ();
     ^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^
     allocated with naked new here

> +  /* Queue up the first node.  */
> +  queue (&di, t);
> +
> +  /* Until the queue is empty, keep dumping nodes.  */
> +  while (di.queue)
> +    dequeue_and_dump (&di);
> +
> +  di.json_dump->dump(stream, true);
> +  fputs("\n", stream);
> +  /* Now, clean up.  */
> +  for (dq = di.free_list; dq; dq = next_dq)
> +    {
> +      next_dq = dq->next;
> +      free (dq);
> +    }
> +  splay_tree_delete (di.nodes);

and di.json_dump goes out of scope here and is leaked, I think.  So I
*think* all of the json values being created during dumping are being
leaked.

> +}

Similarly, in:

> +DEBUG_FUNCTION void
> +debug_tree_json (tree t)
> +{
> +  json::object* _x = node_emit_json(t);
> +  _x->dump(stderr, true);
> +  fprintf(stderr, "\n");
> +}

if I'm reading things right, node_emit_json doesn't "emit" json so much
as create a new json::object on the heap via "new", and when "_x" goes
out of scope, it's leaked.

The pattern in the code seems to be that node_emit_json creates a new
json::object and populates it with properties (sometimes recursively).

Given that, and that we can use C++11, I recommend using
std::unique_ptr<json::object> for it, to capture the intent that this
is a heap-allocated pointer with responsibility for being "delete"-d at
some point.

That way, rather that:

  json::object* 
  node_emit_json(tree t)
  {
    tree op0, op1, type;
    enum tree_code code;
    expanded_location xloc;
    json::object *dummy;
    json::array* holder;
    char address_buffer[sizeof(&t)] = {"\0"};
  
    dummy = new json::object ();
    holder = new json::array ();

    [...snip...]

    return dummy;
  }


we could have (perhaps renaming to "node_to_json"):

  std::unique_ptr<json::object>
  node_to_json(tree t)
  {
    tree op0, op1, type;
    enum tree_code code;
    expanded_location xloc;
    char address_buffer[sizeof(&t)] = {"\0"};
  
    auto js_obj = ::make_unique<json::object> (); // to implicitly use 
std::unique_ptr<json::object>
    auto holder = ::make_unique<json::array> ();  // likewise 
std::unique_ptr<json::array>

    [...snip...]

    return js_obj;
  }

...assuming that I'm correctly understanding the ownership of the json
values in the patch.  Note that we have to use ::make_unique from our
make-unique.h, rather than std::make_unique from <memory> since the
latter was only added in C++14.

Many of our data structures don't properly handle objects with
destructors, and I suspect splay_tree is one of these.  You can use
js_obj.release () to transfer ownership to such data structures, and
will (probably) need to manually use "delete" on the pointers in the
right places.

What happens to "holder" in that function?  It seems to get populated
with json objects for the various tree nodes found recursively, but
then it seems to simply be leaked (or populated then auto-deleted, if
we use std::unique_ptr>.  Or am I missing something?

In case it's helpful, a couple of months ago I converted the SARIF
output code from using "naked" json pointers to using std::unique_ptr
in:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658204.html
and I found it helped a *lot* with documenting ownership and avoiding
leaks.  You might find other things of interest in the first half of
this patch kit:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658194.html
FWIW in that code I'm also using a class hierarchy of json::object
subclasses to help with compliance with a JSON schema (sarif_object is
a subclass of json::object), but given that there probably isn't a
schema for the JSON in TDF_JSON dumps, that's probably not relevant to
this case.

FWIW you can test for leaks by running the compiler under valgrind by
configuring with --enable-valgrind-annotations and appending "-wrapper
valgrind" to the command line (or "-wrapper valgrind,--leak-check=full"
IIRC).

Thanks again for the patch; hope this is constructive.
Dave

Reply via email to