On Fri, Sep 13, 2024 at 8:32 AM Thor Preimesberger
<tcpreimesber...@gmail.com> wrote:
>
> > There are three oddities I immediately notice:
> >
> > The PLUS_EXPR operands are in a array "operands" while the RETURN_EXPR
> > "operand" or "child pointer" is refered to from "return_expr".  I think 
> > both are
> > tcc_expression trees and the operands are in exp.operands.  Ideally the
> > JSON would more closely reflect that rather than possibly following the 
> > "pretty"
> > printing logic.
>
> Ah - for the binary operator, operands may have been a poor choice of
> words there.
> There's an abstract nonsense definition that would not necessarily be
> reasonable here.
> Would it make more sense to dump it e.g. as
> .....
>    "bin_operator": "+",
>    "op0": {"addr": "0x7f8256bda360"
>                ...}
>    "op1": {"addr": ...}
> .....

I actually liked the array, so following using accessors it then would be

   "operand" : [ {...}, {...} ]

omitting tree_ (give we're dumping trees), if I got the idea right
that [] denotes an array of sub-objects in JSON.

That you have both

>                                      "tree code": "plus_expr",
>                                      "bin_operator": "+",

is a bit redundant, the 'bin_operator' would be something that's
the same for all nodes with 'tree code' : 'plus_expr'.  I would simply
omit that attribute.

> I think there are some parts of the code that I wrote that don't have
> the accessor used as their key when referring to a different node -
> e.g. case PLACEHOLDER_EXPR. Would this be an issue?

Well, it's just names - but yes, using the accessor is probably most useful
to developers as they are likely familiar with that.

> > While the tree_code of a tree node is the most important bit (maybe besides 
> > of
> > its address), the "tree code" attribute is after the locations (which
> > are also quite
> > verbose and distracting - at least when looking at raw JSON).  For locations
> > one could honor TDF_LINENO and only dump those when using
> > -fdump-tree-original-json-lineno.  I'd re-order "tree code" after "addr".
>
> Sounds good - I'll implement dumping locs iff TDF_LINENO is enabled.
>
> > The third issue is that above the tree node with address 0x7f8256a10c60
> > (and its children) appear twice - while you maintain a splay tree and assign
> > unique numbers the duplicate nodes are not output by reference?  I would
> > suggest to use { "ref_addr" : "0x7f8256a10c60" } for the output of such
> > reference for example.
> >
> > I'm not sure whether JSON allows different object kinds or if that's solely
> > done by having a special attribute if that's needed.  With the above
> > regular tree nodes would be "addr" and references be "ref_addr".  A 
> > recursive
> > JSON structure like above is OK to look at in RAW, I'm not sure whether
> > for automatic processing and for example translating to a graph a linear
> > collection of nodes with references would be easier to handle.
>
> I agree that it should be easier to process the JSON if the references have a
> different key. Should be easy to implement.

Yep, the important part is to avoid the duplicates in the JSON.

> > Few comments on the patch itself - the #include of tree-emit-json.h from
> > dumpfile.cc doesn't seem to be necessary.  Since you declare
> > dump_node_json in dumpfile.h it should be possible to elide the header
> > and put the contents into the tree-emit-json.cc file.
> >
> > Another #include is duplicated (and also looks unnecessary).
> >
>
> All fixed now on my working tree.
>
> > I know you have some crude JSON -> html translation script written in
> > python - can you share that as well?  I'd suggest to post separate from
> > this main patch, adding it to contrib/.
>
> Sure - let me get the fixes suggested in this email done since it'll
> change (and simplify) the logic a bit.
>
> > Can we solve the multi-function issue somehow?  I know we have some
> > abstraction for a dump file, we'd need a hook that's invoked on opening
> > and closing to emit a something there - I guess it would be even OK to
> > hard-code this into dumpfile.cc for the -JSON dump variant.  It might
> > be possible to register dump specific data with that object and get
> > to the "current" dump file in dump_node_json so the splay-tree could
> > be kept live and the allocations released on dump-file close?  Again,
> > two hard-coded hooks from dumpfile.cc at open/close time into
> > the JSON dumping for this might be feasible and track the global state
> > with global variables.  That's to allow references to global objects and
> > types streamed in a previous function context.
>
> If the multi-function issue is that the dump pass currently produces
> a series of JSON objects rather than a single one - I think what you're
> suggesting is essentially done by optrecord_json_writer, for
> -fsave-optimization-record. One approach I have in my head
> is for, let's call it a tree_json_writer, to hold a
> json::array, append each node we traverse, and then
> flush this array to the dumpfile at the end.

Yeah, that would work as well.

Thanks,
Richard.

> This would also enable a way to address what you brought
> up at the very end.
>
> (In the python script I have written up, I just call the bash command
> I posted in the first email to turn the output into a single JSON object.
> I don't expect that it's really possible to call sed from within gcc.)
>
> Best,
> Thor
>
> On Thu, Sep 12, 2024 at 7:14 AM Richard Biener
> <richard.guent...@gmail.com> wrote:
> >
> > On Thu, Sep 12, 2024 at 12:51 PM David Malcolm <dmalc...@redhat.com> wrote:
> > >
> > > On Wed, 2024-09-11 at 20:49 -0500, tcpreimesber...@gmail.com wrote:
> > > > From: Thor C Preimesberger <tcpreimesber...@gmail.com>
> > > >
> > > > This patch allows the compiler to dump GENERIC trees as JSON objects.
> > > >
> > > > The dump flag -fdump-tree-original-json dumps each fndecl node in the
> > > > C frontend's gimplifier as a JSON object and traverses related nodes
> > > > in an analagous manner as to raw-dumping.
> > >
> > > Thanks for posting this patch.
> > >
> > > Are you able to upload somewhere some examples of what the dumps look
> > > like?
> >
> > I found https://renhongl.github.io/json-editor/ which seems to accept
> > the output of -fdump-tree-original-json visualizes the raw JSON structure
> > when the input is from a single function.
> >
> > struct S { int i; int j; } s;
> > int bar ()
> > {
> >   return s.i + s.j;
> > }
> > int main()
> > {
> >   return bar ();
> > }
> >
> > no longer recognizes it, I would guess we'd need to produce an outer
> > "file level" JSON node.  Simply wrapping the file in [{ ... }] didn't
> > work even with comma separating two functions.
> >
> > The JSON for bar looks like (sorry for the long paste)
> >
> > [{"addr": "0x7f8256bda3c0",
> >   "expr_loc": [{"file": "t.c",
> >                 "line": 3,
> >                 "column": 1}],
> >   "start_loc": [{"file": "t.c",
> >                  "line": 3,
> >                  "column": 1}],
> >   "finish_loc": [{"file": "t.c",
> >                   "line": 3,
> >                   "column": 1}],
> >   "tree code": "bind_expr",
> >   "bind_expr_body": {"addr": "0x7f8256bab860",
> >                      "expr_loc": [{"file": "t.c",
> >                                    "line": 4,
> >                                    "column": 14}],
> >                      "start_loc": [{"file": "t.c",
> >                                     "line": 4,
> >                                     "column": 10}],
> >                      "finish_loc": [{"file": "t.c",
> >                                      "line": 4,
> >                                      "column": 18}],
> >                      "tree code": "return_expr",
> >                      "return_expr": {"addr": "0x7f8256bb1b40",
> >                                      "expr_loc": [{"file": "t.c",
> >                                                    "line": 4,
> >                                                    "column": 14}],
> >                                      "start_loc": [{"file": "t.c",
> >                                                     "line": 4,
> >                                                     "column": 10}],
> >                                      "finish_loc": [{"file": "t.c",
> >                                                      "line": 4,
> >                                                      "column": 18}],
> >                                      "tree code": "plus_expr",
> >                                      "bin_operator": "+",
> >                                      "operands": [{"addr": "0x7f8256bda360",
> >                                                    "expr_loc": [{"file": 
> > "t.c",
> >                                                                  "line": 4,
> >                                                                  "column": 
> > 11}],
> >                                                    "start_loc": [{"file": 
> > "t.c",
> >                                                                   "line": 4,
> >
> > "column": 10}],
> >                                                    "finish_loc":
> > [{"file": "t.c",
> >                                                                    "line": 
> > 4,
> >
> > "column": 12}],
> >                                                    "tree code": 
> > "component_ref",
> >                                                    "expr": {"addr":
> > "0x7f8256a10c60",
> >
> > "decl_loc": [{"file": "t.c",
> >
> >    "line": 1,
> >
> >    "column": 28}],
> >                                                             "tree
> > code": "var_decl",
> >                                                             "used": true,
> >                                                             "public": true,
> >                                                             " static": true,
> >                                                             "read": true,
> >                                                             "mode": "DI",
> >
> > "defer-output": true,
> >                                                             "id_to_locale": 
> > "s",
> >                                                             "id_point": 
> > "s"},
> >                                                    "field": {"addr":
> > "0x7f8256a30688",
> >
> > "decl_loc": [{"file": "t.c",
> >
> >     "line": 1,
> >
> >     "column": 16}],
> >                                                              "tree
> > code": "field_decl",
> >                                                              "mode": "SI",
> >
> > "id_to_locale": "i",
> >                                                              "id_point": 
> > "i"}},
> >                                                   {"addr": "0x7f8256bda390",
> >                                                    "expr_loc": [{"file": 
> > "t.c",
> >                                                                  "line": 4,
> >                                                                  "column": 
> > 17}],
> >                                                    "start_loc": [{"file": 
> > "t.c",
> >                                                                   "line": 4,
> >
> > "column": 16}],
> >                                                    "finish_loc":
> > [{"file": "t.c",
> >                                                                    "line": 
> > 4,
> >
> > "column": 18}],
> >                                                    "tree code": 
> > "component_ref",
> >                                                    "expr": {"addr":
> > "0x7f8256a10c60",
> >
> > "decl_loc": [{"file": "t.c",
> >
> >    "line": 1,
> >
> >    "column": 28}],
> >                                                             "tree
> > code": "var_decl",
> >                                                             "used": true,
> >                                                             "public": true,
> >                                                             " static": true,
> >                                                             "read": true,
> >                                                             "mode": "DI",
> >
> > "defer-output": true,
> >                                                             "id_to_locale": 
> > "s",
> >                                                             "id_point": 
> > "s"},
> >                                                    "field": {"addr":
> > "0x7f8256a30720",
> >
> > "decl_loc": [{"file": "t.c",
> >
> >     "line": 1,
> >
> >     "column": 23}],
> >                                                              "tree
> > code": "field_decl",
> >                                                              "mode": "SI",
> >
> > "id_to_locale": "j",
> >
> > "id_point": "j"}}]}}}]
> >
> > There are three oddities I immediately notice:
> >
> > The PLUS_EXPR operands are in a array "operands" while the RETURN_EXPR
> > "operand" or "child pointer" is refered to from "return_expr".  I think 
> > both are
> > tcc_expression trees and the operands are in exp.operands.  Ideally the
> > JSON would more closely reflect that rather than possibly following the 
> > "pretty"
> > printing logic.
> >
> > While the tree_code of a tree node is the most important bit (maybe besides 
> > of
> > its address), the "tree code" attribute is after the locations (which
> > are also quite
> > verbose and distracting - at least when looking at raw JSON).  For locations
> > one could honor TDF_LINENO and only dump those when using
> > -fdump-tree-original-json-lineno.  I'd re-order "tree code" after "addr".
> >
> > The third issue is that above the tree node with address 0x7f8256a10c60
> > (and its children) appear twice - while you maintain a splay tree and assign
> > unique numbers the duplicate nodes are not output by reference?  I would
> > suggest to use { "ref_addr" : "0x7f8256a10c60" } for the output of such
> > reference for example.
> >
> > I'm not sure whether JSON allows different object kinds or if that's solely
> > done by having a special attribute if that's needed.  With the above
> > regular tree nodes would be "addr" and references be "ref_addr".  A 
> > recursive
> > JSON structure like above is OK to look at in RAW, I'm not sure whether
> > for automatic processing and for example translating to a graph a linear
> > collection of nodes with references would be easier to handle.
> >
> >
> > Few comments on the patch itself - the #include of tree-emit-json.h from
> > dumpfile.cc doesn't seem to be necessary.  Since you declare
> > dump_node_json in dumpfile.h it should be possible to elide the header
> > and put the contents into the tree-emit-json.cc file.
> >
> > Another #include is duplicated (and also looks unnecessary).
> >
> >
> > I know you have some crude JSON -> html translation script written in
> > python - can you share that as well?  I'd suggest to post separate from
> > this main patch, adding it to contrib/.
> >
> > Can we solve the multi-function issue somehow?  I know we have some
> > abstraction for a dump file, we'd need a hook that's invoked on opening
> > and closing to emit a something there - I guess it would be even OK to
> > hard-code this into dumpfile.cc for the -JSON dump variant.  It might
> > be possible to register dump specific data with that object and get
> > to the "current" dump file in dump_node_json so the splay-tree could
> > be kept live and the allocations released on dump-file close?  Again,
> > two hard-coded hooks from dumpfile.cc at open/close time into
> > the JSON dumping for this might be feasible and track the global state
> > with global variables.  That's to allow references to global objects and
> > types streamed in a previous function context.
> >
> > Thanks,
> > Richard.
> >
> >
> > > Some high level thoughts:
> > >
> > > * the patch uses "dummy" throughout as a variable name.  To me the name
> > > "dummy" suggests something unimportant that we had to give a name to,
> > > or something that we'd prefer didn't exist but had to create.  However
> > > in most(all?) cases "dummy" seems to refer to the json object being
> > > created or having properties added to it, and thus the most interesting
> > > thing in the function.  I suspect that renaming "dummy" to "js_obj" or
> > > "json_obj" throughout would be an improvement in readability in terms
> > > of capturing the intent of the code (assuming that all of them are
> > > indeed json objects).
> > >
> > > * I think the code is leaking memory for all of the json values created
> > > - there are lots of uses of "naked new" in this code, but I don't see
> > > any uses of "delete".  For example, in
> > >
> > > > +void
> > > > +dump_node_json (const_tree t, dump_flags_t flags, FILE *stream)
> > > > +{
> > > > +  struct dump_info di;
> > > > +  dump_queue_p dq;
> > > > +  dump_queue_p next_dq;
> > > > +  pretty_printer pp;
> > > > +  /* Initialize the dump-information structure.  */
> > > > +  di.stream = stream;
> > > > +  di.index = 0;
> > > > +  di.column = 0;
> > > > +  di.queue = 0;
> > > > +  di.queue_end = 0;
> > > > +  di.free_list = 0;
> > > > +  di.flags = flags;
> > > > +  di.node = t;
> > > > +  di.nodes = splay_tree_new (splay_tree_compare_pointers, 0,
> > > > +                          splay_tree_delete_pointers);
> > > > +  di.json_dump = new json::array ();
> > >      ^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^
> > >      allocated with naked new here
> > >
> > > > +  /* Queue up the first node.  */
> > > > +  queue (&di, t);
> > > > +
> > > > +  /* Until the queue is empty, keep dumping nodes.  */
> > > > +  while (di.queue)
> > > > +    dequeue_and_dump (&di);
> > > > +
> > > > +  di.json_dump->dump(stream, true);
> > > > +  fputs("\n", stream);
> > > > +  /* Now, clean up.  */
> > > > +  for (dq = di.free_list; dq; dq = next_dq)
> > > > +    {
> > > > +      next_dq = dq->next;
> > > > +      free (dq);
> > > > +    }
> > > > +  splay_tree_delete (di.nodes);
> > >
> > > and di.json_dump goes out of scope here and is leaked, I think.  So I
> > > *think* all of the json values being created during dumping are being
> > > leaked.
> > >
> > > > +}
> > >
> > > Similarly, in:
> > >
> > > > +DEBUG_FUNCTION void
> > > > +debug_tree_json (tree t)
> > > > +{
> > > > +  json::object* _x = node_emit_json(t);
> > > > +  _x->dump(stderr, true);
> > > > +  fprintf(stderr, "\n");
> > > > +}
> > >
> > > if I'm reading things right, node_emit_json doesn't "emit" json so much
> > > as create a new json::object on the heap via "new", and when "_x" goes
> > > out of scope, it's leaked.
> > >
> > > The pattern in the code seems to be that node_emit_json creates a new
> > > json::object and populates it with properties (sometimes recursively).
> > >
> > > Given that, and that we can use C++11, I recommend using
> > > std::unique_ptr<json::object> for it, to capture the intent that this
> > > is a heap-allocated pointer with responsibility for being "delete"-d at
> > > some point.
> > >
> > > That way, rather that:
> > >
> > >   json::object*
> > >   node_emit_json(tree t)
> > >   {
> > >     tree op0, op1, type;
> > >     enum tree_code code;
> > >     expanded_location xloc;
> > >     json::object *dummy;
> > >     json::array* holder;
> > >     char address_buffer[sizeof(&t)] = {"\0"};
> > >
> > >     dummy = new json::object ();
> > >     holder = new json::array ();
> > >
> > >     [...snip...]
> > >
> > >     return dummy;
> > >   }
> > >
> > >
> > > we could have (perhaps renaming to "node_to_json"):
> > >
> > >   std::unique_ptr<json::object>
> > >   node_to_json(tree t)
> > >   {
> > >     tree op0, op1, type;
> > >     enum tree_code code;
> > >     expanded_location xloc;
> > >     char address_buffer[sizeof(&t)] = {"\0"};
> > >
> > >     auto js_obj = ::make_unique<json::object> (); // to implicitly use 
> > > std::unique_ptr<json::object>
> > >     auto holder = ::make_unique<json::array> ();  // likewise 
> > > std::unique_ptr<json::array>
> > >
> > >     [...snip...]
> > >
> > >     return js_obj;
> > >   }
> > >
> > > ...assuming that I'm correctly understanding the ownership of the json
> > > values in the patch.  Note that we have to use ::make_unique from our
> > > make-unique.h, rather than std::make_unique from <memory> since the
> > > latter was only added in C++14.
> > >
> > > Many of our data structures don't properly handle objects with
> > > destructors, and I suspect splay_tree is one of these.  You can use
> > > js_obj.release () to transfer ownership to such data structures, and
> > > will (probably) need to manually use "delete" on the pointers in the
> > > right places.
> > >
> > > What happens to "holder" in that function?  It seems to get populated
> > > with json objects for the various tree nodes found recursively, but
> > > then it seems to simply be leaked (or populated then auto-deleted, if
> > > we use std::unique_ptr>.  Or am I missing something?
> > >
> > > In case it's helpful, a couple of months ago I converted the SARIF
> > > output code from using "naked" json pointers to using std::unique_ptr
> > > in:
> > >   https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658204.html
> > > and I found it helped a *lot* with documenting ownership and avoiding
> > > leaks.  You might find other things of interest in the first half of
> > > this patch kit:
> > >   https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658194.html
> > > FWIW in that code I'm also using a class hierarchy of json::object
> > > subclasses to help with compliance with a JSON schema (sarif_object is
> > > a subclass of json::object), but given that there probably isn't a
> > > schema for the JSON in TDF_JSON dumps, that's probably not relevant to
> > > this case.
> > >
> > > FWIW you can test for leaks by running the compiler under valgrind by
> > > configuring with --enable-valgrind-annotations and appending "-wrapper
> > > valgrind" to the command line (or "-wrapper valgrind,--leak-check=full"
> > > IIRC).
> > >
> > > Thanks again for the patch; hope this is constructive.
> > > Dave
> > >

Reply via email to