On Fri, Sep 13, 2024 at 8:32 AM Thor Preimesberger <tcpreimesber...@gmail.com> wrote: > > > There are three oddities I immediately notice: > > > > The PLUS_EXPR operands are in a array "operands" while the RETURN_EXPR > > "operand" or "child pointer" is refered to from "return_expr". I think > > both are > > tcc_expression trees and the operands are in exp.operands. Ideally the > > JSON would more closely reflect that rather than possibly following the > > "pretty" > > printing logic. > > Ah - for the binary operator, operands may have been a poor choice of > words there. > There's an abstract nonsense definition that would not necessarily be > reasonable here. > Would it make more sense to dump it e.g. as > ..... > "bin_operator": "+", > "op0": {"addr": "0x7f8256bda360" > ...} > "op1": {"addr": ...} > .....
I actually liked the array, so following using accessors it then would be "operand" : [ {...}, {...} ] omitting tree_ (give we're dumping trees), if I got the idea right that [] denotes an array of sub-objects in JSON. That you have both > "tree code": "plus_expr", > "bin_operator": "+", is a bit redundant, the 'bin_operator' would be something that's the same for all nodes with 'tree code' : 'plus_expr'. I would simply omit that attribute. > I think there are some parts of the code that I wrote that don't have > the accessor used as their key when referring to a different node - > e.g. case PLACEHOLDER_EXPR. Would this be an issue? Well, it's just names - but yes, using the accessor is probably most useful to developers as they are likely familiar with that. > > While the tree_code of a tree node is the most important bit (maybe besides > > of > > its address), the "tree code" attribute is after the locations (which > > are also quite > > verbose and distracting - at least when looking at raw JSON). For locations > > one could honor TDF_LINENO and only dump those when using > > -fdump-tree-original-json-lineno. I'd re-order "tree code" after "addr". > > Sounds good - I'll implement dumping locs iff TDF_LINENO is enabled. > > > The third issue is that above the tree node with address 0x7f8256a10c60 > > (and its children) appear twice - while you maintain a splay tree and assign > > unique numbers the duplicate nodes are not output by reference? I would > > suggest to use { "ref_addr" : "0x7f8256a10c60" } for the output of such > > reference for example. > > > > I'm not sure whether JSON allows different object kinds or if that's solely > > done by having a special attribute if that's needed. With the above > > regular tree nodes would be "addr" and references be "ref_addr". A > > recursive > > JSON structure like above is OK to look at in RAW, I'm not sure whether > > for automatic processing and for example translating to a graph a linear > > collection of nodes with references would be easier to handle. > > I agree that it should be easier to process the JSON if the references have a > different key. Should be easy to implement. Yep, the important part is to avoid the duplicates in the JSON. > > Few comments on the patch itself - the #include of tree-emit-json.h from > > dumpfile.cc doesn't seem to be necessary. Since you declare > > dump_node_json in dumpfile.h it should be possible to elide the header > > and put the contents into the tree-emit-json.cc file. > > > > Another #include is duplicated (and also looks unnecessary). > > > > All fixed now on my working tree. > > > I know you have some crude JSON -> html translation script written in > > python - can you share that as well? I'd suggest to post separate from > > this main patch, adding it to contrib/. > > Sure - let me get the fixes suggested in this email done since it'll > change (and simplify) the logic a bit. > > > Can we solve the multi-function issue somehow? I know we have some > > abstraction for a dump file, we'd need a hook that's invoked on opening > > and closing to emit a something there - I guess it would be even OK to > > hard-code this into dumpfile.cc for the -JSON dump variant. It might > > be possible to register dump specific data with that object and get > > to the "current" dump file in dump_node_json so the splay-tree could > > be kept live and the allocations released on dump-file close? Again, > > two hard-coded hooks from dumpfile.cc at open/close time into > > the JSON dumping for this might be feasible and track the global state > > with global variables. That's to allow references to global objects and > > types streamed in a previous function context. > > If the multi-function issue is that the dump pass currently produces > a series of JSON objects rather than a single one - I think what you're > suggesting is essentially done by optrecord_json_writer, for > -fsave-optimization-record. One approach I have in my head > is for, let's call it a tree_json_writer, to hold a > json::array, append each node we traverse, and then > flush this array to the dumpfile at the end. Yeah, that would work as well. Thanks, Richard. > This would also enable a way to address what you brought > up at the very end. > > (In the python script I have written up, I just call the bash command > I posted in the first email to turn the output into a single JSON object. > I don't expect that it's really possible to call sed from within gcc.) > > Best, > Thor > > On Thu, Sep 12, 2024 at 7:14 AM Richard Biener > <richard.guent...@gmail.com> wrote: > > > > On Thu, Sep 12, 2024 at 12:51 PM David Malcolm <dmalc...@redhat.com> wrote: > > > > > > On Wed, 2024-09-11 at 20:49 -0500, tcpreimesber...@gmail.com wrote: > > > > From: Thor C Preimesberger <tcpreimesber...@gmail.com> > > > > > > > > This patch allows the compiler to dump GENERIC trees as JSON objects. > > > > > > > > The dump flag -fdump-tree-original-json dumps each fndecl node in the > > > > C frontend's gimplifier as a JSON object and traverses related nodes > > > > in an analagous manner as to raw-dumping. > > > > > > Thanks for posting this patch. > > > > > > Are you able to upload somewhere some examples of what the dumps look > > > like? > > > > I found https://renhongl.github.io/json-editor/ which seems to accept > > the output of -fdump-tree-original-json visualizes the raw JSON structure > > when the input is from a single function. > > > > struct S { int i; int j; } s; > > int bar () > > { > > return s.i + s.j; > > } > > int main() > > { > > return bar (); > > } > > > > no longer recognizes it, I would guess we'd need to produce an outer > > "file level" JSON node. Simply wrapping the file in [{ ... }] didn't > > work even with comma separating two functions. > > > > The JSON for bar looks like (sorry for the long paste) > > > > [{"addr": "0x7f8256bda3c0", > > "expr_loc": [{"file": "t.c", > > "line": 3, > > "column": 1}], > > "start_loc": [{"file": "t.c", > > "line": 3, > > "column": 1}], > > "finish_loc": [{"file": "t.c", > > "line": 3, > > "column": 1}], > > "tree code": "bind_expr", > > "bind_expr_body": {"addr": "0x7f8256bab860", > > "expr_loc": [{"file": "t.c", > > "line": 4, > > "column": 14}], > > "start_loc": [{"file": "t.c", > > "line": 4, > > "column": 10}], > > "finish_loc": [{"file": "t.c", > > "line": 4, > > "column": 18}], > > "tree code": "return_expr", > > "return_expr": {"addr": "0x7f8256bb1b40", > > "expr_loc": [{"file": "t.c", > > "line": 4, > > "column": 14}], > > "start_loc": [{"file": "t.c", > > "line": 4, > > "column": 10}], > > "finish_loc": [{"file": "t.c", > > "line": 4, > > "column": 18}], > > "tree code": "plus_expr", > > "bin_operator": "+", > > "operands": [{"addr": "0x7f8256bda360", > > "expr_loc": [{"file": > > "t.c", > > "line": 4, > > "column": > > 11}], > > "start_loc": [{"file": > > "t.c", > > "line": 4, > > > > "column": 10}], > > "finish_loc": > > [{"file": "t.c", > > "line": > > 4, > > > > "column": 12}], > > "tree code": > > "component_ref", > > "expr": {"addr": > > "0x7f8256a10c60", > > > > "decl_loc": [{"file": "t.c", > > > > "line": 1, > > > > "column": 28}], > > "tree > > code": "var_decl", > > "used": true, > > "public": true, > > " static": true, > > "read": true, > > "mode": "DI", > > > > "defer-output": true, > > "id_to_locale": > > "s", > > "id_point": > > "s"}, > > "field": {"addr": > > "0x7f8256a30688", > > > > "decl_loc": [{"file": "t.c", > > > > "line": 1, > > > > "column": 16}], > > "tree > > code": "field_decl", > > "mode": "SI", > > > > "id_to_locale": "i", > > "id_point": > > "i"}}, > > {"addr": "0x7f8256bda390", > > "expr_loc": [{"file": > > "t.c", > > "line": 4, > > "column": > > 17}], > > "start_loc": [{"file": > > "t.c", > > "line": 4, > > > > "column": 16}], > > "finish_loc": > > [{"file": "t.c", > > "line": > > 4, > > > > "column": 18}], > > "tree code": > > "component_ref", > > "expr": {"addr": > > "0x7f8256a10c60", > > > > "decl_loc": [{"file": "t.c", > > > > "line": 1, > > > > "column": 28}], > > "tree > > code": "var_decl", > > "used": true, > > "public": true, > > " static": true, > > "read": true, > > "mode": "DI", > > > > "defer-output": true, > > "id_to_locale": > > "s", > > "id_point": > > "s"}, > > "field": {"addr": > > "0x7f8256a30720", > > > > "decl_loc": [{"file": "t.c", > > > > "line": 1, > > > > "column": 23}], > > "tree > > code": "field_decl", > > "mode": "SI", > > > > "id_to_locale": "j", > > > > "id_point": "j"}}]}}}] > > > > There are three oddities I immediately notice: > > > > The PLUS_EXPR operands are in a array "operands" while the RETURN_EXPR > > "operand" or "child pointer" is refered to from "return_expr". I think > > both are > > tcc_expression trees and the operands are in exp.operands. Ideally the > > JSON would more closely reflect that rather than possibly following the > > "pretty" > > printing logic. > > > > While the tree_code of a tree node is the most important bit (maybe besides > > of > > its address), the "tree code" attribute is after the locations (which > > are also quite > > verbose and distracting - at least when looking at raw JSON). For locations > > one could honor TDF_LINENO and only dump those when using > > -fdump-tree-original-json-lineno. I'd re-order "tree code" after "addr". > > > > The third issue is that above the tree node with address 0x7f8256a10c60 > > (and its children) appear twice - while you maintain a splay tree and assign > > unique numbers the duplicate nodes are not output by reference? I would > > suggest to use { "ref_addr" : "0x7f8256a10c60" } for the output of such > > reference for example. > > > > I'm not sure whether JSON allows different object kinds or if that's solely > > done by having a special attribute if that's needed. With the above > > regular tree nodes would be "addr" and references be "ref_addr". A > > recursive > > JSON structure like above is OK to look at in RAW, I'm not sure whether > > for automatic processing and for example translating to a graph a linear > > collection of nodes with references would be easier to handle. > > > > > > Few comments on the patch itself - the #include of tree-emit-json.h from > > dumpfile.cc doesn't seem to be necessary. Since you declare > > dump_node_json in dumpfile.h it should be possible to elide the header > > and put the contents into the tree-emit-json.cc file. > > > > Another #include is duplicated (and also looks unnecessary). > > > > > > I know you have some crude JSON -> html translation script written in > > python - can you share that as well? I'd suggest to post separate from > > this main patch, adding it to contrib/. > > > > Can we solve the multi-function issue somehow? I know we have some > > abstraction for a dump file, we'd need a hook that's invoked on opening > > and closing to emit a something there - I guess it would be even OK to > > hard-code this into dumpfile.cc for the -JSON dump variant. It might > > be possible to register dump specific data with that object and get > > to the "current" dump file in dump_node_json so the splay-tree could > > be kept live and the allocations released on dump-file close? Again, > > two hard-coded hooks from dumpfile.cc at open/close time into > > the JSON dumping for this might be feasible and track the global state > > with global variables. That's to allow references to global objects and > > types streamed in a previous function context. > > > > Thanks, > > Richard. > > > > > > > Some high level thoughts: > > > > > > * the patch uses "dummy" throughout as a variable name. To me the name > > > "dummy" suggests something unimportant that we had to give a name to, > > > or something that we'd prefer didn't exist but had to create. However > > > in most(all?) cases "dummy" seems to refer to the json object being > > > created or having properties added to it, and thus the most interesting > > > thing in the function. I suspect that renaming "dummy" to "js_obj" or > > > "json_obj" throughout would be an improvement in readability in terms > > > of capturing the intent of the code (assuming that all of them are > > > indeed json objects). > > > > > > * I think the code is leaking memory for all of the json values created > > > - there are lots of uses of "naked new" in this code, but I don't see > > > any uses of "delete". For example, in > > > > > > > +void > > > > +dump_node_json (const_tree t, dump_flags_t flags, FILE *stream) > > > > +{ > > > > + struct dump_info di; > > > > + dump_queue_p dq; > > > > + dump_queue_p next_dq; > > > > + pretty_printer pp; > > > > + /* Initialize the dump-information structure. */ > > > > + di.stream = stream; > > > > + di.index = 0; > > > > + di.column = 0; > > > > + di.queue = 0; > > > > + di.queue_end = 0; > > > > + di.free_list = 0; > > > > + di.flags = flags; > > > > + di.node = t; > > > > + di.nodes = splay_tree_new (splay_tree_compare_pointers, 0, > > > > + splay_tree_delete_pointers); > > > > + di.json_dump = new json::array (); > > > ^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^ > > > allocated with naked new here > > > > > > > + /* Queue up the first node. */ > > > > + queue (&di, t); > > > > + > > > > + /* Until the queue is empty, keep dumping nodes. */ > > > > + while (di.queue) > > > > + dequeue_and_dump (&di); > > > > + > > > > + di.json_dump->dump(stream, true); > > > > + fputs("\n", stream); > > > > + /* Now, clean up. */ > > > > + for (dq = di.free_list; dq; dq = next_dq) > > > > + { > > > > + next_dq = dq->next; > > > > + free (dq); > > > > + } > > > > + splay_tree_delete (di.nodes); > > > > > > and di.json_dump goes out of scope here and is leaked, I think. So I > > > *think* all of the json values being created during dumping are being > > > leaked. > > > > > > > +} > > > > > > Similarly, in: > > > > > > > +DEBUG_FUNCTION void > > > > +debug_tree_json (tree t) > > > > +{ > > > > + json::object* _x = node_emit_json(t); > > > > + _x->dump(stderr, true); > > > > + fprintf(stderr, "\n"); > > > > +} > > > > > > if I'm reading things right, node_emit_json doesn't "emit" json so much > > > as create a new json::object on the heap via "new", and when "_x" goes > > > out of scope, it's leaked. > > > > > > The pattern in the code seems to be that node_emit_json creates a new > > > json::object and populates it with properties (sometimes recursively). > > > > > > Given that, and that we can use C++11, I recommend using > > > std::unique_ptr<json::object> for it, to capture the intent that this > > > is a heap-allocated pointer with responsibility for being "delete"-d at > > > some point. > > > > > > That way, rather that: > > > > > > json::object* > > > node_emit_json(tree t) > > > { > > > tree op0, op1, type; > > > enum tree_code code; > > > expanded_location xloc; > > > json::object *dummy; > > > json::array* holder; > > > char address_buffer[sizeof(&t)] = {"\0"}; > > > > > > dummy = new json::object (); > > > holder = new json::array (); > > > > > > [...snip...] > > > > > > return dummy; > > > } > > > > > > > > > we could have (perhaps renaming to "node_to_json"): > > > > > > std::unique_ptr<json::object> > > > node_to_json(tree t) > > > { > > > tree op0, op1, type; > > > enum tree_code code; > > > expanded_location xloc; > > > char address_buffer[sizeof(&t)] = {"\0"}; > > > > > > auto js_obj = ::make_unique<json::object> (); // to implicitly use > > > std::unique_ptr<json::object> > > > auto holder = ::make_unique<json::array> (); // likewise > > > std::unique_ptr<json::array> > > > > > > [...snip...] > > > > > > return js_obj; > > > } > > > > > > ...assuming that I'm correctly understanding the ownership of the json > > > values in the patch. Note that we have to use ::make_unique from our > > > make-unique.h, rather than std::make_unique from <memory> since the > > > latter was only added in C++14. > > > > > > Many of our data structures don't properly handle objects with > > > destructors, and I suspect splay_tree is one of these. You can use > > > js_obj.release () to transfer ownership to such data structures, and > > > will (probably) need to manually use "delete" on the pointers in the > > > right places. > > > > > > What happens to "holder" in that function? It seems to get populated > > > with json objects for the various tree nodes found recursively, but > > > then it seems to simply be leaked (or populated then auto-deleted, if > > > we use std::unique_ptr>. Or am I missing something? > > > > > > In case it's helpful, a couple of months ago I converted the SARIF > > > output code from using "naked" json pointers to using std::unique_ptr > > > in: > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658204.html > > > and I found it helped a *lot* with documenting ownership and avoiding > > > leaks. You might find other things of interest in the first half of > > > this patch kit: > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658194.html > > > FWIW in that code I'm also using a class hierarchy of json::object > > > subclasses to help with compliance with a JSON schema (sarif_object is > > > a subclass of json::object), but given that there probably isn't a > > > schema for the JSON in TDF_JSON dumps, that's probably not relevant to > > > this case. > > > > > > FWIW you can test for leaks by running the compiler under valgrind by > > > configuring with --enable-valgrind-annotations and appending "-wrapper > > > valgrind" to the command line (or "-wrapper valgrind,--leak-check=full" > > > IIRC). > > > > > > Thanks again for the patch; hope this is constructive. > > > Dave > > >