On Fri, 2018-06-01 at 11:50 +0200, Richard Biener wrote:
> On Tue, May 29, 2018 at 10:33 PM David Malcolm <dmalc...@redhat.com>
> wrote:
> >
> > This was an experiment to try to capture information on a
> > loop optimization.
> >
> > gcc/ChangeLog:
> >     * gimple-loop-interchange.cc (should_interchange_loops): Add
> >     optinfo note when interchange gives better data locality
> >     behavior.
> >     (tree_loop_interchange::interchange): Add OPTINFO_SCOPE.
> >     Add optinfo for successful and unsuccessful interchanges.
> >     (prepare_perfect_loop_nest): Add OPTINFO_SCOPE.  Add
> >     optinfo note.
> >     (pass_linterchange::execute): Add OPTINFO_SCOPE.
> > ---
> >  gcc/gimple-loop-interchange.cc | 36 +++++++++++++++++++++++++++++++++++-
> >  1 file changed, 35 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
> > index eb35263..cd32288 100644
> > --- a/gcc/gimple-loop-interchange.cc
> > +++ b/gcc/gimple-loop-interchange.cc
> > @@ -1556,7 +1556,19 @@ should_interchange_loops (unsigned i_idx, unsigned o_idx,
> >      ratio = innermost_loops_p ? INNER_STRIDE_RATIO : OUTER_STRIDE_RATIO;
> >    /* Do interchange if it gives better data locality behavior.  */
> >    if (wi::gtu_p (iloop_strides, wi::mul (oloop_strides, ratio)))
> > -    return true;
> > +    {
> > +      if (optinfo_enabled_p ())
> > +        OPTINFO_NOTE ((gimple *)NULL) // FIXME
> > +          << "interchange gives better data locality behavior: "
> > +          << "iloop_strides: "
> > +          << decu (iloop_strides)
> > +          << " > (oloop_strides: "
> > +          << decu (oloop_strides)
> > +          << " * ratio: "
> > +          << decu (ratio)
> > +          << ")";
>
> Just randomly inside the thread.
>
> NOOOOOOOOOO!
>
> :/
> Please do _not_ add more stream-like APIs.  How do you expect
> translators to deal with those?
>
> Yes, I'm aware of the graphite-* ones and I dislike those very much.
>
> What's wrong with the existing dump API?

The existing API suffers from a "wall of text" problem:

* although it's possible to filter based on various criteria (the
  optgroup tags, specific passes, success vs failure), it's not possible
  to filter based on code hotness: the -fopt-info API works purely in
  terms of location_t.  So all of the messages about the hottest
  functions in the workload are intermingled with all of the other
  messages about all of the other functions.

* some of the text notes refer to function entry, but all of these are
  emitted "at the same level": there's no way to see the nesting of
  these function-entry logs, and where other notes are in relation to
  them.  For example, in:

    test.c:8:3: note: === analyzing loop ===
    test.c:8:3: note: === analyze_loop_nest ===
    test.c:8:3: note: === vect_analyze_loop_form ===
    test.c:8:3: note: === get_loop_niters ===
    test.c:8:3: note: symbolic number of iterations is (unsigned int) n_9(D)
    test.c:8:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
    test.c:8:3: note: vectorized 0 loops in function

  there's no way to tell that the "vect_analyze_loop_form" is in fact
  inside the call to "analyze_loop_nest", or where the "symbolic number
  of iterations" message is coming from in relation to them.  This may
  not seem significant here, but I'm quoting a small example:
  vectorization typically leads to dozens of messages, with a deep
  nesting structure (where that structure isn't visible in the
  -fopt-info output).

The existing API is also throwing data away:

* as noted above, by working purely with a location_t, the execution
  count isn't associated with the messages.  The output format purely
  gives file/line/column information, but doesn't cover the inlining
  chain.  For C++ templates it doesn't specify which instance of a
  template is being optimized.

* there's no metadata about where the messages are coming from.  It's
  easy to get at the current pass internally, but the messages don't
  capture that.  Figuring out where a message came from requires
  grepping the GCC source code.  The prototype I posted captures the
  __FILE__ and __LINE__ within the GCC source for every message
  emitted, and which pass instance emitted it.

* the current output format is of the form:

    "FILE:LINE:COLUMN: free-form text\n"

  This is only machine-readable up to a point: if a program is parsing
  it, all it has is the free-form text.  The prototype I posted captures
  what kinds of things are in the text (statements, trees,
  symtab_nodes), and captures location information for them, so that
  visualizations of the dumps can provide useful links.  There's no
  API-level grouping of messages, beyond looking for newline characters.

I'm probably re-hashing a lot of the material in the cover letter here:

  "[PATCH 00/10] RFC: Prototype of compiler-assisted performance analysis"
  https://gcc.gnu.org/ml/gcc-patches/2018-05/msg01675.html

I'd like to provide a machine-readable output format that covers the
above - in particular the profile data (whilst retaining -fopt-info for
compatibility) - a separation of the data from its presentation.

Clearly you don't like the stream-like API from the prototype :)  So
I'm wondering what the API ought to look like - one that would allow
for that kind of "richer" machine-readable output.
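To make the hotness point from the first bullet concrete, here's a
minimal sketch (purely illustrative - the helper name is invented, and
this isn't code from the prototype) of what becomes possible once a
dump call is handed a gimple * rather than a bare location_t: the
emission code can then recover both the user-facing location and the
profile count of the enclosing block:

  /* Sketch only; "note_emission_point" is a made-up name.  */

  static void
  note_emission_point (gimple *stmt)
  {
    /* What -fopt-info keys on today: */
    location_t loc = gimple_location (stmt);

    /* What it could additionally capture: */
    basic_block bb = gimple_bb (stmt);
    profile_count count
      = bb ? bb->count : profile_count::uninitialized ();

    /* ...attach both LOC and COUNT to the emitted record, so that a
       consumer can sort or filter the messages by hotness...  */
  }

Nothing new would be needed at the reporting sites beyond passing
something richer than a location_t down to wherever the records get
emitted.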
Consider this "-fopt-info" code (from "vect_create_data_ref_ptr"; this
is example 2 from the cover letter):

  if (dump_enabled_p ())
    {
      tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
      dump_printf_loc (MSG_NOTE, vect_location,
                       "create %s-pointer variable to type: ",
                       get_tree_code_name (TREE_CODE (aggr_type)));
      dump_generic_expr (MSG_NOTE, TDF_SLIM, aggr_type);
      if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
        dump_printf (MSG_NOTE, " vectorizing an array ref: ");
      else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
        dump_printf (MSG_NOTE, " vectorizing a vector ref: ");
      else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
        dump_printf (MSG_NOTE, " vectorizing a record based array ref: ");
      else
        dump_printf (MSG_NOTE, " vectorizing a pointer ref: ");
      dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_BASE_OBJECT (dr));
      dump_printf (MSG_NOTE, "\n");
    }

where the information is built up piecewise, with conditional logic.
(This existing code isn't particularly amenable to translation either.)

One option would be for "vect_location" to become something other than
a location_t - something from which an execution count can be gained
(e.g. a gimple *, from which both the location_t and the BB can be
accessed).  Then "dump_printf_loc" would signify starting an
optimization record, and the final "\n" would signify the end of an
optimization record; the various dump_generic_expr and dump_printf
calls would add structured information and text entries to the
optimization record.  This has the advantage of retaining the look of
the existing API (so the existing callsites don't need changing), but
it changes their meaning: as well as writing to -fopt-info's
destinations, it's also in a sense parsing the dump_* calls and
building optimization records from them.  AIUI, dump_printf_loc can't
be a macro, as it's variadic, so this approach doesn't allow for
capturing the location within GCC at which the message is emitted.

Another approach: there could be something like:

  if (optinfo_enabled_p ())
    {
      tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
      OPTINFO_VECT_NOTE note;
      note.printf ("create %s-pointer variable to type: ",
                   get_tree_code_name (TREE_CODE (aggr_type)));
      note.add_slim (aggr_type);
      if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
        note.printf (" vectorizing an array ref: ");
      else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
        note.printf (" vectorizing a vector ref: ");
      else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
        note.printf (" vectorizing a record based array ref: ");
      else
        note.printf (" vectorizing a pointer ref: ");
      note.add_slim (DR_BASE_OBJECT (dr));
    }

which uses a macro to get the GCC source location, and avoids the
special meaning for "\n".  This avoids the "<<", but it's arguably just
different syntax for the same thing - it's hard with this example to
avoid that.
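For what it's worth, here's a self-contained sketch of the mechanism
behind "OPTINFO_VECT_NOTE note;" (invented names, not the prototype's
classes, and with the tree-specific add_slim omitted since it needs
GCC's internal types).  Rather than a macro, it captures the emission
site via __builtin_FILE/__builtin_LINE default arguments - a GCC/Clang
extension, so a macro wrapper would be the portable fallback - and it
flushes the accumulated record in the destructor, which is what removes
the need for a trailing "\n":

  #include <cstdarg>
  #include <cstdio>
  #include <string>

  class optinfo_note_sketch
  {
  public:
    /* Default arguments are evaluated at the call site, so they pick up
       the file/line of the dump site within GCC, with no macro needed.  */
    optinfo_note_sketch (const char *impl_file = __builtin_FILE (),
                         int impl_line = __builtin_LINE ())
      : m_impl_file (impl_file), m_impl_line (impl_line) {}

    /* Flush the completed record when the note goes out of scope,
       tagged with where in GCC it was emitted from.  */
    ~optinfo_note_sketch ()
    {
      std::fprintf (stderr, "[emitted at %s:%d] %s\n",
                    m_impl_file, m_impl_line, m_text.c_str ());
    }

    /* Append formatted text to the record.  */
    void printf (const char *fmt, ...)
    {
      char buf[1024];
      va_list ap;
      va_start (ap, fmt);
      std::vsnprintf (buf, sizeof buf, fmt, ap);
      va_end (ap);
      m_text += buf;
    }

  private:
    const char *m_impl_file;  /* __FILE__ of the dump site */
    int m_impl_line;          /* __LINE__ of the dump site */
    std::string m_text;       /* would really be a list of typed items */
  };

  void
  example ()
  {
    optinfo_note_sketch note;   /* captures this file and line */
    note.printf ("vectorizing %s ref", "an array");
  }                             /* the record is emitted here */

A real version would accumulate typed items (trees, statements,
symtab_nodes) rather than a string, so the machine-readable output
could keep their locations; the scope-based flush is the piece that
matters for the API shape.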
Yet another approach: reworking things to support i18n via a pure
printf-style interface, using, say, "%S" to mean TDF_SLIM, it could
look like:

  if (optinfo_enabled_p ())
    {
      tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
      if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
        OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
                           " vectorizing an array ref: %S",
                           get_tree_code_name (TREE_CODE (aggr_type)),
                           aggr_type,
                           DR_BASE_OBJECT (dr));
      else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
        OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
                           " vectorizing a vector ref: %S",
                           get_tree_code_name (TREE_CODE (aggr_type)),
                           aggr_type,
                           DR_BASE_OBJECT (dr));
      else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
        OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
                           " vectorizing a record based array ref: %S",
                           get_tree_code_name (TREE_CODE (aggr_type)),
                           aggr_type,
                           DR_BASE_OBJECT (dr));
      else
        OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
                           " vectorizing a pointer ref: %S",
                           get_tree_code_name (TREE_CODE (aggr_type)),
                           aggr_type,
                           DR_BASE_OBJECT (dr));
    }

or somesuch.  The %S would allow the data to be captured when emitting
optimization records for the messages (not just the plain text, e.g.
the locations of the expressions).

So those are some ideas.  I'm sure there are other ways to fix the
issues with -fopt-info; I'm brainstorming here.

[...snip...]

Thoughts?

Thanks
Dave