On Fri, 2018-06-01 at 11:50 +0200, Richard Biener wrote:
> On Tue, May 29, 2018 at 10:33 PM David Malcolm <dmalc...@redhat.com>
> wrote:
> >
> > This was an experiment to try to capture information on a
> > loop optimization.
> >
> > gcc/ChangeLog:
> >     * gimple-loop-interchange.cc (should_interchange_loops): Add
> >     optinfo note when interchange gives better data locality
> >     behavior.
> >     (tree_loop_interchange::interchange): Add OPTINFO_SCOPE.
> >     Add optinfo for successful and unsuccessful interchanges.
> >     (prepare_perfect_loop_nest): Add OPTINFO_SCOPE.  Add
> >     optinfo note.
> >     (pass_linterchange::execute): Add OPTINFO_SCOPE.
> > ---
> >  gcc/gimple-loop-interchange.cc | 36 +++++++++++++++++++++++++++++++++++-
> >  1 file changed, 35 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
> > index eb35263..cd32288 100644
> > --- a/gcc/gimple-loop-interchange.cc
> > +++ b/gcc/gimple-loop-interchange.cc
> > @@ -1556,7 +1556,19 @@ should_interchange_loops (unsigned i_idx, unsigned o_idx,
> >      ratio = innermost_loops_p ? INNER_STRIDE_RATIO : OUTER_STRIDE_RATIO;
> >    /* Do interchange if it gives better data locality behavior.  */
> >    if (wi::gtu_p (iloop_strides, wi::mul (oloop_strides, ratio)))
> > -    return true;
> > +    {
> > +      if (optinfo_enabled_p ())
> > +        OPTINFO_NOTE ((gimple *)NULL) // FIXME
> > +          << "interchange gives better data locality behavior: "
> > +          << "iloop_strides: "
> > +          << decu (iloop_strides)
> > +          << " > (oloop_strides: "
> > +          << decu (oloop_strides)
> > +          << " * ratio: "
> > +          << decu (ratio)
> > +          << ")";
>
> Just randomly inside the thread.
>
> NOOOOOOOOOO!
>
> :/
> Please do _not_ add more stream-like APIs.  How do you expect
> translators to deal with those?
>
> Yes, I'm aware of the graphite-* ones and I dislike those very much.
>
> What's wrong with the existing dump API?

The existing API suffers from a "wall of text" problem:

* although it's possible to filter based on various criteria (the
  optgroup tags, specific passes, success vs failure), it's not possible
  to filter based on code hotness: the -fopt-info API works purely in
  terms of location_t.  So all of the messages about the hottest
  functions in the workload are intermingled with all of the other
  messages about all of the other functions.

* some of the text notes refer to function entry, but all of these are
  emitted "at the same level": there's no way to see the nesting of
  these function-entry logs, and where other notes are in relation to
  them.  For example, in:

    test.c:8:3: note: === analyzing loop ===
    test.c:8:3: note: === analyze_loop_nest ===
    test.c:8:3: note: === vect_analyze_loop_form ===
    test.c:8:3: note: === get_loop_niters ===
    test.c:8:3: note: symbolic number of iterations is (unsigned int) n_9(D)
    test.c:8:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
    test.c:8:3: note: vectorized 0 loops in function

  there's no way to tell that the "vect_analyze_loop_form" is in fact
  inside the call to "analyze_loop_nest", or where the "symbolic number
  of iterations" message is coming from in relation to them.  This may
  not seem significant here, but I'm quoting a small example:
  vectorization typically leads to dozens of messages, with a deep
  nesting structure (where that structure isn't visible in the
  -fopt-info output).

The existing API is also throwing data away:

* as noted above, by working purely with a location_t, the execution
  count isn't associated with the messages.  The output format purely
  gives file/line/column information, but doesn't cover the inlining
  chain.  For C++ templates it doesn't specify which instance of a
  template is being optimized.

* there's no metadata about where the messages are coming from.  It's
  easy to get at the current pass internally, but the messages don't
  capture that.  Figuring out where a message came from requires
  grepping the GCC source code.  The prototype I posted captures the
  __FILE__ and __LINE__ within the GCC source for every message
  emitted, and which pass instance emitted it.

* the current output format is of the form:

    "FILE:LINE:COLUMN: free-form text\n"

  This is only machine-readable up to a point: if a program is parsing
  it, all it has is the free-form text.  The prototype I posted captures
  what kinds of things are in the text (statements, trees,
  symtab_nodes), and captures location information for them, so that
  visualizations of the dumps can provide useful links.  There's no
  API-level grouping of messages, beyond looking for newline characters.

I'm probably re-hashing a lot of the material in the cover letter here:

  "[PATCH 00/10] RFC: Prototype of compiler-assisted performance analysis"
  https://gcc.gnu.org/ml/gcc-patches/2018-05/msg01675.html

I'd like to provide a machine-readable output format that covers the
above - in particular the profile data (whilst retaining -fopt-info for
compatibility) - a separation of the data from its presentation.

Clearly you don't like the stream-like API from the prototype :)  So
I'm wondering what the API ought to look like - one that would allow
for that kind of "richer" machine-readable output.
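To make the hotness point from the first bullet concrete, here's a
minimal sketch (purely illustrative - the helper name is invented, and
this isn't code from the prototype) of what becomes possible once a
dump call is handed a gimple * rather than a bare location_t: the
emission code can then recover both the user-facing location and the
profile count of the enclosing block:

  /* Sketch only; "note_emission_point" is a made-up name.  */

  static void
  note_emission_point (gimple *stmt)
  {
    /* What -fopt-info keys on today: */
    location_t loc = gimple_location (stmt);

    /* What it could additionally capture: */
    basic_block bb = gimple_bb (stmt);
    profile_count count
      = bb ? bb->count : profile_count::uninitialized ();

    /* ...attach both LOC and COUNT to the emitted record, so that a
       consumer can sort or filter the messages by hotness...  */
  }

Nothing new would be needed at the reporting sites beyond passing
something richer than a location_t down to wherever the records get
emitted.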
Consider this "-fopt-info" code (from "vect_create_data_ref_ptr"; this
is example 2 from the cover letter):

  if (dump_enabled_p ())
    {
      tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
      dump_printf_loc (MSG_NOTE, vect_location,
                       "create %s-pointer variable to type: ",
                       get_tree_code_name (TREE_CODE (aggr_type)));
      dump_generic_expr (MSG_NOTE, TDF_SLIM, aggr_type);
      if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
        dump_printf (MSG_NOTE, " vectorizing an array ref: ");
      else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
        dump_printf (MSG_NOTE, " vectorizing a vector ref: ");
      else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
        dump_printf (MSG_NOTE, " vectorizing a record based array ref: ");
      else
        dump_printf (MSG_NOTE, " vectorizing a pointer ref: ");
      dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_BASE_OBJECT (dr));
      dump_printf (MSG_NOTE, "\n");
    }

where the information is built up piecewise, with conditional logic.
(This existing code isn't particularly amenable to translation either.)

One option would be for "vect_location" to become something other than
a location_t - something from which an execution count can be gained
(e.g. a gimple *, from which both the location_t and the BB can be
accessed).  Then "dump_printf_loc" would signify starting an
optimization record, and the final "\n" would signify the end of an
optimization record; the various dump_generic_expr and dump_printf
calls would add structured information and text entries to the
optimization record.  This has the advantage of retaining the look of
the existing API (so the existing callsites don't need changing), but
it changes their meaning: as well as writing to -fopt-info's
destinations, it's also in a sense parsing the dump_* calls and
building optimization records from them.  AIUI, dump_printf_loc can't
be a macro, as it's variadic, so this approach doesn't allow for
capturing the location within GCC at which the message is emitted.

Another approach: there could be something like:

  if (optinfo_enabled_p ())
    {
      tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
      OPTINFO_VECT_NOTE note;
      note.printf ("create %s-pointer variable to type: ",
                   get_tree_code_name (TREE_CODE (aggr_type)));
      note.add_slim (aggr_type);
      if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
        note.printf (" vectorizing an array ref: ");
      else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
        note.printf (" vectorizing a vector ref: ");
      else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
        note.printf (" vectorizing a record based array ref: ");
      else
        note.printf (" vectorizing a pointer ref: ");
      note.add_slim (DR_BASE_OBJECT (dr));
    }

which uses a macro to get the GCC source location, and avoids the
special meaning for "\n".  This avoids the "<<", but it's arguably just
different syntax for the same thing - it's hard with this example to
avoid that.
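For what it's worth, here's a self-contained sketch of the mechanism
behind "OPTINFO_VECT_NOTE note;" (invented names, not the prototype's
classes, and with the tree-specific add_slim omitted since it needs
GCC's internal types).  Rather than a macro, it captures the emission
site via __builtin_FILE/__builtin_LINE default arguments - a GCC/Clang
extension, so a macro wrapper would be the portable fallback - and it
flushes the accumulated record in the destructor, which is what removes
the need for a trailing "\n":

  #include <cstdarg>
  #include <cstdio>
  #include <string>

  class optinfo_note_sketch
  {
  public:
    /* Default arguments are evaluated at the call site, so they pick up
       the file/line of the dump site within GCC, with no macro needed.  */
    optinfo_note_sketch (const char *impl_file = __builtin_FILE (),
                         int impl_line = __builtin_LINE ())
      : m_impl_file (impl_file), m_impl_line (impl_line) {}

    /* Flush the completed record when the note goes out of scope,
       tagged with where in GCC it was emitted from.  */
    ~optinfo_note_sketch ()
    {
      std::fprintf (stderr, "[emitted at %s:%d] %s\n",
                    m_impl_file, m_impl_line, m_text.c_str ());
    }

    /* Append formatted text to the record.  */
    void printf (const char *fmt, ...)
    {
      char buf[1024];
      va_list ap;
      va_start (ap, fmt);
      std::vsnprintf (buf, sizeof buf, fmt, ap);
      va_end (ap);
      m_text += buf;
    }

  private:
    const char *m_impl_file;  /* __FILE__ of the dump site */
    int m_impl_line;          /* __LINE__ of the dump site */
    std::string m_text;       /* would really be a list of typed items */
  };

  void
  example ()
  {
    optinfo_note_sketch note;   /* captures this file and line */
    note.printf ("vectorizing %s ref", "an array");
  }                             /* the record is emitted here */

A real version would accumulate typed items (trees, statements,
symtab_nodes) rather than a string, so the machine-readable output
could keep their locations; the scope-based flush is the piece that
matters for the API shape.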
Yet another approach: reworking things to support i18n via a pure
printf-style interface, using, say, "%S" to mean TDF_SLIM, it could
look like:

  if (optinfo_enabled_p ())
    {
      tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
      if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
        OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
                           " vectorizing an array ref: %S",
                           get_tree_code_name (TREE_CODE (aggr_type)),
                           aggr_type,
                           DR_BASE_OBJECT (dr));
      else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
        OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
                           " vectorizing a vector ref: %S",
                           get_tree_code_name (TREE_CODE (aggr_type)),
                           aggr_type,
                           DR_BASE_OBJECT (dr));
      else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
        OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
                           " vectorizing a record based array ref: %S",
                           get_tree_code_name (TREE_CODE (aggr_type)),
                           aggr_type,
                           DR_BASE_OBJECT (dr));
      else
        OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S"
                           " vectorizing a pointer ref: %S",
                           get_tree_code_name (TREE_CODE (aggr_type)),
                           aggr_type,
                           DR_BASE_OBJECT (dr));
    }

or somesuch.  The %S would allow the data to be captured when emitting
optimization records for the messages (not just the plain text, e.g.
the locations of the expressions).

So those are some ideas.  I'm sure there are other ways to fix the
issues with -fopt-info; I'm brainstorming here.

[...snip...]

Thoughts?

Thanks
Dave