On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguent...@suse.de> wrote:
> > This simplifies canonicalize_loop_closed_ssa and does other minimal > TLC. It also adds a testcase I reduced from a stupid mistake I made > when reworking canonicalize_loop_closed_ssa. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with > -Ofast -march=haswell -floop-nest-optimize are > > 61 loop nests "optimized" > 45 loop nest transforms cancelled because of code generation issues > 21 loop nest optimizations timed out the 350000 ISL "operations" we allow > > I say "optimized" because the usual transform I've seen is static tiling > as enforced by GRAPHITE according to --param loop-block-tile-size. > There's no way to automagically figure what kind of transform ISL did > Here is how to automate (without magic) the detection of the transform that isl did. The problem solved by isl is the minimization of strides in memory, and to do this, we need to tell the isl scheduler the validity dependence graph, in graphite-optimize-isl.c see the validity (RAW, WAR, WAW) and the proximity (RAR + validity) maps. The proximity does include the read after read, as the isl scheduler needs to minimize strides between consecutive reads. When you apply the schedule to the dependence graph, one can tell from the result the strides in memory, a good way to say whether a transform was beneficial is to sum up all memory strides, and make sure that the sum of all strides decreases after transform. We could add a printf with the sum of strides before and after transforms, and have the testcases check for that. (usually none with the schedule identical check confused by FILTER > stuff positioning). This is also the issue with most GRAPHITE testcases. > We can't really verify easily whether we performed loop interchange > or not. We can probably tell whether we applied loop fusion or > splitting (by counting loops). > > I'm not aware of any remaining ICEs / wrong-code issues with GRAPHITE. > > I'm aware that the current "black-box" granularity hinders > scheduling freedom (each GIMPLE BB is mapped to a ISL stmt, this > is too coarse to schedule say two writes in a BB independently > from each other). Quick experiments could be done by simply > splitting gimple BBs at some points. > > I'm aware that the SCOP detection algorithm assumes that it can > walk loop->next and find loops "in order" -- but while that's > true for the initial flow_loops_find result (DFS walk) it isn't > true for any later created / discovered loops. Sorting of > loop siblings in DFS order should be easy (and a general cfgloopanal > helper). > > Richard. > > 2017-09-22 Richard Biener <rguent...@suse.de> > > * graphite-isl-ast-to-gimple.c (graphite_verify): Inline into > single caller. > (graphite_regenerate_ast_isl): Do not reset SCEV. Move debug > print of no dependency loops ... > * graphite.c (graphite_transform_loops): ... here. > (canonicalize_loop_closed_ssa_form): Work from inner to outer > loops. > (same_close_phi_node, remove_duplicate_close_phi, > make_close_phi_nodes_unique, defined_in_loop_p): Fold into ... > (canonicalize_loop_closed_ssa): ... here and simplify. > * graphite-optimize-isl.c: Include tree-vectorizer.h. > (optimize_isl): Use dump_printf_loc to tell when we stopped > optimizing because of an ISL timeout. > > * gcc.dg/graphite/scop-24.c: New testcase. > > The change looks good to me. Thanks, Sebastian