Gimple loop splitting v2

Michael Matz Tue, 01 Dec 2015 08:48:20 -0800

Hi,

On Mon, 16 Nov 2015, Jeff Law wrote:


> OK, if you want to keep them, then have a consistent way to turn them 
> on/off for future debugging.  if0/if1 doesn't provide much of a clue to 
> someone else what to turn on/off if they need to debug this stuff.

> > > I don't see any negative tests -- ie tests that should not be split 
> > > due to boundary conditions.  Do you have any from development?
> > 
> > Good point, I had some but only ones where I was able to extend the 
> > splitters to cover them.  I'll think of some that really shouldn't be 
> > split.
> If you've got them, certainly add them.  Though I realize they may get 
> lost over time.

Actually, thinking a bit more about this, all the ones I have are merely 
restrictions in the implementation that could be lifted in the future 
(e.g. unequal step sizes), so I've added no additional ones.

> But in that case, the immediate dominator of pre2 & join is still the 
> initial if statement.  So I think we're OK.  That was the conclusion I 
> was starting to come to yesterday, having the ascii art makes it pretty 
> clear.  I'm just not good at conceptualizing a CFG.  I have to see it 
> explicitly and then everything seems so clear and simple.

So, this second version should reflect the review.  I've moved everything 
to a new file, split the long function into several logically separate 
ones, and even included ascii art in the comments :)  The testcase got a 
comment about what to #define for debugging.  I've enabled the pass at 
-O3 or, alternatively, if profile-use is on, similar to -funswitch-loops.  
I've also added a proper -fsplit-loops option.
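
For readers who haven't followed the thread, here is a rough hand-written 
sketch of the kind of transformation meant by loop splitting.  It is not 
output of the pass and not taken from the patch; the function names, the 
array size N and the loop bodies are made up for illustration.  The guard 
i < m is true for a prefix of the iterations and false afterwards, so the 
loop can be split into two loops without the guard:

```c
#include <assert.h>
#include <string.h>

#define N 16   /* hypothetical buffer size for the self-check below */

/* Original form: the guard i < m is re-evaluated on every iteration,
   although it is true for a prefix of the iterations and false after.  */
static void fill_original(int *out, int n, int m)
{
  for (int i = 0; i < n; i++)
    {
      if (i < m)
        out[i] = 2 * i;
      else
        out[i] = i + 100;
    }
}

/* Hand-split form: the first loop covers the iterations where the guard
   holds, the second loop continues from where the first one stopped.  */
static void fill_split(int *out, int n, int m)
{
  int i = 0;
  int bound = m < n ? m : n;   /* first loop must not run past n */
  for (; i < bound; i++)
    out[i] = 2 * i;
  for (; i < n; i++)
    out[i] = i + 100;
}

/* Returns 1 if both forms produce the same array contents.  */
static int split_matches_original(int n, int m)
{
  int a[N], b[N];
  assert(n >= 0 && n <= N);
  memset(a, 0, sizeof a);
  memset(b, 0, sizeof b);
  fill_original(a, n, m);
  fill_split(b, n, m);
  return memcmp(a, b, sizeof a) == 0;
}
```

Calling split_matches_original with various n and m (including m <= 0 and 
m > n) is a cheap way to convince oneself that the two forms agree.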

There are two functional changes in v2: a bugfix to not try splitting a 
non-iterating loop (irritatingly, such a loop returns true from 
number_of_iterations_exit, but with an ERROR_MARK comparator), and a 
limitation to avoid combinatorial explosion in artificial testcases: once 
we have done a splitting, we don't do any in that loop's parents (we may 
still do splitting in siblings or children of siblings).
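
To make the second point concrete, here is a hand-written sketch (again 
with made-up names and bodies, not taken from the patch or the testsuite) 
of a nest in which both loops carry a splittable guard.  Under the 
limitation described above, and assuming the inner loop is handled first, 
only the inner loop would be split; the guard in the outer loop stays.  
Splitting the outer loop as well would duplicate the two inner copies 
again, and that doubling per nesting level is the explosion being avoided:

```c
#include <assert.h>

/* Original nest: each loop has a guard on its own induction variable.  */
static long nest_original(int n, int a, int b)
{
  long sum = 0;
  for (int i = 0; i < n; i++)
    {
      if (i < a)
        sum += 1;
      for (int j = 0; j < n; j++)
        {
          if (j < b)
            sum += i + j;
          else
            sum -= j;
        }
    }
  return sum;
}

/* Inner loop split by hand; the outer loop keeps its guard and is left
   unsplit, mirroring the "no splitting in parents" limitation.  */
static long nest_inner_split(int n, int a, int b)
{
  long sum = 0;
  int bound = b < n ? b : n;   /* inner split point, clamped to n */
  for (int i = 0; i < n; i++)
    {
      if (i < a)
        sum += 1;
      int j = 0;
      for (; j < bound; j++)
        sum += i + j;
      for (; j < n; j++)
        sum -= j;
    }
  return sum;
}

/* Aborts if the two forms disagree for the given parameters.  */
static void check_nest(int n, int a, int b)
{
  assert(nest_original(n, a, b) == nest_inner_split(n, a, b));
}
```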

I've also done some measurements.  First, bootstrap time is unaffected, and 
regstrapping succeeds without regressions when I activate the pass by 
default.  Then SPECcpu2006: build times are unaffected, everything builds 
and works also with -fsplit-loops, and performance is mostly unaffected.  
Base is -Ofast -funroll-loops -fpeel-loops; peak adds -fsplit-loops.

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench    9770        325       30.1 *    9770        323       30.3 *  
401.bzip2        9650        382       25.2 *    9650        382       25.3 *  
403.gcc          8050        242       33.3 *    8050        241       33.4 *  
429.mcf          9120        311       29.3 *    9120        311       29.3 *  
445.gobmk       10490        392       26.8 *   10490        391       26.8 *  
456.hmmer        9330        345       27.0 *    9330        342       27.3 *  
458.sjeng       12100        422       28.7 *   12100        420       28.8 *  
462.libquantum  20720        308       67.3 *   20720        308       67.3 *  
464.h264ref     22130        423       52.3 *   22130        423       52.3 *  
471.omnetpp      6250        273       22.9 *    6250        273       22.9 *  
473.astar        7020        311       22.6 *    7020        311       22.6 *  
483.xalancbmk    6900        191       36.2 *    6900        190       36.2 *  
 Est. SPECint_base2006                 31.7
 Est. SPECint2006                                                      31.7

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        235       57.7 *   13590        235       57.8 *  
416.gamess                                  NR                              NR 
433.milc         9180        347       26.5 *    9180        345       26.6 *  
434.zeusmp       9100        269       33.9 *    9100        268       33.9 *  
435.gromacs      7140        260       27.4 *    7140        262       27.3 *  
436.cactusADM   11950        237       50.5 *   11950        240       49.9 *  
437.leslie3d     9400        228       41.3 *    9400        228       41.2 *  
444.namd         8020        312       25.7 *    8020        311       25.7 *  
447.dealII      11440        254       45.0 *   11440        254       45.0 *  
450.soplex       8340        201       41.4 *    8340        202       41.4 *  
453.povray                                  NR                              NR 
454.calculix     8250        282       29.2 *    8250        283       29.2 *  
459.GemsFDTD    10610        310       34.3 *   10610        309       34.3 *  
465.tonto        9840        683       14.4 *    9840        684       14.4 *  
470.lbm         13740        224       61.2 *   13740        224       61.3 *  
481.wrf         11170        291       38.4 *   11170        291       38.4 *  
482.sphinx3     19490        377       51.7 *   19490        377       51.6 *  
 Est. SPECfp_base2006                  36.3
 Est. SPECfp2006                                                       36.3

The 1% improvements and degradations are all inside the normal result 
variations on this machine (I have the feeling that the hmmer improvement 
is stable, and will recheck this).  Not all of the above had loops split 
at all, only: SPECint: 400.perlbench, 403.gcc, 445.gobmk, 456.hmmer, 
462.libquantum, 464.h264ref, 471.omnetpp and SPECfp: 435.gromacs, 
436.cactusADM, 447.dealII, 454.calculix.

So, okay for trunk?


Ciao,
Michael.
