On 07/03/2012 01:15 PM, Richard Guenther wrote:
This merges the last bit from the graphite ISL branch - an
integrated optimizer based on ISL. To quote Tobias:
"The isl scheduling optimizer implements the scheduling algorithm first
developed in Pluto [1]. Pluto has shown significant speedups and is
nowadays even implemented in the IBM XL-C compiler. The implementation of
this pass is a first draft and was largely copied from Polly. We still
need to adapt the code to the gcc coding style and to tune the isl
scheduler. At the moment we get reasonable compile times (at most a
2x-3x slowdown) and first speedups. We now need to tune the compile time
and start investigating which optimizations and heuristics need to be
adjusted in our reimplementation.
[1] http://pluto-compiler.sourceforge.net/"
Micha kindly did the code adaptation to gcc coding style and I renamed
the flag to -floop-nest-optimize (from -floop-optimize-isl). We
both agree that such an integrated LNO is the way to go, superseding the
individual graphite transforms we have now. We might even be able
to drop the separate blocking and strip-mining transforms we have
right now in favor of this?
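To picture what the Pluto-style scheduler described above computes, here is a
hand-written sketch: a naive loop nest and an interchanged, tiled version
approximating what a locality-driven schedule might look like. This is only an
illustration; the tile size of 64 is arbitrary and not something the pass
computes or guarantees.

#define N 1024

/* A naive nest; the column-wise access to b[k][j] in the innermost
   loop has poor data locality.  */
void
matmul (double a[N][N], double b[N][N], double c[N][N])
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        c[i][j] += a[i][k] * b[k][j];
}

/* A hand-written approximation of an interchanged and tiled schedule a
   locality-driven optimizer could pick: the j and k loops are swapped so
   the innermost accesses walk rows, and the nest is blocked so the
   working set fits in cache.  */
void
matmul_tiled (double a[N][N], double b[N][N], double c[N][N])
{
  for (int ii = 0; ii < N; ii += 64)
    for (int kk = 0; kk < N; kk += 64)
      for (int jj = 0; jj < N; jj += 64)
        for (int i = ii; i < ii + 64; i++)
          for (int k = kk; k < kk + 64; k++)
            for (int j = jj; j < jj + 64; j++)
              c[i][j] += a[i][k] * b[k][j];
}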
Thanks Micha for adapting the style to gcc.
I would like to point out that this pass is still very experimental and
not tuned at all. Specifically, it was only tested on polybench with one
specific set of flags. Even there we saw not only speedups: due to
missing heuristics, some benchmarks also showed large slowdowns. When
using it on even slightly different benchmarks or with slightly
different flags, infinite compile times or large performance regressions
may show up! This optimizer may obviously also contain bugs that lead
to miscompiles.
Also, the loop nest optimizer will not be very effective as long as PRE
and LICM are scheduled before graphite.
That said, I think it would be great to add this pass to gcc to allow
people to experiment with it. However, we really should mark it as
experimental. (Consider this the graphite OK, provided the pass is
clearly marked as experimental.)
Some smaller nits:
+floop-nest-optimize
+Common Report Var(flag_loop_optimize_isl) Optimization
+Enable the ISL based loop nest optimizer
What about adding "(experimental)" here?
fstrict-volatile-bitfields
Common Report Var(flag_strict_volatile_bitfields) Init(-1)
Force bitfield accesses to match their type width
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 87e0d1c..5263152 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -373,7 +373,7 @@ Objective-C and Objective-C++ Dialects}.
-fira-loop-pressure -fno-ira-share-save-slots @gol
-fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
-fivopts -fkeep-inline-functions -fkeep-static-consts @gol
--floop-block -floop-interchange -floop-strip-mine @gol
+-floop-block -floop-interchange -floop-strip-mine -floop-nest-optimize @gol
-floop-parallelize-all -flto -flto-compression-level @gol
-flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
-fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
@@ -7367,6 +7367,12 @@ GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations
are also performed by the code generator CLooG, like index splitting and
dead code elimination in loops.
+@item -floop-nest-optimize
+@opindex floop-nest-optimize
+Enable the ISL based loop nest optimizer. This is a generic loop nest
+optimizer based on the Pluto optimization algorithms. It calculates a loop
+structure optimized for data-locality and parallelism.
+
What about adding "(experimental)" here?
+
+static isl_union_map *
+scop_get_dependences (scop_p scop ATTRIBUTE_UNUSED)
The ATTRIBUTE_UNUSED is not needed here.
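Just as a reminder of the convention, a minimal standalone sketch (the
function names are made up; in the tree the macro comes from ansidecl.h):

/* Defined here only so the sketch compiles outside the GCC tree.  */
#ifndef ATTRIBUTE_UNUSED
#define ATTRIBUTE_UNUSED __attribute__ ((__unused__))
#endif

/* The attribute belongs on a parameter the body never reads; it silences
   the unused-parameter warning.  */
static int
ignores_arg (int arg ATTRIBUTE_UNUSED)
{
  return 0;
}

/* A parameter that is actually used, as scop is above, needs no
   annotation.  */
static int
uses_arg (int arg)
{
  return arg + 1;
}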
Cheers
Tobi