Richard Guenther wrote:
> gfortran.dg/reassoc_4.f, the hottest loop from calculix.
Thanks.
This example is slightly different. Graphite should be able to handle it
with loop fusion rather than pre-unrolling + cse. But I agree that the
unrolling + cse approach also makes sense (and does not depend on the
same legality constraints as loop fusion).
This makes me think of a simple, general criterion to detect cases where
pre-unrolling of an inner loop enables further cse and loop optimizations.
The idea is to unroll only when we see some evidence of array references
that are not currently loop-invariant but would become (outer-)loop
invariant after full unrolling of some inner loop.
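As a hypothetical illustration of this criterion (the 3-tap filter and
the function names are made up, not taken from calculix or tramp3d),
consider a loop nest in which `c[j]` is indexed by the inner counter `j`.
Once the `j` loop is fully unrolled, the subscripts of `c` become the
constants 0, 1 and 2, so those loads are invariant in the outer `i` loop
and can be hoisted. The sketch only shows the source-level shape of the
transformation:

```python
# Hypothetical example: a 3-tap filter written as a loop nest.
K = 3  # trip count of the inner loop, assumed known at compile time


def filter_rolled(x, c, n):
    out = []
    for i in range(n):          # outer loop
        s = 0
        for j in range(K):      # inner loop: the subscript of c varies with j
            s += c[j] * x[i + j]
        out.append(s)
    return out


def filter_unrolled(x, c, n):
    # The j loop fully unrolled by hand (this version assumes K == 3):
    # c is now indexed by the constants 0, 1 and 2, so these loads are
    # invariant in the i loop and are hoisted out of it.
    c0, c1, c2 = c[0], c[1], c[2]
    out = []
    for i in range(n):
        out.append(c0 * x[i] + c1 * x[i + 1] + c2 * x[i + 2])
    return out
```

With `x = list(range(10))` and `c = [1, 2, 3]`, both functions return the
same eight values, `6 * i + 8` for `i` from 0 to 7.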
This criterion could be implemented by extending the current heuristic
(or Honza's complementary enhancements to it) with an additional
condition, enabled only when the pass runs with the "i" (inner) flag
(which should probably be renamed if we do implement this...).
The simplest, weakest condition I can think of would be to traverse all
array references in the region enclosed by the loop to be unrolled,
compute the SCEV of each one, instantiate it in the loop's context, and
check whether it depends only on the loop counter, outer loop counters,
or parameters.
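A minimal sketch of this weak condition, assuming a toy representation
in which each instantiated subscript is an affine form
`{variable: coefficient}`, with `None` standing for a subscript whose
evolution could not be analyzed. This is not GCC's SCEV interface;
`weak_condition` and the other names are hypothetical, and requiring
every reference to pass is only one reading of the proposal:

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy model: an instantiated subscript maps variable names
# (loop counters or parameters) to integer coefficients. The constant term
# is omitted because it does not affect the check. None = not analyzable.
Subscript = Optional[Dict[str, int]]
ArrayRef = List[Subscript]  # one subscript per array dimension


def depends_only_on(sub: Subscript, allowed: Set[str]) -> bool:
    if sub is None:
        return False  # unknown evolution: be conservative
    return all(var in allowed for var, coeff in sub.items() if coeff != 0)


def weak_condition(refs: List[ArrayRef], loop_counter: str,
                   outer_counters: Set[str], params: Set[str]) -> bool:
    """Every subscript of every reference in the loop body may depend only
    on the counter of the loop to be unrolled, on counters of enclosing
    loops, or on parameters."""
    allowed = {loop_counter} | outer_counters | params
    return all(depends_only_on(sub, allowed) for ref in refs for sub in ref)
```

For example, with `a[i][j]` and `b[j + n]` in the body of a `j` loop
nested in an `i` loop, and `n` a parameter, the check passes; a reference
indexed by the counter of a loop nested inside `j`, or one whose
subscript is unknown, makes it fail.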
This condition would a priori pass on the tramp3d and reassoc_4 cases.
Yet it is probably too weak: it will still pass on many codes where
unrolling does not help at all, and may even hurt.
If this is the case, we should consider a set of loops to be unrolled,
and require that the combined effect of unrolling ALL of them is the
complete instantiation of the array subscripts with constants. This is a
very special case, again satisfied by our two motivating examples. Maybe
it will be too specific and we'll see performance regressions... It
remains to be investigated whether we need a stricter condition than the
first, weak one I proposed.
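A minimal sketch of the stricter, combined condition, under the same
kind of toy model: loops are named by their counters, each counter is
assumed to run from 0 to its trip count minus 1, and an unknown trip
count is `None`. `Affine`, `fully_constant_after_unrolling` and
`unrolled_subscripts` are hypothetical names, not GCC code:

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Set


@dataclass
class Affine:
    """Hypothetical instantiated subscript: const + sum(coeff * variable)."""
    coeffs: Dict[str, int]
    const: int = 0


def fully_constant_after_unrolling(refs: List[List[Optional[Affine]]],
                                   unroll_set: Set[str],
                                   trip_counts: Dict[str, Optional[int]]) -> bool:
    """True if fully unrolling every loop in unroll_set turns every
    subscript into a constant."""
    # Full unrolling needs a known constant trip count for each loop.
    if any(trip_counts.get(loop) is None for loop in unroll_set):
        return False
    for ref in refs:
        for sub in ref:
            if sub is None:
                return False  # subscript could not be analyzed
            # A counter outside the set, or a parameter, leaves a variable term.
            if any(var not in unroll_set
                   for var, coeff in sub.coeffs.items() if coeff != 0):
                return False
    return True


def unrolled_subscripts(sub: Affine, unroll_set: Set[str],
                        trip_counts: Dict[str, int]) -> List[int]:
    """Constant subscript of each unrolled copy (normalized loops assumed).
    Only meaningful once fully_constant_after_unrolling has returned True."""
    loops = sorted(unroll_set)
    values = []
    for point in product(*(range(trip_counts[loop]) for loop in loops)):
        env = dict(zip(loops, point))
        values.append(sub.const + sum(coeff * env[var]
                                      for var, coeff in sub.coeffs.items()
                                      if coeff != 0))
    return values
```

For a reference `c[3*j + k]` with trip counts 2 for `j` and 3 for `k`,
unrolling both loops yields the six constant subscripts 0 to 5, while
unrolling `k` alone leaves `j` in the subscript and the condition fails.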
If this is not clear, I can write some pseudo-code to clarify :-).
Albert