On 09/06/2016 12:49 PM, Bin Cheng wrote:
Hi,
This patch set generates a new control flow graph for the vectorized loop and
its peeling loops. At the moment, the CFG for the vectorized loop is complicated
and sub-optimal. The major issues are:
A) For both the prologue and the vectorized loop, it generates a guard/branch
before the loop checking whether the following (prologue/vectorized) loop should
be skipped. It also generates a guard/branch after each loop checking whether
the next (vectorized/epilogue) loop should be skipped.
B) Depending on how conditional set is supported by the target, it may generate
one additional if-statement (branch) setting the niters for the prologue loop.
C) In the worst case, up to 4 branch instructions need to be executed before
the vectorized loop is entered.
D) For loops without enough niters, it checks and executes some
(niters_prologue) iterations with the prologue loop; it then checks whether the
remaining number of iterations (niters - niters_prologue) is enough for
vectorization; if not, it skips the vectorized loop and continues with the
epilogue loop. This is bad since the vectorized loop isn't executed at all
after all the hassle.
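
To make A) through D) concrete, here is a rough scalar model of the control
flow described above, written as plain C rather than GIMPLE. It is only a
sketch under assumptions: the function name sum_old_layout, the fixed VF of 4
and the vector_iters counter are hypothetical and do not come from the patch,
and the "vector" loop is simulated by summing VF elements per iteration.

```c
#include <stddef.h>

enum { VF = 4 };  /* assumed vectorization factor, for illustration only */

/* Hypothetical model of the guards described in A) to D).
   Returns the sum of a[0..niters-1]; *vector_iters reports how many
   iterations of the simulated vector loop were executed.  */
long sum_old_layout(const int *a, size_t niters, size_t niters_prologue,
                    size_t *vector_iters)
{
    long sum = 0;
    size_t i = 0;
    *vector_iters = 0;

    /* B) extra branch that fixes up the prologue iteration count.  */
    if (niters_prologue > niters)
        niters_prologue = niters;

    /* A) guard in front of the prologue loop.  */
    if (niters_prologue != 0) {
        for (; i < niters_prologue; i++)
            sum += a[i];
    }

    /* A) + D) guard in front of the vector loop; it is only evaluated
       after the prologue has already run.  */
    if (niters - i >= VF) {
        for (; i + VF <= niters; i += VF) {
            sum += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
            ++*vector_iters;
        }
    }

    /* Guard in front of the epilogue loop.  */
    if (i < niters) {
        for (; i < niters; i++)
            sum += a[i];
    }
    return sum;
}
```

With niters = 6 and niters_prologue = 3, this model runs three prologue
iterations and then finds that only three iterations remain, so the vector
loop is skipped and the epilogue does the rest, which is the situation
described in D).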
This patch set improves on that by merging the different checks, so that only 2
branch instructions (which could be reduced further in combination with loop
versioning) are executed before the vectorized loop. It also does better
compile-time analysis in order to avoid prologue/epilogue peeling where
possible, and it improves code generation in various ways (live overflow
handling, generating short live ranges). In terms of implementation, it tries
to factor the SSA updating code out of the CFG changing code; I think this may
help future work on replacing slpeel_* with a generic GIMPLE loop copier.
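
One way to picture the merged check is the scalar C model below. It is a
sketch under assumptions, not the CFG the patch actually emits: the function
name sum_merged_check, the fixed VF of 4 and the prologue_iters counter are
hypothetical, and the exact number and placement of branches in the real
implementation may differ. The point it illustrates is that a single up-front
test decides whether the prologue and vector loops run at all, so a loop
without enough niters goes straight to the scalar loop.

```c
#include <stddef.h>

enum { VF = 4 };  /* assumed vectorization factor, for illustration only */

/* Hypothetical model of a merged guard.  Returns the sum of
   a[0..niters-1]; *prologue_iters reports how many prologue
   iterations were executed.  */
long sum_merged_check(const int *a, size_t niters, size_t niters_prologue,
                      size_t *prologue_iters)
{
    long sum = 0;
    size_t i = 0;
    *prologue_iters = 0;

    /* One check up front: is there room for the whole prologue plus at
       least one vector iteration?  Written as a subtraction so that the
       test cannot wrap around.  */
    if (niters >= VF && niters - VF >= niters_prologue) {
        for (; i < niters_prologue; i++) {
            sum += a[i];
            ++*prologue_iters;
        }
        /* Simulated vector loop: VF elements per iteration.  */
        for (; i + VF <= niters; i += VF)
            sum += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }

    /* Scalar loop: the remainder after the vector loop, or every
       iteration when the check above failed.  */
    for (; i < niters; i++)
        sum += a[i];
    return sum;
}
```

With niters = 6 and niters_prologue = 3 the check fails, so no prologue
iterations are executed and the scalar loop handles all six elements.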
So far there are 9 patches in the set. Patches [1-5] are small prerequisites
for the major change, which is done by patch 6. Patches [7-9] are small patches
that either address test cases or improve code generation. Final bootstrap and
test of the patch set are ongoing on x86_64 and AArch64. Assuming there are no
new failures, or that any are fixed, any comments on this?
This is the first patch, deleting useless code in tree-vect-loop-manip.c as
well as fixing an obvious code style issue.
Thanks,
bin
2016-09-01 Bin Cheng <bin.ch...@arm.com>
* tree-vect-loop-manip.c (slpeel_can_duplicate_loop_p): Fix code
style issue.
(vect_do_peeling_for_loop_bound, vect_do_peeling_for_alignment):
Remove useless code.
Seems obvious to me -- I can't think of any reason why we'd emit a NULL
sequence to the loop preheader edge.
jeff