On 09/14/2016 07:21 AM, Richard Biener wrote:
On Tue, Sep 6, 2016 at 8:52 PM, Bin Cheng <bin.ch...@arm.com> wrote:
Hi,
This is the main patch improving the control flow graph for vectorized loops.
It largely rewrites the loop peeling code in the vectorizer. As described in
the patch, for a typical loop to be vectorized like:
    preheader:
  LOOP:
    header_bb:
      loop_body
      if (exit_loop_cond) goto exit_bb
      else goto header_bb
    exit_bb:
This patch peels a prolog and an epilog from the loop and adds guards that skip
PROLOG and EPILOG under various conditions. As a result, the changed CFG would
look like:
    guard_bb_1:
      if (prefer_scalar_loop) goto merge_bb_1
      else goto guard_bb_2
    guard_bb_2:
      if (skip_prolog) goto merge_bb_2
      else goto prolog_preheader
    prolog_preheader:
  PROLOG:
    prolog_header_bb:
      prolog_body
      if (exit_prolog_cond) goto prolog_exit_bb
      else goto prolog_header_bb
    prolog_exit_bb:
    merge_bb_2:
    vector_preheader:
  VECTOR LOOP:
    vector_header_bb:
      vector_body
      if (exit_vector_cond) goto vector_exit_bb
      else goto vector_header_bb
    vector_exit_bb:
    guard_bb_3:
      if (skip_epilog) goto merge_bb_3
      else goto epilog_preheader
    merge_bb_1:
    epilog_preheader:
  EPILOG:
    epilog_header_bb:
      epilog_body
      if (exit_epilog_cond) goto merge_bb_3
      else goto epilog_header_bb
    merge_bb_3:
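To make the shape of the CFG above concrete, here is a minimal source-level sketch in C of a peeled array sum. It is only an illustration, not what the vectorizer emits: `VF`, `MIN_VECTOR_N`, `prolog_iters` and `sum_peeled` are hypothetical names, the "vector" loop is emulated with scalar code that handles `VF` elements per iteration, and the guard conditions (a size threshold for prefer_scalar_loop, an alignment check for skip_prolog) are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 4u                    /* hypothetical vectorization factor */
#define MIN_VECTOR_N (2u * VF)   /* hypothetical profitability threshold */

/* Hypothetical helper: scalar iterations to peel so that the pointer
   reaches a VF-element boundary (assumes p is aligned for int32_t).  */
static size_t prolog_iters(const int32_t *p)
{
    size_t misalign = (size_t)(((uintptr_t)p / sizeof(int32_t)) % VF);
    return misalign ? VF - misalign : 0;
}

int64_t sum_peeled(const int32_t *a, size_t n)
{
    int64_t sum = 0;
    size_t i = 0;

    assert(a != NULL || n == 0);

    if (n >= MIN_VECTOR_N) {              /* guard_bb_1: !prefer_scalar_loop */
        size_t peel = prolog_iters(a);
        if (peel != 0) {                  /* guard_bb_2: !skip_prolog */
            for (; i < peel; i++)         /* PROLOG */
                sum += a[i];
        }
        /* merge_bb_2: the vector loop covers whole groups of VF elements.  */
        size_t vec_end = peel + ((n - peel) / VF) * VF;
        for (; i < vec_end; i += VF) {    /* VECTOR LOOP (emulated) */
            for (size_t lane = 0; lane < VF; lane++)
                sum += a[i + lane];
        }
        if (i == n)                       /* guard_bb_3: skip_epilog */
            return sum;                   /* merge_bb_3 */
    }
    /* merge_bb_1: the scalar fallback and the epilog share one loop.  */
    for (; i < n; i++)                    /* EPILOG */
        sum += a[i];
    return sum;
}
```

In this sketch only the two guards corresponding to guard_bb_1 and guard_bb_2 sit on the path to the vector loop, and the epilog loop doubles as the scalar fallback, mirroring how merge_bb_1 joins epilog_preheader in the CFG.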
Note that this patch peels the prolog and epilog only when necessary, and adds
different guard conditions/branches accordingly. The first guard/branch could
also be further improved by merging it with loop versioning.
Before this patch, up to 4 branch instructions needed to be executed before the
vectorized loop was reached in the worst case; with this patch the number is
reduced to 2. The patch also does better compile-time analysis to avoid
unnecessary peeling/branching.
From the implementation's point of view, the vectorizer needs to update
induction variables and iteration bounds along with control flow changes.
Unfortunately, the code also becomes much harder to follow because the
slpeel_* functions update SSA by themselves, rather than using the update_ssa
interface. This patch tries to factor out SSA/IV/Niter_bound changes from CFG
changes. This should make the implementation easier to read, and I think it
may be a step toward replacing the slpeel_* functions with generic GIMPLE loop
copy interfaces, as Richard suggested.
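As a rough illustration of the iteration-bound bookkeeping mentioned above, the following C sketch splits a scalar iteration count across prolog, vector loop and epilog. It is generic arithmetic under stated assumptions, not code from the patch: `struct niters_split` and `split_niters` are hypothetical names, and `peel` stands for however many iterations the prolog is asked to run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of how the original iterations are distributed.  */
struct niters_split {
    size_t prolog;   /* scalar iterations run by the prolog */
    size_t vector;   /* iterations of the vector loop, each covering vf elements */
    size_t epilog;   /* scalar iterations left for the epilog */
};

/* Hypothetical helper, not a GCC function: split niters scalar iterations
   given a requested prolog count and a vectorization factor vf.  */
struct niters_split split_niters(size_t niters, size_t peel, size_t vf)
{
    struct niters_split s;

    assert(vf > 0);
    s.prolog = peel < niters ? peel : niters;   /* cannot peel more than exists */
    size_t rest = niters - s.prolog;
    s.vector = rest / vf;
    s.epilog = rest % vf;
    return s;
}
```

With this split, an induction variable that starts at 0 in the original loop would start at `prolog` in the vector loop and at `prolog + vector * vf` in the epilog, and `prolog + vector * vf + epilog` always equals `niters`.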
I've skimmed over the patch and it looks reasonable to me.
Thanks. I was maybe 15% of the way through the main patch. Nothing
that gave me cause for concern, but I wasn't ready to ACK it myself yet.
jeff