On Wed, Sep 14, 2016 at 5:43 PM, Jeff Law <l...@redhat.com> wrote:
> On 09/14/2016 07:21 AM, Richard Biener wrote:
>>
>> On Tue, Sep 6, 2016 at 8:52 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>
>>> Hi,
>>> This is the main patch improving control flow graph for vectorized loop.
>>> It generally rewrites loop peeling stuff in vectorizer.  As described in
>>> patch, for a typical loop to be vectorized like:
>>>
>>>   preheader:
>>>   LOOP:
>>>     header_bb:
>>>       loop_body
>>>       if (exit_loop_cond) goto exit_bb
>>>       else                goto header_bb
>>>   exit_bb:
>>>
>>> This patch peels prolog and epilog from the loop, adds guards skipping
>>> PROLOG and EPILOG for various conditions.  As a result, the changed CFG
>>> would look like:
>>>
>>>   guard_bb_1:
>>>     if (prefer_scalar_loop) goto merge_bb_1
>>>     else                    goto guard_bb_2
>>>
>>>   guard_bb_2:
>>>     if (skip_prolog) goto merge_bb_2
>>>     else             goto prolog_preheader
>>>
>>>   prolog_preheader:
>>>   PROLOG:
>>>     prolog_header_bb:
>>>       prolog_body
>>>       if (exit_prolog_cond) goto prolog_exit_bb
>>>       else                  goto prolog_header_bb
>>>   prolog_exit_bb:
>>>
>>>   merge_bb_2:
>>>
>>>   vector_preheader:
>>>   VECTOR LOOP:
>>>     vector_header_bb:
>>>       vector_body
>>>       if (exit_vector_cond) goto vector_exit_bb
>>>       else                  goto vector_header_bb
>>>   vector_exit_bb:
>>>
>>>   guard_bb_3:
>>>     if (skip_epilog) goto merge_bb_3
>>>     else             goto epilog_preheader
>>>
>>>   merge_bb_1:
>>>
>>>   epilog_preheader:
>>>   EPILOG:
>>>     epilog_header_bb:
>>>       epilog_body
>>>       if (exit_epilog_cond) goto merge_bb_3
>>>       else                  goto epilog_header_bb
>>>
>>>   merge_bb_3:
>>>
>>>
>>> Note this patch peels prolog and epilog only if it's necessary, as well
>>> as adds different guard_conditions/branches.  Also the first guard/branch
>>> could be further improved by merging it with loop versioning.
>>>
>>> Before this patch, up to 4 branch instructions need to be executed before
>>> the vectorized loop is reached in the worst case, while the number is
>>> reduced to 2 with this patch.  The patch also does better in compile time
>>> analysis to avoid unnecessary peeling/branching.
>>> From implementation's point of view, vectorizer needs to update induction
>>> variables and iteration bounds along with control flow changes.
>>> Unfortunately, it also becomes much harder to follow because slpeel_*
>>> functions update SSA by themselves, rather than using update_ssa interface.
>>> This patch tries to factor out SSA/IV/Niter_bound changes from CFG changes.
>>> This should make the implementation easier to read, and I think it may be a
>>> step forward to replace slpeel_* functions with generic GIMPLE loop copy
>>> interfaces as Richard suggested.
>>
>>
>> I've skimmed over the patch and it looks reasonable to me.
>
> Thanks.  I was maybe 15% of the way through the main patch.  Nothing that
> gave me cause for concern, but I wasn't ready to ACK it myself yet.

Hi Jeff,
Any update on this one?  Well, it might conflict with the epilogue
vectorization patch set?

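For anyone following the quoted CFG, here is a minimal source-level sketch in C of the same shape: a guarded prolog, a vector loop, and a scalar epilog. It is an illustration only, not code from the patch. The function name add_one_peeled, the factor VF, and the guard conditions (fewer than VF iterations for prefer_scalar_loop, pointer alignment for skip_prolog) are assumptions made for the example, and structured ifs stand in for the gotos.

```c
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 }; /* hypothetical vectorization factor (elements per vector) */

/* Hypothetical example: a[i] += 1 for all i, laid out like the peeled CFG. */
static void add_one_peeled(int *a, size_t n)
{
    size_t i = 0;

    /* guard_bb_1: with fewer than VF iterations (assumed condition),
       prefer the scalar loop and go straight to the epilog (merge_bb_1). */
    if (n >= VF) {
        /* guard_bb_2: the prolog runs zero times when 'a' is already
           aligned to VF elements (assumed skip_prolog condition). */
        size_t misalign = ((uintptr_t)a / sizeof(int)) % VF;
        size_t prolog_niters = (VF - misalign) % VF;

        /* PROLOG: scalar iterations until a + i is vector-aligned. */
        for (; i < prolog_niters; i++)
            a[i] += 1;

        /* VECTOR LOOP: VF elements per iteration; the inner loop stands
           in for a single vector instruction. */
        for (; i + VF <= n; i += VF)
            for (size_t k = 0; k < VF; k++)
                a[i + k] += 1;
    }

    /* guard_bb_3 is folded into the loop test here: when no iterations
       remain, the epilog body is skipped entirely.
       EPILOG: remaining scalar iterations. */
    for (; i < n; i++)
        a[i] += 1;
}
```

In this sketch the aligned, large-n path evaluates two guards (n >= VF and the prolog test) before entering the vector loop, which is the two-branch case described in the quoted text.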
Thanks,
bin