It is very strange that this test failed on arm, since it requires target avx2 to check vectorizer dumps:
/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { target avx2_runtime } } } */ /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ Could you please clarify what is the reason of the failure? Thanks. 2016-11-18 16:20 GMT+03:00 Christophe Lyon <christophe.l...@linaro.org>: > On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Hi All, >> >> Here is patch for non-masked epilogue vectoriziation. >> >> Bootstrap and regression testing did not show any new failures. >> >> Is it OK for trunk? >> >> Thanks. >> Changelog: >> >> 2016-11-15 Yuri Rumyantsev <ysrum...@gmail.com> >> >> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. >> * tree-if-conv.c (tree_if_conversion): Make public. >> * * tree-if-conv.h: New file. >> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid >> dynamic alias checks for epilogues. >> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. >> * tree-vect-loop.c: include tree-if-conv.h. >> (new_loop_vec_info): Add zeroing orig_loop_info field. >> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. >> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL >> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo >> using passed argument. >> (vect_transform_loop): Check if created epilogue should be returned >> for further vectorization with less vf. If-convert epilogue if >> required. Print vectorization success for epilogue. >> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization >> if it is required, pass loop_vinfo produced during vectorization of >> loop body to vect_analyze_loop. >> * tree-vectorizer.h (struct _loop_vec_info): Add new field >> orig_loop_info. >> (LOOP_VINFO_ORIG_LOOP_INFO): New. >> (LOOP_VINFO_EPILOGUE_P): New. >> (LOOP_VINFO_ORIG_VECT_FACTOR): New. >> (vect_do_peeling): Change prototype to return epilogue. >> (vect_analyze_loop): Add argument of loop_vec_info type. >> (vect_transform_loop): Return created loop. >> >> gcc/testsuite/ >> >> * lib/target-supports.exp (check_avx2_hw_available): New. >> (check_effective_target_avx2_runtime): New. >> * gcc.dg/vect/vect-tail-nomask-1.c: New test. >> > > Hi, > > This new test fails on arm-none-eabi (using default cpu/fpu/mode): > gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test > gcc.dg/vect/vect-tail-nomask-1.c execution test > > It does pass on the same target if configured --with-cpu=cortex-a9. > > Christophe > > > >> >> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguent...@suse.de>: >>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev >>> <ysrum...@gmail.com> wrote: >>>>Richard, >>>> >>>>I checked one of the tests designed for epilogue vectorization using >>>>patches 1 - 3 and found out that build compiler performs vectorization >>>>of epilogues with --param vect-epilogues-nomask=1 passed: >>>> >>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >>>>t1.new-nomask.s -fdump-tree-vect-details >>>>$ grep VECTORIZED -c t1.c.156t.vect >>>>4 >>>> Without param only 2 loops are vectorized. >>>> >>>>Should I simply add a part of tests related to this feature or I must >>>>delete all not necessary changes also? >>> >>> Please remove all not necessary changes. >>> >>> Richard. >>> >>>>Thanks. >>>>Yuri. >>>> >>>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: >>>>> >>>>>> Richard, >>>>>> >>>>>> In my previous patch I forgot to remove couple lines related to aux >>>>field. >>>>>> Here is the correct updated patch. >>>>> >>>>> Yeah, I noticed. This patch would be ok for trunk (together with >>>>> necessary parts from 1 and 2) if all not required parts are removed >>>>> (and you'd add the testcases covering non-masked tail vect). >>>>> >>>>> Thus, can you please produce a single complete patch containing only >>>>> non-masked epilogue vectoriziation? >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>>> Thanks. >>>>>> Yuri. >>>>>> >>>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >>>>>> > >>>>>> >> Richard, >>>>>> >> >>>>>> >> I prepare updated 3 patch with passing additional argument to >>>>>> >> vect_analyze_loop as you proposed (untested). >>>>>> >> >>>>>> >> You wrote: >>>>>> >> tw, I wonder if you can produce a single patch containing just >>>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out >>>>>> >> changes only needed by later patches? >>>>>> >> >>>>>> >> Did you mean that I exclude all support for vectorization >>>>epilogues, >>>>>> >> i.e. exclude from 2-nd patch all non-related changes >>>>>> >> like >>>>>> >> >>>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >>>>>> >> index 11863af..32011c1 100644 >>>>>> >> --- a/gcc/tree-vect-loop.c >>>>>> >> +++ b/gcc/tree-vect-loop.c >>>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >>>>>> >> LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >>>>>> >> LOOP_VINFO_PEELING_FOR_NITER (res) = false; >>>>>> >> LOOP_VINFO_OPERANDS_SWAPPED (res) = false; >>>>>> >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; >>>>>> >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; >>>>>> >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; >>>>>> >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; >>>>>> >> + LOOP_VINFO_NEED_MASKING (res) = false; >>>>>> >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; >>>>>> > >>>>>> > Yes. >>>>>> > >>>>>> >> Did you mean also that new combined patch must be working patch, >>>>i.e. >>>>>> >> can be integrated without other patches? >>>>>> > >>>>>> > Yes. >>>>>> > >>>>>> >> Could you please look at updated patch? >>>>>> > >>>>>> > Will do. >>>>>> > >>>>>> > Thanks, >>>>>> > Richard. >>>>>> > >>>>>> >> Thanks. >>>>>> >> Yuri. >>>>>> >> >>>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote: >>>>>> >> > >>>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: >>>>>> >> >> >>>>>> >> >> > Richard, >>>>>> >> >> > >>>>>> >> >> > Here is updated 3 patch. >>>>>> >> >> > >>>>>> >> >> > I checked that all new tests related to epilogue >>>>vectorization passed with it. >>>>>> >> >> > >>>>>> >> >> > Your comments will be appreciated. >>>>>> >> >> >>>>>> >> >> A lot better now. Instead of the ->aux dance I now prefer to >>>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as >>>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that >>>>>> >> >> loop_vinfo). OTOH I remember we mainly use it to get at the >>>>>> >> >> original vectorization factor? So we can pass down an >>>>(optional) >>>>>> >> >> forced vectorization factor as well? >>>>>> >> > >>>>>> >> > Btw, I wonder if you can produce a single patch containing just >>>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out >>>>>> >> > changes only needed by later patches? >>>>>> >> > >>>>>> >> > Thanks, >>>>>> >> > Richard. >>>>>> >> > >>>>>> >> >> Richard. >>>>>> >> >> >>>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener >>>><rguent...@suse.de>: >>>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: >>>>>> >> >> > > >>>>>> >> >> > >> Hi Richard, >>>>>> >> >> > >> >>>>>> >> >> > >> I did not understand your last remark: >>>>>> >> >> > >> >>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >>>>>> >> >> > >> > >>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >>>>>> >> >> > >> > && dump_enabled_p ()) >>>>>> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >>>>vect_location, >>>>>> >> >> > >> > "loop vectorized\n"); >>>>>> >> >> > >> > - vect_transform_loop (loop_vinfo); >>>>>> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >>>>>> >> >> > >> > num_vectorized_loops++; >>>>>> >> >> > >> > /* Now that the loop has been vectorized, allow >>>>it to be unrolled >>>>>> >> >> > >> > etc. */ >>>>>> >> >> > >> > loop->force_vectorize = false; >>>>>> >> >> > >> > >>>>>> >> >> > >> > + /* Add new loop to a processing queue. To make >>>>it easier >>>>>> >> >> > >> > + to match loop and its epilogue vectorization >>>>in dumps >>>>>> >> >> > >> > + put new loop as the next loop to process. >>>>*/ >>>>>> >> >> > >> > + if (new_loop) >>>>>> >> >> > >> > + { >>>>>> >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); >>>>>> >> >> > >> > + vect_loops_num = number_of_loops (cfun); >>>>>> >> >> > >> > + } >>>>>> >> >> > >> > >>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, >>>>new_loop) >>>>>> >> >> > >> f> unction which will set up stuff properly (and also >>>>perform >>>>>> >> >> > >> > the if-conversion of the epilogue there). >>>>>> >> >> > >> > >>>>>> >> >> > >> > That said, if we can get in non-masked epilogue >>>>vectorization >>>>>> >> >> > >> > separately that would be great. >>>>>> >> >> > >> >>>>>> >> >> > >> Could you please clarify your proposal. >>>>>> >> >> > > >>>>>> >> >> > > When a loop was vectorized set things up to immediately >>>>vectorize >>>>>> >> >> > > its epilogue, avoiding changing the loop iteration and >>>>avoiding >>>>>> >> >> > > the re-use of ->aux. >>>>>> >> >> > > >>>>>> >> >> > > Richard. >>>>>> >> >> > > >>>>>> >> >> > >> Thanks. >>>>>> >> >> > >> Yuri. >>>>>> >> >> > >> >>>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener >>>><rguent...@suse.de>: >>>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: >>>>>> >> >> > >> > >>>>>> >> >> > >> >> Hi All, >>>>>> >> >> > >> >> >>>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review >>>>which support >>>>>> >> >> > >> >> vectorization of loop epilogues and loops with low >>>>trip count. We >>>>>> >> >> > >> >> assume that the only patch - >>>>vec-tails-07-combine-tail.patch - was not >>>>>> >> >> > >> >> approved by Jeff. >>>>>> >> >> > >> >> >>>>>> >> >> > >> >> I did re-base of all patches and performed >>>>bootstrapping and >>>>>> >> >> > >> >> regression testing that did not show any new failures. >>>>Also all >>>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have >>>>been changed >>>>>> >> >> > >> >> accordingly. >>>>>> >> >> > >> >> >>>>>> >> >> > >> >> Is it OK for trunk? >>>>>> >> >> > >> > >>>>>> >> >> > >> > I would have prefered that the series up to >>>>-03-nomask-tails would >>>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but >>>>unfortunately >>>>>> >> >> > >> > the patchset is oddly separated. >>>>>> >> >> > >> > >>>>>> >> >> > >> > I have a comment on that part nevertheless: >>>>>> >> >> > >> > >>>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment >>>>(loop_vec_info >>>>>> >> >> > >> > loop_vinfo) >>>>>> >> >> > >> > /* Check if we can possibly peel the loop. */ >>>>>> >> >> > >> > if (!vect_can_advance_ivs_p (loop_vinfo) >>>>>> >> >> > >> > || !slpeel_can_duplicate_loop_p (loop, >>>>single_exit (loop)) >>>>>> >> >> > >> > - || loop->inner) >>>>>> >> >> > >> > + || loop->inner >>>>>> >> >> > >> > + /* Required peeling was performed in prologue >>>>and >>>>>> >> >> > >> > + is not required for epilogue. */ >>>>>> >> >> > >> > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) >>>>>> >> >> > >> > do_peeling = false; >>>>>> >> >> > >> > >>>>>> >> >> > >> > if (do_peeling >>>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment >>>>(loop_vec_info >>>>>> >> >> > >> > loop_vinfo) >>>>>> >> >> > >> > >>>>>> >> >> > >> > do_versioning = >>>>>> >> >> > >> > optimize_loop_nest_for_speed_p (loop) >>>>>> >> >> > >> > - && (!loop->inner); /* FORNOW */ >>>>>> >> >> > >> > + && (!loop->inner) /* FORNOW */ >>>>>> >> >> > >> > + /* Required versioning was performed for the >>>>>> >> >> > >> > + original loop and is not required for >>>>epilogue. */ >>>>>> >> >> > >> > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); >>>>>> >> >> > >> > >>>>>> >> >> > >> > if (do_versioning) >>>>>> >> >> > >> > { >>>>>> >> >> > >> > >>>>>> >> >> > >> > please do that check in the single caller of this >>>>function. >>>>>> >> >> > >> > >>>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I >>>>believe that simply >>>>>> >> >> > >> > passing down info from the processed parent would be >>>>_much_ cleaner. >>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >>>>>> >> >> > >> > >>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >>>>>> >> >> > >> > && dump_enabled_p ()) >>>>>> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >>>>vect_location, >>>>>> >> >> > >> > "loop vectorized\n"); >>>>>> >> >> > >> > - vect_transform_loop (loop_vinfo); >>>>>> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >>>>>> >> >> > >> > num_vectorized_loops++; >>>>>> >> >> > >> > /* Now that the loop has been vectorized, allow >>>>it to be unrolled >>>>>> >> >> > >> > etc. */ >>>>>> >> >> > >> > loop->force_vectorize = false; >>>>>> >> >> > >> > >>>>>> >> >> > >> > + /* Add new loop to a processing queue. To make >>>>it easier >>>>>> >> >> > >> > + to match loop and its epilogue vectorization >>>>in dumps >>>>>> >> >> > >> > + put new loop as the next loop to process. >>>>*/ >>>>>> >> >> > >> > + if (new_loop) >>>>>> >> >> > >> > + { >>>>>> >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); >>>>>> >> >> > >> > + vect_loops_num = number_of_loops (cfun); >>>>>> >> >> > >> > + } >>>>>> >> >> > >> > >>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, >>>>new_loop) >>>>>> >> >> > >> > function which will set up stuff properly (and also >>>>perform >>>>>> >> >> > >> > the if-conversion of the epilogue there). >>>>>> >> >> > >> > >>>>>> >> >> > >> > That said, if we can get in non-masked epilogue >>>>vectorization >>>>>> >> >> > >> > separately that would be great. >>>>>> >> >> > >> > >>>>>> >> >> > >> > I'm still torn about all the rest of the stuff and >>>>question its >>>>>> >> >> > >> > usability (esp. merging the epilogue with the main >>>>vector loop). >>>>>> >> >> > >> > But it has already been approved ... oh well. >>>>>> >> >> > >> > >>>>>> >> >> > >> > Thanks, >>>>>> >> >> > >> > Richard. >>>>>> >> >> > >> >>>>>> >> >> > >> >>>>>> >> >> > > >>>>>> >> >> > > -- >>>>>> >> >> > > Richard Biener <rguent...@suse.de> >>>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, >>>>Graham Norton, HRB 21284 (AG Nuernberg) >>>>>> >> >> > >>>>>> >> >> >>>>>> >> >> >>>>>> >> > >>>>>> >> > -- >>>>>> >> > Richard Biener <rguent...@suse.de> >>>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>>Norton, HRB 21284 (AG Nuernberg) >>>>>> >> >>>>>> > >>>>>> > -- >>>>>> > Richard Biener <rguent...@suse.de> >>>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>>Norton, HRB 21284 (AG Nuernberg) >>>>>> >>>>> >>>>> -- >>>>> Richard Biener <rguent...@suse.de> >>>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>>Norton, HRB 21284 (AG Nuernberg) >>> >>>