On 18 November 2016 at 16:54, Christophe Lyon <christophe.l...@linaro.org> wrote: > On 18 November 2016 at 16:46, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> It is very strange that this test failed on arm, since it requires >> target avx2 to check vectorizer dumps: >> >> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { >> target avx2_runtime } } } */ >> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED >> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ >> >> Could you please clarify what is the reason of the failure? > > It's not the scan-dumps that fail, but the execution. > The test calls abort() for some reason. > > It will take me a while to rebuild the test manually in the right > debug environment to provide you with more traces. > > Sorry for the delay... This problem is not directly related to your patch.
The tests in gcc.dg/vect are compiled with -mfpu=neon -mfloat-abi=softfp -march=armv7-a and thus cannot be executed on older versions of the architecture. This is another instance of what I discussed with Jakub several months ago: https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00666.html but the thread died. Basically, check_vect_support_and_set_flags sets set dg-do-what-default compile, but some tests in gcc.dg/vect have dg-do run hardcoded. Jakub was not happy with my patch that was removing all these dg-do run directives :-) Christophe > >> >> Thanks. >> >> 2016-11-18 16:20 GMT+03:00 Christophe Lyon <christophe.l...@linaro.org>: >>> On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Hi All, >>>> >>>> Here is patch for non-masked epilogue vectoriziation. >>>> >>>> Bootstrap and regression testing did not show any new failures. >>>> >>>> Is it OK for trunk? >>>> >>>> Thanks. >>>> Changelog: >>>> >>>> 2016-11-15 Yuri Rumyantsev <ysrum...@gmail.com> >>>> >>>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. >>>> * tree-if-conv.c (tree_if_conversion): Make public. >>>> * * tree-if-conv.h: New file. >>>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid >>>> dynamic alias checks for epilogues. >>>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. >>>> * tree-vect-loop.c: include tree-if-conv.h. >>>> (new_loop_vec_info): Add zeroing orig_loop_info field. >>>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. >>>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL >>>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo >>>> using passed argument. >>>> (vect_transform_loop): Check if created epilogue should be returned >>>> for further vectorization with less vf. If-convert epilogue if >>>> required. Print vectorization success for epilogue. >>>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization >>>> if it is required, pass loop_vinfo produced during vectorization of >>>> loop body to vect_analyze_loop. >>>> * tree-vectorizer.h (struct _loop_vec_info): Add new field >>>> orig_loop_info. >>>> (LOOP_VINFO_ORIG_LOOP_INFO): New. >>>> (LOOP_VINFO_EPILOGUE_P): New. >>>> (LOOP_VINFO_ORIG_VECT_FACTOR): New. >>>> (vect_do_peeling): Change prototype to return epilogue. >>>> (vect_analyze_loop): Add argument of loop_vec_info type. >>>> (vect_transform_loop): Return created loop. >>>> >>>> gcc/testsuite/ >>>> >>>> * lib/target-supports.exp (check_avx2_hw_available): New. >>>> (check_effective_target_avx2_runtime): New. >>>> * gcc.dg/vect/vect-tail-nomask-1.c: New test. >>>> >>> >>> Hi, >>> >>> This new test fails on arm-none-eabi (using default cpu/fpu/mode): >>> gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test >>> gcc.dg/vect/vect-tail-nomask-1.c execution test >>> >>> It does pass on the same target if configured --with-cpu=cortex-a9. >>> >>> Christophe >>> >>> >>> >>>> >>>> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev >>>>> <ysrum...@gmail.com> wrote: >>>>>>Richard, >>>>>> >>>>>>I checked one of the tests designed for epilogue vectorization using >>>>>>patches 1 - 3 and found out that build compiler performs vectorization >>>>>>of epilogues with --param vect-epilogues-nomask=1 passed: >>>>>> >>>>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >>>>>>t1.new-nomask.s -fdump-tree-vect-details >>>>>>$ grep VECTORIZED -c t1.c.156t.vect >>>>>>4 >>>>>> Without param only 2 loops are vectorized. >>>>>> >>>>>>Should I simply add a part of tests related to this feature or I must >>>>>>delete all not necessary changes also? >>>>> >>>>> Please remove all not necessary changes. >>>>> >>>>> Richard. >>>>> >>>>>>Thanks. >>>>>>Yuri. >>>>>> >>>>>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: >>>>>>> >>>>>>>> Richard, >>>>>>>> >>>>>>>> In my previous patch I forgot to remove couple lines related to aux >>>>>>field. >>>>>>>> Here is the correct updated patch. >>>>>>> >>>>>>> Yeah, I noticed. This patch would be ok for trunk (together with >>>>>>> necessary parts from 1 and 2) if all not required parts are removed >>>>>>> (and you'd add the testcases covering non-masked tail vect). >>>>>>> >>>>>>> Thus, can you please produce a single complete patch containing only >>>>>>> non-masked epilogue vectoriziation? >>>>>>> >>>>>>> Thanks, >>>>>>> Richard. >>>>>>> >>>>>>>> Thanks. >>>>>>>> Yuri. >>>>>>>> >>>>>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >>>>>>>> > >>>>>>>> >> Richard, >>>>>>>> >> >>>>>>>> >> I prepare updated 3 patch with passing additional argument to >>>>>>>> >> vect_analyze_loop as you proposed (untested). >>>>>>>> >> >>>>>>>> >> You wrote: >>>>>>>> >> tw, I wonder if you can produce a single patch containing just >>>>>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out >>>>>>>> >> changes only needed by later patches? >>>>>>>> >> >>>>>>>> >> Did you mean that I exclude all support for vectorization >>>>>>epilogues, >>>>>>>> >> i.e. exclude from 2-nd patch all non-related changes >>>>>>>> >> like >>>>>>>> >> >>>>>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >>>>>>>> >> index 11863af..32011c1 100644 >>>>>>>> >> --- a/gcc/tree-vect-loop.c >>>>>>>> >> +++ b/gcc/tree-vect-loop.c >>>>>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >>>>>>>> >> LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >>>>>>>> >> LOOP_VINFO_PEELING_FOR_NITER (res) = false; >>>>>>>> >> LOOP_VINFO_OPERANDS_SWAPPED (res) = false; >>>>>>>> >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; >>>>>>>> >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; >>>>>>>> >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; >>>>>>>> >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; >>>>>>>> >> + LOOP_VINFO_NEED_MASKING (res) = false; >>>>>>>> >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; >>>>>>>> > >>>>>>>> > Yes. >>>>>>>> > >>>>>>>> >> Did you mean also that new combined patch must be working patch, >>>>>>i.e. >>>>>>>> >> can be integrated without other patches? >>>>>>>> > >>>>>>>> > Yes. >>>>>>>> > >>>>>>>> >> Could you please look at updated patch? >>>>>>>> > >>>>>>>> > Will do. >>>>>>>> > >>>>>>>> > Thanks, >>>>>>>> > Richard. >>>>>>>> > >>>>>>>> >> Thanks. >>>>>>>> >> Yuri. >>>>>>>> >> >>>>>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote: >>>>>>>> >> > >>>>>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: >>>>>>>> >> >> >>>>>>>> >> >> > Richard, >>>>>>>> >> >> > >>>>>>>> >> >> > Here is updated 3 patch. >>>>>>>> >> >> > >>>>>>>> >> >> > I checked that all new tests related to epilogue >>>>>>vectorization passed with it. >>>>>>>> >> >> > >>>>>>>> >> >> > Your comments will be appreciated. >>>>>>>> >> >> >>>>>>>> >> >> A lot better now. Instead of the ->aux dance I now prefer to >>>>>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as >>>>>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that >>>>>>>> >> >> loop_vinfo). OTOH I remember we mainly use it to get at the >>>>>>>> >> >> original vectorization factor? So we can pass down an >>>>>>(optional) >>>>>>>> >> >> forced vectorization factor as well? >>>>>>>> >> > >>>>>>>> >> > Btw, I wonder if you can produce a single patch containing just >>>>>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out >>>>>>>> >> > changes only needed by later patches? >>>>>>>> >> > >>>>>>>> >> > Thanks, >>>>>>>> >> > Richard. >>>>>>>> >> > >>>>>>>> >> >> Richard. >>>>>>>> >> >> >>>>>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener >>>>>><rguent...@suse.de>: >>>>>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: >>>>>>>> >> >> > > >>>>>>>> >> >> > >> Hi Richard, >>>>>>>> >> >> > >> >>>>>>>> >> >> > >> I did not understand your last remark: >>>>>>>> >> >> > >> >>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >>>>>>>> >> >> > >> > && dump_enabled_p ()) >>>>>>>> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >>>>>>vect_location, >>>>>>>> >> >> > >> > "loop vectorized\n"); >>>>>>>> >> >> > >> > - vect_transform_loop (loop_vinfo); >>>>>>>> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >>>>>>>> >> >> > >> > num_vectorized_loops++; >>>>>>>> >> >> > >> > /* Now that the loop has been vectorized, allow >>>>>>it to be unrolled >>>>>>>> >> >> > >> > etc. */ >>>>>>>> >> >> > >> > loop->force_vectorize = false; >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > + /* Add new loop to a processing queue. To make >>>>>>it easier >>>>>>>> >> >> > >> > + to match loop and its epilogue vectorization >>>>>>in dumps >>>>>>>> >> >> > >> > + put new loop as the next loop to process. >>>>>>*/ >>>>>>>> >> >> > >> > + if (new_loop) >>>>>>>> >> >> > >> > + { >>>>>>>> >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); >>>>>>>> >> >> > >> > + vect_loops_num = number_of_loops (cfun); >>>>>>>> >> >> > >> > + } >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, >>>>>>new_loop) >>>>>>>> >> >> > >> f> unction which will set up stuff properly (and also >>>>>>perform >>>>>>>> >> >> > >> > the if-conversion of the epilogue there). >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue >>>>>>vectorization >>>>>>>> >> >> > >> > separately that would be great. >>>>>>>> >> >> > >> >>>>>>>> >> >> > >> Could you please clarify your proposal. >>>>>>>> >> >> > > >>>>>>>> >> >> > > When a loop was vectorized set things up to immediately >>>>>>vectorize >>>>>>>> >> >> > > its epilogue, avoiding changing the loop iteration and >>>>>>avoiding >>>>>>>> >> >> > > the re-use of ->aux. >>>>>>>> >> >> > > >>>>>>>> >> >> > > Richard. >>>>>>>> >> >> > > >>>>>>>> >> >> > >> Thanks. >>>>>>>> >> >> > >> Yuri. >>>>>>>> >> >> > >> >>>>>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener >>>>>><rguent...@suse.de>: >>>>>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> >> Hi All, >>>>>>>> >> >> > >> >> >>>>>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review >>>>>>which support >>>>>>>> >> >> > >> >> vectorization of loop epilogues and loops with low >>>>>>trip count. We >>>>>>>> >> >> > >> >> assume that the only patch - >>>>>>vec-tails-07-combine-tail.patch - was not >>>>>>>> >> >> > >> >> approved by Jeff. >>>>>>>> >> >> > >> >> >>>>>>>> >> >> > >> >> I did re-base of all patches and performed >>>>>>bootstrapping and >>>>>>>> >> >> > >> >> regression testing that did not show any new failures. >>>>>>Also all >>>>>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have >>>>>>been changed >>>>>>>> >> >> > >> >> accordingly. >>>>>>>> >> >> > >> >> >>>>>>>> >> >> > >> >> Is it OK for trunk? >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > I would have prefered that the series up to >>>>>>-03-nomask-tails would >>>>>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but >>>>>>unfortunately >>>>>>>> >> >> > >> > the patchset is oddly separated. >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > I have a comment on that part nevertheless: >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment >>>>>>(loop_vec_info >>>>>>>> >> >> > >> > loop_vinfo) >>>>>>>> >> >> > >> > /* Check if we can possibly peel the loop. */ >>>>>>>> >> >> > >> > if (!vect_can_advance_ivs_p (loop_vinfo) >>>>>>>> >> >> > >> > || !slpeel_can_duplicate_loop_p (loop, >>>>>>single_exit (loop)) >>>>>>>> >> >> > >> > - || loop->inner) >>>>>>>> >> >> > >> > + || loop->inner >>>>>>>> >> >> > >> > + /* Required peeling was performed in prologue >>>>>>and >>>>>>>> >> >> > >> > + is not required for epilogue. */ >>>>>>>> >> >> > >> > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) >>>>>>>> >> >> > >> > do_peeling = false; >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > if (do_peeling >>>>>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment >>>>>>(loop_vec_info >>>>>>>> >> >> > >> > loop_vinfo) >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > do_versioning = >>>>>>>> >> >> > >> > optimize_loop_nest_for_speed_p (loop) >>>>>>>> >> >> > >> > - && (!loop->inner); /* FORNOW */ >>>>>>>> >> >> > >> > + && (!loop->inner) /* FORNOW */ >>>>>>>> >> >> > >> > + /* Required versioning was performed for the >>>>>>>> >> >> > >> > + original loop and is not required for >>>>>>epilogue. */ >>>>>>>> >> >> > >> > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > if (do_versioning) >>>>>>>> >> >> > >> > { >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > please do that check in the single caller of this >>>>>>function. >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I >>>>>>believe that simply >>>>>>>> >> >> > >> > passing down info from the processed parent would be >>>>>>_much_ cleaner. >>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >>>>>>>> >> >> > >> > && dump_enabled_p ()) >>>>>>>> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >>>>>>vect_location, >>>>>>>> >> >> > >> > "loop vectorized\n"); >>>>>>>> >> >> > >> > - vect_transform_loop (loop_vinfo); >>>>>>>> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >>>>>>>> >> >> > >> > num_vectorized_loops++; >>>>>>>> >> >> > >> > /* Now that the loop has been vectorized, allow >>>>>>it to be unrolled >>>>>>>> >> >> > >> > etc. */ >>>>>>>> >> >> > >> > loop->force_vectorize = false; >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > + /* Add new loop to a processing queue. To make >>>>>>it easier >>>>>>>> >> >> > >> > + to match loop and its epilogue vectorization >>>>>>in dumps >>>>>>>> >> >> > >> > + put new loop as the next loop to process. >>>>>>*/ >>>>>>>> >> >> > >> > + if (new_loop) >>>>>>>> >> >> > >> > + { >>>>>>>> >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); >>>>>>>> >> >> > >> > + vect_loops_num = number_of_loops (cfun); >>>>>>>> >> >> > >> > + } >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, >>>>>>new_loop) >>>>>>>> >> >> > >> > function which will set up stuff properly (and also >>>>>>perform >>>>>>>> >> >> > >> > the if-conversion of the epilogue there). >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue >>>>>>vectorization >>>>>>>> >> >> > >> > separately that would be great. >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > I'm still torn about all the rest of the stuff and >>>>>>question its >>>>>>>> >> >> > >> > usability (esp. merging the epilogue with the main >>>>>>vector loop). >>>>>>>> >> >> > >> > But it has already been approved ... oh well. >>>>>>>> >> >> > >> > >>>>>>>> >> >> > >> > Thanks, >>>>>>>> >> >> > >> > Richard. >>>>>>>> >> >> > >> >>>>>>>> >> >> > >> >>>>>>>> >> >> > > >>>>>>>> >> >> > > -- >>>>>>>> >> >> > > Richard Biener <rguent...@suse.de> >>>>>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, >>>>>>Graham Norton, HRB 21284 (AG Nuernberg) >>>>>>>> >> >> > >>>>>>>> >> >> >>>>>>>> >> >> >>>>>>>> >> > >>>>>>>> >> > -- >>>>>>>> >> > Richard Biener <rguent...@suse.de> >>>>>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>>>>Norton, HRB 21284 (AG Nuernberg) >>>>>>>> >> >>>>>>>> > >>>>>>>> > -- >>>>>>>> > Richard Biener <rguent...@suse.de> >>>>>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>>>>Norton, HRB 21284 (AG Nuernberg) >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Richard Biener <rguent...@suse.de> >>>>>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>>>>Norton, HRB 21284 (AG Nuernberg) >>>>> >>>>>