On Thu, Dec 14, 2017 at 12:12:01AM +0000, Jeff Law wrote:
> On 11/17/2017 08:13 AM, Richard Sandiford wrote:
> > This patch adds support for aligning vectors by using a partial
> > first iteration.  E.g. if the start pointer is 3 elements beyond
> > an aligned address, the first iteration will have a mask in which
> > the first three elements are false.
> >
> > On SVE, the optimisation is only useful for vector-length-specific
> > code.  Vector-length-agnostic code doesn't try to align vectors
> > since the vector length might not be a power of 2.
> >
> > Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> > and powerpc64le-linux-gnu.  OK to install?
> >
> > Richard
> >
> >
> > 2017-11-17  Richard Sandiford  <richard.sandif...@linaro.org>
> > 	    Alan Hayward  <alan.hayw...@arm.com>
> > 	    David Sherwood  <david.sherw...@arm.com>
> >
> > gcc/
> > 	* tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field.
> > 	(LOOP_VINFO_MASK_SKIP_NITERS): New macro.
> > 	(vect_use_loop_mask_for_alignment_p): New function.
> > 	(vect_prepare_for_masked_peels, vect_gen_while_not): Declare.
> > 	* tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an
> > 	niters_skip argument.  Make sure that the first niters_skip elements
> > 	of the first iteration are inactive.
> > 	(vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS.
> > 	Update call to vect_set_loop_masks_directly.
> > 	(get_misalign_in_elems): New function, split out from...
> > 	(vect_gen_prolog_loop_niters): ...here.
> > 	(vect_update_init_of_dr): Take a code argument that specifies whether
> > 	the adjustment should be added or subtracted.
> > 	(vect_update_init_of_drs): Likewise.
> > 	(vect_prepare_for_masked_peels): New function.
> > 	(vect_do_peeling): Skip prologue peeling if we're using a mask
> > 	instead.  Update call to vect_update_inits_of_drs.
> > 	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
> > 	mask_skip_niters.
> > 	(vect_analyze_loop_2): Allow fully-masked loops with peeling for
> > 	alignment.  Do not include the number of peeled iterations in
> > 	the minimum threshold in that case.
> > 	(vectorizable_induction): Adjust the start value down by
> > 	LOOP_VINFO_MASK_SKIP_NITERS iterations.
> > 	(vect_transform_loop): Call vect_prepare_for_masked_peels.
> > 	Take the number of skipped iterations into account when calculating
> > 	the loop bounds.
> > 	* tree-vect-stmts.c (vect_gen_while_not): New function.
> OK.
> jeff
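As a side note for readers of the archive: below is a minimal scalar sketch of
the masked-peeling idea described in the quoted text above.  The names (VF,
skip, add_one) are illustrative only and are not taken from the patch; the
real implementation builds the first-iteration mask inside the vectorizer
(see vect_set_loop_masks_directly and get_misalign_in_elems above).

#include <stdint.h>

/* Illustrative vector length in elements.  */
#define VF 8

void
add_one (int *a, int n)
{
  /* Misalignment of A in elements, e.g. 3 if A is 3 elements beyond
     a VF-element-aligned address (assumes A is at least int-aligned).  */
  unsigned int skip = ((uintptr_t) a / sizeof (int)) % VF;

  /* Round the start pointer down to the aligned address.  (Conceptual
     only: the vectorizer does this on addresses, so it never forms an
     out-of-object pointer the way plain C would here.)  */
  int *aligned_a = a - skip;

  /* Each outer iteration models one vector iteration starting at an
     aligned address.  The first SKIP lanes of the first iteration are
     masked off, as are lanes at or beyond N + SKIP, so exactly
     a[0..n-1] is processed.  */
  for (unsigned int i = 0; i < (unsigned int) n + skip; i += VF)
    for (unsigned int lane = 0; lane < VF; lane++)
      if (i + lane >= skip && i + lane < (unsigned int) n + skip)
	aligned_a[i + lane] += 1;	/* lane is active */
}

The loop bound of n + skip mirrors the ChangeLog note about taking the number
of skipped iterations into account when calculating the loop bounds.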
The AArch64 tests are OK, but:

> > Index: gcc/testsuite/gcc.target/aarch64/sve_peel_ind_2_run.c
> > ===================================================================
> > --- /dev/null	2017-11-14 14:28:07.424493901 +0000
> > +++ gcc/testsuite/gcc.target/aarch64/sve_peel_ind_2_run.c	2017-11-17 15:11:51.121849349 +0000
> > @@ -0,0 +1,18 @@
> > +/* { dg-do run { target aarch64_sve_hw } } */
> > +/* { dg-options "-O3 -march=armv8-a+sve -mtune=thunderx" } */
> > +/* { dg-options "-O3 -march=armv8-a+sve -mtune=thunderx -msve-vector-bits=256" { target aarch64_sve256_hw } } */
> > +

I'd put the comment from sve_peel_ind_2.c as to why we have the
-mtune=thunderx here too.

Thanks,
James