On Thu, Dec 14, 2017 at 12:12:01AM +0000, Jeff Law wrote:
> On 11/17/2017 08:13 AM, Richard Sandiford wrote:
> > This patch adds support for aligning vectors by using a partial
> > first iteration.  E.g. if the start pointer is 3 elements beyond
> > an aligned address, the first iteration will have a mask in which
> > the first three elements are false.
> > 
> > On SVE, the optimisation is only useful for vector-length-specific
> > code.  Vector-length-agnostic code doesn't try to align vectors
> > since the vector length might not be a power of 2.
> > 
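(Just to illustrate the scheme above, here is a minimal scalar model:
VF and the per-lane bounds test stand in for the real vector length and
loop mask, and the code is not what the vectorizer actually generates.)

/* Scalar model of a fully-masked loop whose first iteration starts
   "skip" elements before the original pointer (i.e. at the aligned
   address) and masks those lanes off, as in the example above where
   the first three elements of the first mask are false.  VF is an
   assumed, illustrative vector length.  */
#define VF 8

void
add_one (float *x, unsigned long n, unsigned long skip)
{
  /* Element indices are relative to the original start pointer X;
     the aligned base is X - SKIP, so counting starts at -SKIP.  */
  for (long i = -(long) skip; i < (long) n; i += VF)
    for (long lane = 0; lane < VF; ++lane)
      {
        long elt = i + lane;
        /* A lane is active only if it is past the skipped elements
           and still within bounds.  */
        if (elt >= 0 && elt < (long) n)
          x[elt] += 1.0f;
      }
}
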
> > Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> > and powerpc64le-linux-gnu.  OK to install?
> > 
> > Richard
> > 
> > 
> > 2017-11-17  Richard Sandiford  <richard.sandif...@linaro.org>
> >         Alan Hayward  <alan.hayw...@arm.com>
> >         David Sherwood  <david.sherw...@arm.com>
> > 
> > gcc/
> >     * tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field.
> >     (LOOP_VINFO_MASK_SKIP_NITERS): New macro.
> >     (vect_use_loop_mask_for_alignment_p): New function.
> >     (vect_prepare_for_masked_peels, vect_gen_while_not): Declare.
> >     * tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an
> >     niters_skip argument.  Make sure that the first niters_skip elements
> >     of the first iteration are inactive.
> >     (vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS.
> >     Update call to vect_set_loop_masks_directly.
> >     (get_misalign_in_elems): New function, split out from...
> >     (vect_gen_prolog_loop_niters): ...here.
> >     (vect_update_init_of_dr): Take a code argument that specifies whether
> >     the adjustment should be added or subtracted.
> >     (vect_update_inits_of_drs): Likewise.
> >     (vect_prepare_for_masked_peels): New function.
> >     (vect_do_peeling): Skip prologue peeling if we're using a mask
> >     instead.  Update call to vect_update_inits_of_drs.
> >     * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
> >     mask_skip_niters.
> >     (vect_analyze_loop_2): Allow fully-masked loops with peeling for
> >     alignment.  Do not include the number of peeled iterations in
> >     the minimum threshold in that case.
> >     (vectorizable_induction): Adjust the start value down by
> >     LOOP_VINFO_MASK_SKIP_NITERS iterations.
> >     (vect_transform_loop): Call vect_prepare_for_masked_peels.
> >     Take the number of skipped iterations into account when calculating
> >     the loop bounds.
> >     * tree-vect-stmts.c (vect_gen_while_not): New function.
> OK.
> jeff
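
For reference, a rough C model of how the skip mask described in the
changelog could be combined with the normal loop control; the helper
names and the exact composition are illustrative guesses, not taken
from the patch's generated GIMPLE.

#include <stdbool.h>

#define VF 8  /* assumed, illustrative vector length in elements */

/* Lane-wise model of WHILE_ULT: lane I is active while INDEX + I < LIMIT.  */
static void
while_ult (bool mask[VF], unsigned long index, unsigned long limit)
{
  for (int i = 0; i < VF; ++i)
    mask[i] = index + i < limit;
}

/* Mask for the first (partial) iteration: the usual in-bounds mask,
   ANDed with the inverse of WHILE_ULT (0, NITERS_SKIP) so that the
   first NITERS_SKIP lanes are inactive.  NITERS_TOTAL counts from the
   aligned base, i.e. it already includes the skipped elements.  */
static void
first_iteration_mask (bool mask[VF], unsigned long niters_total,
                      unsigned long niters_skip)
{
  bool in_bounds[VF], skip[VF];
  while_ult (in_bounds, 0, niters_total);
  while_ult (skip, 0, niters_skip);
  for (int i = 0; i < VF; ++i)
    mask[i] = in_bounds[i] && !skip[i];
}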

The AArch64 tests are OK, but:

> > Index: gcc/testsuite/gcc.target/aarch64/sve_peel_ind_2_run.c
> > ===================================================================
> > --- /dev/null       2017-11-14 14:28:07.424493901 +0000
> > +++ gcc/testsuite/gcc.target/aarch64/sve_peel_ind_2_run.c   2017-11-17 15:11:51.121849349 +0000
> > @@ -0,0 +1,18 @@
> > +/* { dg-do run { target aarch64_sve_hw } } */
> > +/* { dg-options "-O3 -march=armv8-a+sve -mtune=thunderx" } */
> > +/* { dg-options "-O3 -march=armv8-a+sve -mtune=thunderx -msve-vector-bits=256" { target aarch64_sve256_hw } } */
> > +

I'd add the comment from sve_peel_ind_2.c explaining why we have
-mtune=thunderx here too.

Thanks,
James
