On Thu, 10 Nov 2016, Christophe Lyon wrote:

> On 10 November 2016 at 09:34, Richard Biener <rguent...@suse.de> wrote:
> > On Wed, 9 Nov 2016, Christophe Lyon wrote:
> >
> >> On 9 November 2016 at 09:36, Bin.Cheng <amker.ch...@gmail.com> wrote:
> >> > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener <rguent...@suse.de> wrote:
> >> >> On Mon, 7 Nov 2016, Christophe Lyon wrote:
> >> >>
> >> >>> Hi Richard,
> >> >>>
> >> >>>
> >> >>> On 7 November 2016 at 09:01, Richard Biener <rguent...@suse.de> wrote:
> >> >>> >
> >> >>> > The following fixes an oversight when computing alignment in the
> >> >>> > vectorizer.
> >> >>> >
> >> >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to 
> >> >>> > trunk.
> >> >>> >
> >> >>> > Richard.
> >> >>> >
> >> >>> > 2016-11-07  Richard Biener  <rguent...@suse.de>
> >> >>> >
> >> >>> >         PR tree-optimization/78189
> >> >>> >         * tree-vect-data-refs.c (vect_compute_data_ref_alignment): 
> >> >>> > Fix
> >> >>> >         alignment computation.
> >> >>> >
> >> >>> >         * g++.dg/torture/pr78189.C: New testcase.
> >> >>> >
> >> >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C
> >> >>> > ===================================================================
> >> >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C      (revision 0)
> >> >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C      (working copy)
> >> >>> > @@ -0,0 +1,41 @@
> >> >>> > +/* { dg-do run } */
> >> >>> > +/* { dg-additional-options "-ftree-slp-vectorize 
> >> >>> > -fno-vect-cost-model" } */
> >> >>> > +
> >> >>> > +#include <cstddef>
> >> >>> > +
> >> >>> > +struct A
> >> >>> > +{
> >> >>> > +  void * a;
> >> >>> > +  void * b;
> >> >>> > +};
> >> >>> > +
> >> >>> > +struct alignas(16) B
> >> >>> > +{
> >> >>> > +  void * pad;
> >> >>> > +  void * misaligned;
> >> >>> > +  void * pad2;
> >> >>> > +
> >> >>> > +  A a;
> >> >>> > +
> >> >>> > +  void Null();
> >> >>> > +};
> >> >>> > +
> >> >>> > +void B::Null()
> >> >>> > +{
> >> >>> > +  a.a = nullptr;
> >> >>> > +  a.b = nullptr;
> >> >>> > +}
> >> >>> > +
> >> >>> > +void __attribute__((noinline,noclone))
> >> >>> > +NullB(void * misalignedPtr)
> >> >>> > +{
> >> >>> > +  B* b = reinterpret_cast<B*>(reinterpret_cast<char 
> >> >>> > *>(misalignedPtr) - offsetof(B, misaligned));
> >> >>> > +  b->Null();
> >> >>> > +}
> >> >>> > +
> >> >>> > +int main()
> >> >>> > +{
> >> >>> > +  B b;
> >> >>> > +  NullB(&b.misaligned);
> >> >>> > +  return 0;
> >> >>> > +}
> >> >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> >> >>> > index 9346cfe..b03cb1e 100644
> >> >>> > --- gcc/tree-vect-data-refs.c
> >> >>> > +++ gcc/tree-vect-data-refs.c
> >> >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct 
> >> >>> > data_reference *dr)
> >> >>> >    base = ref;
> >> >>> >    while (handled_component_p (base))
> >> >>> >      base = TREE_OPERAND (base, 0);
> >> >>> > +  unsigned int base_alignment;
> >> >>> > +  unsigned HOST_WIDE_INT base_bitpos;
> >> >>> > +  get_object_alignment_1 (base, &base_alignment, &base_bitpos);
> >> >>> > +  /* As data-ref analysis strips the MEM_REF down to its base 
> >> >>> > operand
> >> >>> > +     to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have 
> >> >>> > to
> >> >>> > +     adjust things to make base_alignment valid as the alignment of
> >> >>> > +     DR_BASE_ADDRESS.  */
> >> >>> >    if (TREE_CODE (base) == MEM_REF)
> >> >>> > -    base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
> >> >>> > -                  build_int_cst (TREE_TYPE (TREE_OPERAND (base, 
> >> >>> > 1)), 0));
> >> >>> > -  unsigned int base_alignment = get_object_alignment (base);
> >> >>> > +    {
> >> >>> > +      base_bitpos -= mem_ref_offset (base).to_short_addr () * 
> >> >>> > BITS_PER_UNIT;
> >> >>> > +      base_bitpos &= (base_alignment - 1);
> >> >>> > +    }
> >> >>> > +  if (base_bitpos != 0)
> >> >>> > +    base_alignment = base_bitpos & -base_bitpos;
> >> >>> > +  /* Also look at the alignment of the base address DR analysis
> >> >>> > +     computed.  */
> >> >>> > +  unsigned int base_addr_alignment = get_pointer_alignment 
> >> >>> > (base_addr);
> >> >>> > +  if (base_addr_alignment > base_alignment)
> >> >>> > +    base_alignment = base_addr_alignment;
> >> >>> >
> >> >>> >    if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype)))
> >> >>> >      DR_VECT_AUX (dr)->base_element_aligned = true;
> >> >>>
> >> >>> Since you committed this patch (r241892), I'm seeing execution 
> >> >>> failures:
> >> >>>   gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test
> >> >>>   gcc.dg/vect/pr40074.c execution test
> >> >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9
> >> >>> --with-fpu=neon-fp16
> >> >>> (using qemu as simulator)
> >> >>
> >> >> The difference is that we now vectorize the testcase with versioning
> >> >> for alignment (but it should never execute the vectorized variant).
> >> >> I need arm peoples help to understand what is wrong.
> >> > Hi All,
> >> > I will look at it.
> >> >
> >>
> >> Hi,
> >>
> >> This is causing new regressions on armeb:
> >>   gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects
> >> scan-tree-dump-times vect "vectorized 2 loops" 1
> >>   gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect
> >> "vectorized 2 loops" 1
> >
> > It's actually an improvement as armeb can't do unaligned loads.
> > Before the patch we versioned both loops for alignment (they
> > can't be possibly both aligned by luck).  After the patch we
> > align 'arr' and thus the first loop becomes known-aligned and
> > vectorized while the 2nd one is now known unaligned (and thus
> > not vectorized because we can't do unaligned accesses).
> >
> > I suppose the following will fix it.  Can you test that on
> > some arm variants please?
> >
> > It works ok on x86_64/i?86 so please commit if it fixes the issue
> > for you.
> >
> 
> Well, it works on armeb (the test now passes), but regresses on
> arm-*   :(

You have to help me here, it's not efficient time spent by me
figuring out the right dg-targets here.  I can only suppose
that maybe it uses load-lanes and that's happy with unaligned
accesses but hw_misaling is still set to false.  Otherwise
I can't see what's endian specific here (other than that
-eb is probably still second class citizen in vectorization
feature enablement?)

Please open a bugreport against the testsuite in the meantime.

Richard.

> Christophe
> 
> > Thanks,
> > Richard.
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c 
> > b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
> > index 52fdcf6..3752600 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
> > @@ -71,5 +71,5 @@ int main (void)
> >    return 0;
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { 
> > target vect_strided2 } } } */
> > -
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { 
> > target { vect_strided2 && { ! vect_hw_misalign } } } } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { 
> > target { vect_strided2 && vect_hw_misalign } } } } */
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Reply via email to