On 11 November 2016 at 09:56, Richard Biener <rguent...@suse.de> wrote:
> On Thu, 10 Nov 2016, Christophe Lyon wrote:
>
>> On 10 November 2016 at 09:34, Richard Biener <rguent...@suse.de> wrote:
>> > On Wed, 9 Nov 2016, Christophe Lyon wrote:
>> >
>> >> On 9 November 2016 at 09:36, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> >> > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener <rguent...@suse.de> 
>> >> > wrote:
>> >> >> On Mon, 7 Nov 2016, Christophe Lyon wrote:
>> >> >>
>> >> >>> Hi Richard,
>> >> >>>
>> >> >>>
>> >> >>> On 7 November 2016 at 09:01, Richard Biener <rguent...@suse.de> wrote:
>> >> >>> >
>> >> >>> > The following fixes an oversight when computing alignment in the
>> >> >>> > vectorizer.
>> >> >>> >
>> >> >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to 
>> >> >>> > trunk.
>> >> >>> >
>> >> >>> > Richard.
>> >> >>> >
>> >> >>> > 2016-11-07  Richard Biener  <rguent...@suse.de>
>> >> >>> >
>> >> >>> >         PR tree-optimization/78189
>> >> >>> >         * tree-vect-data-refs.c (vect_compute_data_ref_alignment): 
>> >> >>> > Fix
>> >> >>> >         alignment computation.
>> >> >>> >
>> >> >>> >         * g++.dg/torture/pr78189.C: New testcase.
>> >> >>> >
>> >> >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C
>> >> >>> > ===================================================================
>> >> >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C      (revision 0)
>> >> >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C      (working copy)
>> >> >>> > @@ -0,0 +1,41 @@
>> >> >>> > +/* { dg-do run } */
>> >> >>> > +/* { dg-additional-options "-ftree-slp-vectorize 
>> >> >>> > -fno-vect-cost-model" } */
>> >> >>> > +
>> >> >>> > +#include <cstddef>
>> >> >>> > +
>> >> >>> > +struct A
>> >> >>> > +{
>> >> >>> > +  void * a;
>> >> >>> > +  void * b;
>> >> >>> > +};
>> >> >>> > +
>> >> >>> > +struct alignas(16) B
>> >> >>> > +{
>> >> >>> > +  void * pad;
>> >> >>> > +  void * misaligned;
>> >> >>> > +  void * pad2;
>> >> >>> > +
>> >> >>> > +  A a;
>> >> >>> > +
>> >> >>> > +  void Null();
>> >> >>> > +};
>> >> >>> > +
>> >> >>> > +void B::Null()
>> >> >>> > +{
>> >> >>> > +  a.a = nullptr;
>> >> >>> > +  a.b = nullptr;
>> >> >>> > +}
>> >> >>> > +
>> >> >>> > +void __attribute__((noinline,noclone))
>> >> >>> > +NullB(void * misalignedPtr)
>> >> >>> > +{
>> >> >>> > +  B* b = reinterpret_cast<B*>(reinterpret_cast<char 
>> >> >>> > *>(misalignedPtr) - offsetof(B, misaligned));
>> >> >>> > +  b->Null();
>> >> >>> > +}
>> >> >>> > +
>> >> >>> > +int main()
>> >> >>> > +{
>> >> >>> > +  B b;
>> >> >>> > +  NullB(&b.misaligned);
>> >> >>> > +  return 0;
>> >> >>> > +}
>> >> >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> >> >>> > index 9346cfe..b03cb1e 100644
>> >> >>> > --- gcc/tree-vect-data-refs.c
>> >> >>> > +++ gcc/tree-vect-data-refs.c
>> >> >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct 
>> >> >>> > data_reference *dr)
>> >> >>> >    base = ref;
>> >> >>> >    while (handled_component_p (base))
>> >> >>> >      base = TREE_OPERAND (base, 0);
>> >> >>> > +  unsigned int base_alignment;
>> >> >>> > +  unsigned HOST_WIDE_INT base_bitpos;
>> >> >>> > +  get_object_alignment_1 (base, &base_alignment, &base_bitpos);
>> >> >>> > +  /* As data-ref analysis strips the MEM_REF down to its base 
>> >> >>> > operand
>> >> >>> > +     to form DR_BASE_ADDRESS and adds the offset to DR_INIT we 
>> >> >>> > have to
>> >> >>> > +     adjust things to make base_alignment valid as the alignment of
>> >> >>> > +     DR_BASE_ADDRESS.  */
>> >> >>> >    if (TREE_CODE (base) == MEM_REF)
>> >> >>> > -    base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
>> >> >>> > -                  build_int_cst (TREE_TYPE (TREE_OPERAND (base, 
>> >> >>> > 1)), 0));
>> >> >>> > -  unsigned int base_alignment = get_object_alignment (base);
>> >> >>> > +    {
>> >> >>> > +      base_bitpos -= mem_ref_offset (base).to_short_addr () * 
>> >> >>> > BITS_PER_UNIT;
>> >> >>> > +      base_bitpos &= (base_alignment - 1);
>> >> >>> > +    }
>> >> >>> > +  if (base_bitpos != 0)
>> >> >>> > +    base_alignment = base_bitpos & -base_bitpos;
>> >> >>> > +  /* Also look at the alignment of the base address DR analysis
>> >> >>> > +     computed.  */
>> >> >>> > +  unsigned int base_addr_alignment = get_pointer_alignment 
>> >> >>> > (base_addr);
>> >> >>> > +  if (base_addr_alignment > base_alignment)
>> >> >>> > +    base_alignment = base_addr_alignment;
>> >> >>> >
>> >> >>> >    if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype)))
>> >> >>> >      DR_VECT_AUX (dr)->base_element_aligned = true;
>> >> >>>
>> >> >>> Since you committed this patch (r241892), I'm seeing execution 
>> >> >>> failures:
>> >> >>>   gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test
>> >> >>>   gcc.dg/vect/pr40074.c execution test
>> >> >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9
>> >> >>> --with-fpu=neon-fp16
>> >> >>> (using qemu as simulator)
>> >> >>
>> >> >> The difference is that we now vectorize the testcase with versioning
>> >> >> for alignment (but it should never execute the vectorized variant).
>> >> >> I need arm peoples help to understand what is wrong.
>> >> > Hi All,
>> >> > I will look at it.
>> >> >
>> >>
>> >> Hi,
>> >>
>> >> This is causing new regressions on armeb:
>> >>   gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects
>> >> scan-tree-dump-times vect "vectorized 2 loops" 1
>> >>   gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect
>> >> "vectorized 2 loops" 1
>> >
>> > It's actually an improvement as armeb can't do unaligned loads.
>> > Before the patch we versioned both loops for alignment (they
>> > can't be possibly both aligned by luck).  After the patch we
>> > align 'arr' and thus the first loop becomes known-aligned and
>> > vectorized while the 2nd one is now known unaligned (and thus
>> > not vectorized because we can't do unaligned accesses).
>> >
>> > I suppose the following will fix it.  Can you test that on
>> > some arm variants please?
>> >
>> > It works ok on x86_64/i?86 so please commit if it fixes the issue
>> > for you.
>> >
>>
>> Well, it works on armeb (the test now passes), but regresses on
>> arm-*   :(
>
> You have to help me here, it's not efficient time spent by me
> figuring out the right dg-targets here.  I can only suppose
> that maybe it uses load-lanes and that's happy with unaligned
> accesses but hw_misaling is still set to false.  Otherwise
> I can't see what's endian specific here (other than that
> -eb is probably still second class citizen in vectorization
> feature enablement?)
>
> Please open a bugreport against the testsuite in the meantime.
>

I understand, and I have just opened pr78421 to keep track of this.
Sorry for the delay.

Christophe

> Richard.
>
>> Christophe
>>
>> > Thanks,
>> > Richard.
>> >
>> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c 
>> > b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
>> > index 52fdcf6..3752600 100644
>> > --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
>> > +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
>> > @@ -71,5 +71,5 @@ int main (void)
>> >    return 0;
>> >  }
>> >
>> > -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { 
>> > target vect_strided2 } } } */
>> > -
>> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { 
>> > target { vect_strided2 && { ! vect_hw_misalign } } } } } */
>> > +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { 
>> > target { vect_strided2 && vect_hw_misalign } } } } */
>>
>>
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)

Reply via email to