On Thu, 10 Nov 2016, Christophe Lyon wrote: > On 10 November 2016 at 09:34, Richard Biener <rguent...@suse.de> wrote: > > On Wed, 9 Nov 2016, Christophe Lyon wrote: > > > >> On 9 November 2016 at 09:36, Bin.Cheng <amker.ch...@gmail.com> wrote: > >> > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener <rguent...@suse.de> wrote: > >> >> On Mon, 7 Nov 2016, Christophe Lyon wrote: > >> >> > >> >>> Hi Richard, > >> >>> > >> >>> > >> >>> On 7 November 2016 at 09:01, Richard Biener <rguent...@suse.de> wrote: > >> >>> > > >> >>> > The following fixes an oversight when computing alignment in the > >> >>> > vectorizer. > >> >>> > > >> >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to > >> >>> > trunk. > >> >>> > > >> >>> > Richard. > >> >>> > > >> >>> > 2016-11-07 Richard Biener <rguent...@suse.de> > >> >>> > > >> >>> > PR tree-optimization/78189 > >> >>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): > >> >>> > Fix > >> >>> > alignment computation. > >> >>> > > >> >>> > * g++.dg/torture/pr78189.C: New testcase. > >> >>> > > >> >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C > >> >>> > =================================================================== > >> >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C (revision 0) > >> >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C (working copy) > >> >>> > @@ -0,0 +1,41 @@ > >> >>> > +/* { dg-do run } */ > >> >>> > +/* { dg-additional-options "-ftree-slp-vectorize > >> >>> > -fno-vect-cost-model" } */ > >> >>> > + > >> >>> > +#include <cstddef> > >> >>> > + > >> >>> > +struct A > >> >>> > +{ > >> >>> > + void * a; > >> >>> > + void * b; > >> >>> > +}; > >> >>> > + > >> >>> > +struct alignas(16) B > >> >>> > +{ > >> >>> > + void * pad; > >> >>> > + void * misaligned; > >> >>> > + void * pad2; > >> >>> > + > >> >>> > + A a; > >> >>> > + > >> >>> > + void Null(); > >> >>> > +}; > >> >>> > + > >> >>> > +void B::Null() > >> >>> > +{ > >> >>> > + a.a = nullptr; > >> >>> > + a.b = nullptr; > >> >>> > +} > >> >>> > + > >> >>> > +void __attribute__((noinline,noclone)) > >> >>> > +NullB(void * misalignedPtr) > >> >>> > +{ > >> >>> > + B* b = reinterpret_cast<B*>(reinterpret_cast<char > >> >>> > *>(misalignedPtr) - offsetof(B, misaligned)); > >> >>> > + b->Null(); > >> >>> > +} > >> >>> > + > >> >>> > +int main() > >> >>> > +{ > >> >>> > + B b; > >> >>> > + NullB(&b.misaligned); > >> >>> > + return 0; > >> >>> > +} > >> >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > >> >>> > index 9346cfe..b03cb1e 100644 > >> >>> > --- gcc/tree-vect-data-refs.c > >> >>> > +++ gcc/tree-vect-data-refs.c > >> >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct > >> >>> > data_reference *dr) > >> >>> > base = ref; > >> >>> > while (handled_component_p (base)) > >> >>> > base = TREE_OPERAND (base, 0); > >> >>> > + unsigned int base_alignment; > >> >>> > + unsigned HOST_WIDE_INT base_bitpos; > >> >>> > + get_object_alignment_1 (base, &base_alignment, &base_bitpos); > >> >>> > + /* As data-ref analysis strips the MEM_REF down to its base > >> >>> > operand > >> >>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have > >> >>> > to > >> >>> > + adjust things to make base_alignment valid as the alignment of > >> >>> > + DR_BASE_ADDRESS. */ > >> >>> > if (TREE_CODE (base) == MEM_REF) > >> >>> > - base = build2 (MEM_REF, TREE_TYPE (base), base_addr, > >> >>> > - build_int_cst (TREE_TYPE (TREE_OPERAND (base, > >> >>> > 1)), 0)); > >> >>> > - unsigned int base_alignment = get_object_alignment (base); > >> >>> > + { > >> >>> > + base_bitpos -= mem_ref_offset (base).to_short_addr () * > >> >>> > BITS_PER_UNIT; > >> >>> > + base_bitpos &= (base_alignment - 1); > >> >>> > + } > >> >>> > + if (base_bitpos != 0) > >> >>> > + base_alignment = base_bitpos & -base_bitpos; > >> >>> > + /* Also look at the alignment of the base address DR analysis > >> >>> > + computed. */ > >> >>> > + unsigned int base_addr_alignment = get_pointer_alignment > >> >>> > (base_addr); > >> >>> > + if (base_addr_alignment > base_alignment) > >> >>> > + base_alignment = base_addr_alignment; > >> >>> > > >> >>> > if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) > >> >>> > DR_VECT_AUX (dr)->base_element_aligned = true; > >> >>> > >> >>> Since you committed this patch (r241892), I'm seeing execution > >> >>> failures: > >> >>> gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test > >> >>> gcc.dg/vect/pr40074.c execution test > >> >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9 > >> >>> --with-fpu=neon-fp16 > >> >>> (using qemu as simulator) > >> >> > >> >> The difference is that we now vectorize the testcase with versioning > >> >> for alignment (but it should never execute the vectorized variant). > >> >> I need arm peoples help to understand what is wrong. > >> > Hi All, > >> > I will look at it. > >> > > >> > >> Hi, > >> > >> This is causing new regressions on armeb: > >> gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects > >> scan-tree-dump-times vect "vectorized 2 loops" 1 > >> gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect > >> "vectorized 2 loops" 1 > > > > It's actually an improvement as armeb can't do unaligned loads. > > Before the patch we versioned both loops for alignment (they > > can't be possibly both aligned by luck). After the patch we > > align 'arr' and thus the first loop becomes known-aligned and > > vectorized while the 2nd one is now known unaligned (and thus > > not vectorized because we can't do unaligned accesses). > > > > I suppose the following will fix it. Can you test that on > > some arm variants please? > > > > It works ok on x86_64/i?86 so please commit if it fixes the issue > > for you. > > > > Well, it works on armeb (the test now passes), but regresses on > arm-* :(
You have to help me here, it's not efficient time spent by me figuring out the right dg-targets here. I can only suppose that maybe it uses load-lanes and that's happy with unaligned accesses but hw_misaling is still set to false. Otherwise I can't see what's endian specific here (other than that -eb is probably still second class citizen in vectorization feature enablement?) Please open a bugreport against the testsuite in the meantime. Richard. > Christophe > > > Thanks, > > Richard. > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c > > b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c > > index 52fdcf6..3752600 100644 > > --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c > > +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c > > @@ -71,5 +71,5 @@ int main (void) > > return 0; > > } > > > > -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { > > target vect_strided2 } } } */ > > - > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { > > target { vect_strided2 && { ! vect_hw_misalign } } } } } */ > > +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { > > target { vect_strided2 && vect_hw_misalign } } } } */ > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)