Hi,

Several new (ish?) autovectorizer features have apparently caused NEON
support for same to regress quite heavily in big-endian mode. This
patch is an attempt to fix things up, but is not without problems --
maybe someone will have a suggestion as to how we should proceed.

The problem (as ever) is that the ARM backend must lie to the
middle-end about the layout of NEON vectors in big-endian mode (due to
ABI requirements, VFP compatibility, and the middle-end semantics of
vector indices being equivalent to those of an array with the same type
of elements when stored in memory). A few years ago when the vectorizer
was relatively less sophisticated, the ordering of vector elements
could be ignored to some extent by disabling certain instruction
patterns used by the vectorizer in big-endian mode which were sensitive
to the ordering of elements: in fact this is still the strategy we're
using, but it is clearly becoming less and less tenable as time
progresses. Quad-word registers (being composed of two double-word
registers, loaded/stored the "wrong way round" in big-endian mode)
arguably cause more problems than double-word registers.

So, the idea behind the attached patch was supposed to be to limit the
autovectorizer to using double-word registers only, and to disable a
few additional (or newly-used by the vectorizer) patterns in big-endian
mode. That, plus several testsuite tweaks, gets us down to zero
failures for vect.exp, which is good.

The problem is that at the same time quite a large set of neon.exp tests
regress (vzip/vuzp/vtrn): one of the new patterns which is
disabled because it causes trouble (i.e. execution failures) for the
vectorizer is vec_perm_const<mode>. However __builtin_shuffle (which
uses that pattern) is used for arm_neon.h now -- so disabling it means
that the proper instructions aren't generated for intrinsics any more in
big-endian mode.

I think we have a problem here. The vectorizer also tries to use
__builtin_shuffle (for scatter/gather operations, when lane
loading/storing ops aren't available), but does not understand the
"special tweaks" that arm_evpc_neon_{vuzp,vzip,vtrn} does to try to
hide the true element ordering of vectors from the middle-end. So, I'm
left wondering:

 * Given our funky element ordering in BE mode, are the
   __builtin_shuffle lists in arm_neon.h actually an accurate
   representation of what the given intrinsic should do? (The fallback
   code might or might not do the same thing, I'm not sure.)

 * The vectorizer tries to use VEC_PERM_EXPR (equivalent to
   __builtin_shuffle) with e.g. pairs of doubleword registers loaded
   from adjacent memory locations. Are the semantics required for this
   (again, with our funky element ordering) even the same as those
   required for the intrinsics? Including quad-word registers for the
   latter? (My suspicion is "no", in which case there's a fundamental
   incompatibility here that needs to be resolved somehow.)

Anyway: the tl;dr is "fixing NEON vect tests breaks intrinsics". Any
ideas for what to do about that? (FAOD, I don't think I'm in a position
to do the kind of middle-end surgery required to fix the problem
"properly" at this point :-p).

(It's arguably more important for the vectorizer to not generate bad
code than it is for intrinsics to work properly, in which case: OK to
apply? Tested cross to ARM EABI with configury modifications to build
LE/BE multilibs.)

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm.c (arm_array_mode_supported_p): No array modes for
    big-endian NEON.
    (arm_preferred_simd_mode): Always prefer 64-bit modes for
    big-endian NEON.
    (arm_autovectorize_vector_sizes): Use 8-byte vectors only for NEON.
    (arm_vectorize_vec_perm_const_ok): No permutations are OK in
    big-endian mode.
    * config/arm/neon.md (vec_load_lanes<mode><mode>): Disable in
    big-endian mode.
    (vec_store_lanes<mode><mode>, vec_load_lanesti<mode>)
    (vec_load_lanesoi<mode>, vec_store_lanesti<mode>)
    (vec_store_lanesoi<mode>, vec_load_lanesei<mode>)
    (vec_load_lanesci<mode>, vec_store_lanesei<mode>)
    (vec_store_lanesci<mode>, vec_load_lanesxi<mode>)
    (vec_store_lanesxi<mode>): Likewise.
    (vec_widen_<US>shiftl_lo_<mode>, vec_widen_<US>shiftl_hi_<mode>)
    (vec_widen_<US>mult_hi_<mode>, vec_widen_<US>mult_lo_<mode>):
    Likewise.

    gcc/testsuite/
    * gcc.dg/vect/slp-cond-3.c: XFAIL for !vect_unpack.
    * gcc.dg/vect/slp-cond-4.c: Likewise.
    * gcc.dg/vect/vect-1.c: Likewise.
    * gcc.dg/vect/vect-1-big-array.c: Likewise.
    * gcc.dg/vect/vect-35.c: Likewise.
    * gcc.dg/vect/vect-35-big-array.c: Likewise.
    * gcc.dg/vect/bb-slp-11.c: Likewise.
    * gcc.dg/vect/bb-slp-26.c: Likewise.
    * gcc.dg/vect/vect-over-widen-3-big-array.c: XFAIL
    for !vect_element_align.
    * gcc.dg/vect/vect-over-widen-1.c: Likewise.
    * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
    * gcc.dg/vect/vect-over-widen-2.c: Likewise.
    * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
    * gcc.dg/vect/vect-over-widen-3.c: Likewise.
    * gcc.dg/vect/vect-over-widen-4.c: Likewise.
    * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
    * gcc.dg/vect/pr43430-2.c: Likewise.
    * gcc.dg/vect/vect-widen-shift-u16.c: XFAIL for !vect_widen_shift
    && !vect_unpack.
    * gcc.dg/vect/vect-widen-shift-s8.c: Likewise.
    * gcc.dg/vect/vect-widen-shift-u8.c: Likewise.
    * gcc.dg/vect/vect-widen-shift-s16.c: Likewise.
    * gcc.dg/vect/vect-93.c: Only run if !vect_intfloat_cvt.
    * gcc.dg/vect/vect-intfloat-conversion-4a.c: Only run if
    vect_unpack.
    * gcc.dg/vect/vect-intfloat-conversion-4b.c: Likewise.
    * lib/target-supports.exp (check_effective_target_vect_perm): Only
    enable for NEON little-endian.
    (check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
    (check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
    (check_effective_target_vect_widen_mult_hi_to_si): Likewise.
    (check_effective_target_vect_widen_shift): Likewise.
    (check_effective_target_vect_extract_even_odd): Likewise.
    (check_effective_target_vect_interleave): Likewise.
    (check_effective_target_vect_stridedN): Likewise.
    (check_effective_target_vect_multiple_sizes): Likewise.
    (check_effective_target_vect64): Enable for any NEON.
    
Index: gcc/testsuite/gcc.dg/vect/slp-cond-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-cond-3.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/slp-cond-3.c	(working copy)
@@ -79,6 +79,6 @@ int main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-1.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-1.c	(working copy)
@@ -86,5 +86,5 @@ foo (int n)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_strided2 } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_strided2 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail { vect_strided2 || { ! vect_unpack } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-cond-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-cond-4.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/slp-cond-4.c	(working copy)
@@ -82,5 +82,5 @@ int main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-1-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-1-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-1-big-array.c	(working copy)
@@ -86,5 +86,5 @@ foo (int n)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_strided2 } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_strided2 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail { vect_strided2 || { ! vect_unpack } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-35.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-35.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-35.c	(working copy)
@@ -45,6 +45,6 @@ int main (void)
 } 
 
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { xfail { ia64-*-* sparc*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { xfail { { ia64-*-* sparc*-*-* } || { ! vect_unpack } } } } } */
 /* { dg-final { scan-tree-dump "can't determine dependence between" "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	(working copy)
@@ -59,6 +59,6 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c	(working copy)
@@ -53,6 +53,6 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/bb-slp-26.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-26.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-26.c	(working copy)
@@ -55,6 +55,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 } } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 xfail { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "slp" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	(working copy)
@@ -62,6 +62,6 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-35-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-35-big-array.c	(working copy)
@@ -45,6 +45,6 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { xfail { ia64-*-* sparc*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { xfail { { ia64-*-* sparc*-*-* } || { ! vect_unpack } } } } } */
 /* { dg-final { scan-tree-dump-times "can't determine dependence between" 1 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c	(working copy)
@@ -60,6 +60,6 @@ int main (void)
 
 /* Final value stays in int, so no over-widening is detected at the moment.  */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/pr43430-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr43430-2.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/pr43430-2.c	(working copy)
@@ -12,5 +12,5 @@ vsad16_c (void *c, uint8_t * s1, uint8_t
   return score;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_condition } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_condition && vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c	(working copy)
@@ -53,6 +53,6 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	(working copy)
@@ -61,6 +61,6 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c	(working copy)
@@ -59,6 +59,6 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	(working copy)
@@ -66,6 +66,6 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	(working copy)
@@ -65,6 +65,6 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-93.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-93.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-93.c	(working copy)
@@ -79,7 +79,7 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_no_align } } } */
 
 /* in main: */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { vect_no_align && { ! vect_intfloat_cvt } } } } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { xfail { vect_no_align } } } } */
 
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c	(working copy)
@@ -60,5 +60,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c	(working copy)
@@ -35,5 +35,5 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_intfloat_cvt } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_intfloat_cvt && vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c	(working copy)
@@ -35,5 +35,5 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_intfloat_cvt } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_intfloat_cvt && vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c	(working copy)
@@ -60,6 +60,6 @@ int main (void)
 
 /* Final value stays in int, so no over-widening is detected at the moment.  */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/bb-slp-11.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-11.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-11.c	(working copy)
@@ -48,6 +48,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 } } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 xfail { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "slp" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c	(working copy)
@@ -102,6 +102,6 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 8 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 196170)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -3089,7 +3089,8 @@ proc check_effective_target_vect_perm { 
         verbose "check_effective_target_vect_perm: using cached result" 2
     } else {
         set et_vect_perm_saved 0
-        if { [is-effective-target arm_neon_ok]
+        if { ([is-effective-target arm_neon_ok]
+	      && [is-effective-target arm_little_endian])
 	     || [istarget aarch64*-*-*]
 	     || [istarget powerpc*-*-*]
              || [istarget spu-*-*]
@@ -3211,7 +3212,8 @@ proc check_effective_target_vect_widen_s
     } else {
         set et_vect_widen_sum_qi_to_hi_saved 0
 	if { [check_effective_target_vect_unpack] 
-	     || [check_effective_target_arm_neon_ok]
+	     || ([check_effective_target_arm_neon_ok]
+		 && [check_effective_target_arm_little_endian])
 	     || [istarget ia64-*-*] } {
             set et_vect_widen_sum_qi_to_hi_saved 1
 	}
@@ -3263,7 +3265,8 @@ proc check_effective_target_vect_widen_m
 	}
         if { [istarget powerpc*-*-*]
               || [istarget aarch64*-*-*]
-              || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) } {
+              || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]
+		  && [check_effective_target_arm_little_endian]) } {
             set et_vect_widen_mult_qi_to_hi_saved 1
         }
     }
@@ -3298,7 +3301,8 @@ proc check_effective_target_vect_widen_m
 	      || [istarget aarch64*-*-*]
 	      || [istarget i?86-*-*]
 	      || [istarget x86_64-*-*]
-              || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) } {
+              || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]
+		  && [check_effective_target_arm_little_endian]) } {
             set et_vect_widen_mult_hi_to_si_saved 1
         }
     }
@@ -3368,7 +3372,8 @@ proc check_effective_target_vect_widen_s
         verbose "check_effective_target_vect_widen_shift: using cached result" 2
     } else {
         set et_vect_widen_shift_saved 0
-        if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) } {
+        if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]
+	      && [check_effective_target_arm_little_endian]) } {
             set et_vect_widen_shift_saved 1
         }
     }
@@ -3859,7 +3864,8 @@ proc check_effective_target_vect_extract
         set et_vect_extract_even_odd_saved 0 
 	if { [istarget aarch64*-*-*]
 	     || [istarget powerpc*-*-*]
-	     || [is-effective-target arm_neon_ok]
+	     || ([is-effective-target arm_neon_ok]
+		 && [is-effective-target arm_little_endian])
              || [istarget i?86-*-*]
              || [istarget x86_64-*-*]
              || [istarget ia64-*-*]
@@ -3885,7 +3891,8 @@ proc check_effective_target_vect_interle
         set et_vect_interleave_saved 0
 	if { [istarget aarch64*-*-*]
 	     || [istarget powerpc*-*-*]
-	     || [is-effective-target arm_neon_ok]
+	     || ([is-effective-target arm_neon_ok]
+		 && [is-effective-target arm_little_endian])
              || [istarget i?86-*-*]
              || [istarget x86_64-*-*]
              || [istarget ia64-*-*]
@@ -3915,7 +3922,8 @@ foreach N {2 3 4 8} {
 		     && [check_effective_target_vect_extract_even_odd] } {
 		    set et_vect_stridedN_saved 1
 		}
-		if { ([istarget arm*-*-*]
+		if { (([istarget arm*-*-*] && [is-effective-target arm_neon_ok]
+		       && [is-effective-target arm_little_endian])
 		      || [istarget aarch64*-*-*]) && N >= 2 && N <= 4 } {
 		    set et_vect_stridedN_saved 1
 		}
@@ -3934,7 +3942,8 @@ proc check_effective_target_vect_multipl
 
     set et_vect_multiple_sizes_saved 0
     if { ([istarget aarch64*-*-*]
-	  || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok])) } {
+	  || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]
+	      && [check_effective_target_arm_little_endian])) } {
        set et_vect_multiple_sizes_saved 1
     }
     if { ([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
@@ -3957,8 +3966,7 @@ proc check_effective_target_vect64 { } {
     } else {
         set et_vect64_saved 0
         if { ([istarget arm*-*-*]
-	      && [check_effective_target_arm_neon_ok]
-	      && [check_effective_target_arm_little_endian]) } {
+	      && [check_effective_target_arm_neon_ok]) } {
            set et_vect64_saved 1
         }
     }
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 196170)
+++ gcc/config/arm/arm.c	(working copy)
@@ -25041,7 +25041,7 @@ static bool
 arm_array_mode_supported_p (enum machine_mode mode,
 			    unsigned HOST_WIDE_INT nelems)
 {
-  if (TARGET_NEON
+  if (TARGET_NEON && !BYTES_BIG_ENDIAN
       && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
       && (nelems >= 2 && nelems <= 4))
     return true;
@@ -25057,23 +25057,27 @@ static enum machine_mode
 arm_preferred_simd_mode (enum machine_mode mode)
 {
   if (TARGET_NEON)
-    switch (mode)
-      {
-      case SFmode:
-	return TARGET_NEON_VECTORIZE_DOUBLE ? V2SFmode : V4SFmode;
-      case SImode:
-	return TARGET_NEON_VECTORIZE_DOUBLE ? V2SImode : V4SImode;
-      case HImode:
-	return TARGET_NEON_VECTORIZE_DOUBLE ? V4HImode : V8HImode;
-      case QImode:
-	return TARGET_NEON_VECTORIZE_DOUBLE ? V8QImode : V16QImode;
-      case DImode:
-	if (!TARGET_NEON_VECTORIZE_DOUBLE)
-	  return V2DImode;
-	break;
+    {
+      bool double_only = BYTES_BIG_ENDIAN || TARGET_NEON_VECTORIZE_DOUBLE;
 
-      default:;
-      }
+      switch (mode)
+	{
+	case SFmode:
+	  return double_only ? V2SFmode : V4SFmode;
+	case SImode:
+	  return double_only ? V2SImode : V4SImode;
+	case HImode:
+	  return double_only ? V4HImode : V8HImode;
+	case QImode:
+	  return double_only ? V8QImode : V16QImode;
+	case DImode:
+	  if (!double_only)
+	    return V2DImode;
+	  break;
+
+	default:;
+	}
+    }
 
   if (TARGET_REALLY_IWMMXT)
     switch (mode)
@@ -25974,6 +25978,11 @@ arm_vector_alignment (const_tree type)
 static unsigned int
 arm_autovectorize_vector_sizes (void)
 {
+  /* Use of quad-word registers for autovectorization for NEON is fraught with
+     difficulties.  Just don't do that.  */
+  if (TARGET_NEON && BYTES_BIG_ENDIAN)
+    return 8;
+
   return TARGET_NEON_VECTORIZE_DOUBLE ? 0 : (16 | 8);
 }
 
@@ -27008,6 +27017,12 @@ arm_vectorize_vec_perm_const_ok (enum ma
   unsigned int i, nelt, which;
   bool ret;
 
+  /* FIXME: There appear to be element-numbering problems with vector
+     permutations in big-endian mode that cause the vectorizer to produce bad
+     code.  Disable for now.  */
+  if (BYTES_BIG_ENDIAN)
+    return false;
+
   d.vmode = vmode;
   d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
   d.testing_p = true;
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 196170)
+++ gcc/config/arm/neon.md	(working copy)
@@ -4506,7 +4506,7 @@
   [(set (match_operand:VDQX 0 "s_register_operand")
         (unspec:VDQX [(match_operand:VDQX 1 "neon_struct_operand")]
                      UNSPEC_VLD1))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vld1<mode>"
   [(set (match_operand:VDQX 0 "s_register_operand" "=w")
@@ -4618,7 +4618,7 @@
   [(set (match_operand:VDQX 0 "neon_struct_operand")
 	(unspec:VDQX [(match_operand:VDQX 1 "s_register_operand")]
 		     UNSPEC_VST1))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vst1<mode>"
   [(set (match_operand:VDQX 0 "neon_struct_operand" "=Um")
@@ -4683,7 +4683,7 @@
         (unspec:TI [(match_operand:TI 1 "neon_struct_operand")
                     (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
 		   UNSPEC_VLD2))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vld2<mode>"
   [(set (match_operand:TI 0 "s_register_operand" "=w")
@@ -4708,7 +4708,7 @@
         (unspec:OI [(match_operand:OI 1 "neon_struct_operand")
                     (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
 		   UNSPEC_VLD2))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vld2<mode>"
   [(set (match_operand:OI 0 "s_register_operand" "=w")
@@ -4797,7 +4797,7 @@
 	(unspec:TI [(match_operand:TI 1 "s_register_operand")
                     (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VST2))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vst2<mode>"
   [(set (match_operand:TI 0 "neon_struct_operand" "=Um")
@@ -4822,7 +4822,7 @@
 	(unspec:OI [(match_operand:OI 1 "s_register_operand")
                     (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VST2))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vst2<mode>"
   [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
@@ -4894,7 +4894,7 @@
         (unspec:EI [(match_operand:EI 1 "neon_struct_operand")
                     (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
 		   UNSPEC_VLD3))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vld3<mode>"
   [(set (match_operand:EI 0 "s_register_operand" "=w")
@@ -4918,7 +4918,7 @@
   [(match_operand:CI 0 "s_register_operand")
    (match_operand:CI 1 "neon_struct_operand")
    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   emit_insn (gen_neon_vld3<mode> (operands[0], operands[1]));
   DONE;
@@ -5068,7 +5068,7 @@
 	(unspec:EI [(match_operand:EI 1 "s_register_operand")
                     (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VST3))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vst3<mode>"
   [(set (match_operand:EI 0 "neon_struct_operand" "=Um")
@@ -5091,7 +5091,7 @@
   [(match_operand:CI 0 "neon_struct_operand")
    (match_operand:CI 1 "s_register_operand")
    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   emit_insn (gen_neon_vst3<mode> (operands[0], operands[1]));
   DONE;
@@ -5213,7 +5213,7 @@
         (unspec:OI [(match_operand:OI 1 "neon_struct_operand")
                     (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
 		   UNSPEC_VLD4))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vld4<mode>"
   [(set (match_operand:OI 0 "s_register_operand" "=w")
@@ -5237,7 +5237,7 @@
   [(match_operand:XI 0 "s_register_operand")
    (match_operand:XI 1 "neon_struct_operand")
    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   emit_insn (gen_neon_vld4<mode> (operands[0], operands[1]));
   DONE;
@@ -5394,7 +5394,7 @@
 	(unspec:OI [(match_operand:OI 1 "s_register_operand")
                     (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VST4))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vst4<mode>"
   [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
@@ -5418,7 +5418,7 @@
   [(match_operand:XI 0 "neon_struct_operand")
    (match_operand:XI 1 "s_register_operand")
    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   emit_insn (gen_neon_vst4<mode> (operands[0], operands[1]));
   DONE;
@@ -5725,7 +5725,7 @@
  [(set (match_operand:<V_widen> 0 "register_operand" "=w")
        (SE:<V_widen> (ashift:VW (match_operand:VW 1 "register_operand" "w")
        (match_operand:<V_innermode> 2 "const_neon_scalar_shift_amount_operand" ""))))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   return "vshll.<US><V_sz_elem> %q0, %P1, %2";
 }
@@ -5771,7 +5771,7 @@
 (define_expand "vec_unpack<US>_lo_<mode>"
  [(match_operand:<V_double_width> 0 "register_operand" "")
   (SE:<V_double_width>(match_operand:VDI 1 "register_operand"))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
   emit_insn (gen_neon_unpack<US>_<mode> (tmpreg, operands[1]));
@@ -5784,7 +5784,7 @@
 (define_expand "vec_unpack<US>_hi_<mode>"
  [(match_operand:<V_double_width> 0 "register_operand" "")
   (SE:<V_double_width>(match_operand:VDI 1 "register_operand"))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
   emit_insn (gen_neon_unpack<US>_<mode> (tmpreg, operands[1]));
@@ -5800,7 +5800,7 @@
 		 	   (match_operand:VDI 1 "register_operand" "w"))
  		       (SE:<V_widen> 
 			   (match_operand:VDI 2 "register_operand" "w"))))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vmull.<US><V_sz_elem> %q0, %P1, %P2"
   [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5809,7 +5809,7 @@
   [(match_operand:<V_double_width> 0 "register_operand" "")
    (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
    (SE:<V_double_width> (match_operand:VDI 2 "register_operand" ""))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  {
    rtx tmpreg = gen_reg_rtx (<V_widen>mode);
    emit_insn (gen_neon_vec_<US>mult_<mode> (tmpreg, operands[1], operands[2]));
@@ -5824,7 +5824,7 @@
   [(match_operand:<V_double_width> 0 "register_operand" "")
    (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
    (SE:<V_double_width> (match_operand:VDI 2 "register_operand" ""))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  {
    rtx tmpreg = gen_reg_rtx (<V_widen>mode);
    emit_insn (gen_neon_vec_<US>mult_<mode> (tmpreg, operands[1], operands[2]));
@@ -5839,7 +5839,7 @@
  [(match_operand:<V_double_width> 0 "register_operand" "")
    (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
    (match_operand:SI 2 "immediate_operand" "i")]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  {
    rtx tmpreg = gen_reg_rtx (<V_widen>mode);
    emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1], operands[2]));
@@ -5853,7 +5853,7 @@
   [(match_operand:<V_double_width> 0 "register_operand" "")
    (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
    (match_operand:SI 2 "immediate_operand" "i")]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  {
    rtx tmpreg = gen_reg_rtx (<V_widen>mode);
    emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1], operands[2]));

Reply via email to