https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, I can't get the test to FAIL with a cross compiler; somehow the DejaGnu
tests always end up UNSUPPORTED.  The testcase for x86_64 has

/* With AVX256 or more we do not pull off the trick eliding the epilogue.  */
/* { dg-additional-options "-mprefer-avx128" { target { x86_64-*-* i?86-*-* } } } */

so we require the use of V16QImode -> V4SImode SAD with the V16QImode loads
split into two V8QImode ones.  There were insufficient dejagnu effective
targets to model the restriction in

+         /* If the gap splits the vector in half and the target
+            can do half-vector operations avoid the epilogue peeling
+            by simply loading half of the vector only.  Usually
+            the construction with an upper zero half will be elided.  */
+         dr_alignment_support alignment_support_scheme;
+         scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
+         machine_mode vmode;
+         if (overrun_p
+             && !masked_p
+             && (((alignment_support_scheme
+                     = vect_supportable_dr_alignment (first_dr_info, false)))
+                  == dr_aligned
+                 || alignment_support_scheme == dr_unaligned_supported)
+             && known_eq (nunits, (group_size - gap) * 2)
+             && mode_for_vector (elmode, (group_size - gap)).exists (&vmode)
+             && VECTOR_MODE_P (vmode)
+             && targetm.vector_mode_supported_p (vmode)
+             && (convert_optab_handler (vec_init_optab,
+                                        TYPE_MODE (vectype), vmode)
+                 != CODE_FOR_nothing))
+           overrun_p = false;
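
For reference, the arithmetic part of that condition can be modelled in a few lines of plain C.  This is only a sketch, not GCC code: the struct, its fields and the function name are hypothetical stand-ins for the target queries in the hunk above, and the gap < group_size guard is added here only to keep the unsigned subtraction well defined.  Assuming the test loop accesses groups of 16 bytes with a gap of 8, a 16-lane vector satisfies nunits == (group_size - gap) * 2, while a 32-lane one does not, which would match the remark about AVX256.

```c
#include <stdbool.h>

/* Hypothetical summary of what the target answers to the queries
   in the quoted hunk; none of these names exist in GCC.  */
struct target_caps
{
  bool misaligned_ok;       /* dr_aligned or dr_unaligned_supported */
  bool half_mode_supported; /* mode_for_vector + vector_mode_supported_p */
  bool vec_init_from_half;  /* vec_init_optab handler is available */
};

/* Return true when the overrun can be avoided by loading only the
   lower half of the vector, i.e. when the gap splits the vector
   exactly in half and the target supports the half-vector pieces.  */
static bool
half_vector_load_avoids_overrun (unsigned nunits, unsigned group_size,
                                 unsigned gap, bool masked,
                                 const struct target_caps *caps)
{
  /* Guard added for this sketch: keep group_size - gap non-negative.  */
  if (masked || gap >= group_size)
    return false;

  return caps->misaligned_ok
         && nunits == (group_size - gap) * 2
         && caps->half_mode_supported
         && caps->vec_init_from_half;
}
```

In this model a target without misaligned-access support fails the first test, so the "access with gaps requires scalar epilogue loop" message would still be expected there.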

I see we probably need vect_hw_misalign, so does

Index: gcc/testsuite/gcc.dg/vect/slp-reduc-sad-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-sad-2.c (revision 270899)
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-sad-2.c (working copy)
@@ -25,5 +25,5 @@ int x264_pixel_sad_8x8( uint8_t *pix1, u

 /* { dg-final { scan-tree-dump "vect_recog_sad_pattern: detected" "vect" } } */
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
-/* { dg-final { scan-tree-dump-not "access with gaps requires scalar epilogue loop" "vect" } } */
+/* { dg-final { scan-tree-dump-not "access with gaps requires scalar epilogue loop" "vect" { xfail { ! vect_hw_misalign } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */

fix everything?
