This time to the list too (sorry for double email)

Hi,

The original patch '[vect] Re-analyze all modes for epilogues' skipped modes that should not be skipped, because it used the vector mode provided by autovectorize_vector_modes to derive the minimum VF required for it. However, those modes should really only be used to dictate vector size. This patch instead looks for the mode in 'used_vector_modes' with the largest element size and constructs a vector mode with the same size as the current vector_modes[mode_i]. Since we are using the largest element size, the NUNITS of this mode is the smallest possible VF required for an epilogue with this mode, so we only skip the modes we are certain cannot be used.
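
To illustrate the difference between the old and the new check, here is a minimal standalone sketch. It is not GCC code: ToyMode, skip_old and skip_new are hypothetical names, plain unsigned integers stand in for poly_int values (so '>=' replaces maybe_ge), and the case where related_vector_mode fails to find a mode is modelled by an assert only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a vector mode: total size and element size,
// both in bytes.  E.g. {16, 1} plays the role of V16QI, {32, 4} of V8SI.
struct ToyMode {
  unsigned vector_bytes;
  unsigned element_bytes;
  unsigned nunits() const { return vector_bytes / element_bytes; }
};

// Old check: take NUNITS straight from the candidate mode.  This can skip
// too much, because the candidate mode's element size says nothing about
// the element sizes the loop actually uses.
bool skip_old(const ToyMode &candidate, unsigned first_vf) {
  return candidate.nunits() >= first_vf;
}

// New check: keep the candidate's vector size, but pair it with the largest
// element size among the modes used by the main loop.  The resulting NUNITS
// is the smallest VF an epilogue with this vector size could have.
bool skip_new(const ToyMode &candidate,
              const std::vector<ToyMode> &used_modes, unsigned first_vf) {
  unsigned max_elem_bytes = candidate.element_bytes;
  for (const ToyMode &m : used_modes)
    max_elem_bytes = std::max(max_elem_bytes, m.element_bytes);

  // Assumption of this sketch: the element size always divides the vector
  // size.  The patch instead does not skip when no related mode exists.
  assert(candidate.vector_bytes % max_elem_bytes == 0);
  unsigned min_nunits = candidate.vector_bytes / max_elem_bytes;
  return min_nunits >= first_vf;
}
```

With a main loop that uses 4-byte elements in 32-byte vectors (VF 8), a 16-byte candidate with 1-byte elements has 16 units and would be skipped by the old check, whereas the new check computes 16 / 4 = 4 units and keeps it as an epilogue candidate.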

Passes bootstrap and regression on x86_64 and aarch64.

gcc/ChangeLog:

        PR 103997
        * tree-vect-loop.c (vect_analyze_loop): Fix mode skipping for epilogue
        vectorization.
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ba67de490bbd033b6db6217c8f9f9ca04cec323b..87b5ec5b4c6cb40e922b1e04bb7777ce74233af8 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3038,12 +3038,37 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
         would be at least as high as the main loop's and we would be
         vectorizing for more scalar iterations than there would be left.  */
       if (!supports_partial_vectors
-         && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
-       {
-         mode_i++;
-         if (mode_i == vector_modes.length ())
-           break;
-         continue;
+         && VECTOR_MODE_P (vector_modes[mode_i]))
+       {
+         /* To make sure we are conservative as to what modes we skip, we
+            should check the smallest possible NUNITS, which would be
+            derived from the mode in USED_VECTOR_MODES with the largest
+            element size.  */
+         scalar_mode max_elsize_mode = GET_MODE_INNER (vector_modes[mode_i]);
+         for (vec_info::mode_set::iterator i =
+               first_loop_vinfo->used_vector_modes.begin ();
+             i != first_loop_vinfo->used_vector_modes.end (); ++i)
+           {
+             if (VECTOR_MODE_P (*i)
+                 && GET_MODE_SIZE (GET_MODE_INNER (*i))
+                 > GET_MODE_SIZE (max_elsize_mode))
+               max_elsize_mode = GET_MODE_INNER (*i);
+           }
+         /* After finding the largest element size used in the main loop, find
+            the related vector mode with the same size as the mode
+            corresponding to the current MODE_I.  */
+         machine_mode max_elsize_vector_mode =
+           related_vector_mode (vector_modes[mode_i], max_elsize_mode,
+                                0).else_void ();
+         if (VECTOR_MODE_P (max_elsize_vector_mode)
+             && maybe_ge (GET_MODE_NUNITS (max_elsize_vector_mode),
+                          first_vinfo_vf))
+           {
+             mode_i++;
+             if (mode_i == vector_modes.length ())
+               break;
+             continue;
+           }
        }
 
       if (dump_enabled_p ())
