Hi,
I have been playing around with making Kewen's partial vector changes
workable with s390:
We have a vll instruction that takes the index of the highest byte to
load instead of a length. The rather unfortunate consequence of this is
that a length of zero cannot be specified. The partial vector framework,
however, relies a lot on the fact that a len_load can be made a NOP by
using a length of zero.
After confirming that an additional zero check before each vll is
definitely too slow across SPEC, and after some discussion with Kewen,
we figured the easiest way forward is to exclude loops with multiple VFs
(despite giving up vectorization possibilities). These are prone to
len_loads with a length of zero, while the regular induction variable
check prevents them in single-VF loops.
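To illustrate where the zero lengths come from, here is a simplified model
in C. It is only a sketch, not how the vectorizer actually computes lengths:
the helper names are made up, and "multiple VFs" is approximated by a group
that needs more than one vector per vector iteration. With a single vector,
the loop condition guarantees at least one remaining element, so the length
is never zero. With two vectors, the second one can run dry in the final
iteration.

```c
#include <stdio.h>

/* Hypothetical helper: number of active elements for vector K of a group,
   given REMAINING scalar iterations and NELTS elements per vector.  */
static unsigned
vec_len (unsigned remaining, unsigned k, unsigned nelts)
{
  unsigned start = k * nelts;
  if (remaining <= start)
    return 0;                       /* Nothing left for this vector.  */
  unsigned left = remaining - start;
  return left < nelts ? left : nelts;
}

/* Count the zero-length loads over a loop of N scalar iterations that
   processes NVECS vectors of NELTS elements per vector iteration.  */
static unsigned
count_zero_lens (unsigned n, unsigned nvecs, unsigned nelts)
{
  unsigned zeros = 0;
  unsigned step = nvecs * nelts;
  /* The induction variable check: only iterate while elements remain.  */
  for (unsigned i = 0; i < n; i += step)
    for (unsigned k = 0; k < nvecs; k++)
      if (vec_len (n - i, k, nelts) == 0)
        zeros++;
  return zeros;
}
```

For n = 10 and 4 elements per vector, count_zero_lens (10, 1, 4) finds no
zero length, while count_zero_lens (10, 2, 4) finds one: the last iteration
has 2 elements left, so the second vector gets a length of zero.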
So, as a quick hack, I went with
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 75f24e7c4f6..f79222daeb6 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1170,6 +1170,9 @@ vect_verify_loop_lens (loop_vec_info loop_vinfo)
   if (LOOP_VINFO_LENS (loop_vinfo).is_empty ())
     return false;
+  if (LOOP_VINFO_LENS (loop_vinfo).length () > 1)
+    return false;
+
which could be made a hook, eventually. FWIW this is sufficient to make
bootstrap, regtest and compiling the SPEC suites succeed. I'm unsure
whether we are now guaranteed not to emit a len_load with a length of
zero. On top of that, I subtract 1 from the passed length in the
expander, which is presumably not ideal either.
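For reference, a rough C model of why the "length minus 1" mapping breaks
down for zero. The function names are hypothetical, and the model assumes
that the instruction loads the whole 16-byte vector once the index is 15 or
larger. It only counts bytes and does not reflect the actual expander code.

```c
#include <stdint.h>

#define VEC_BYTES 16u

/* Model of a vll-style load: bytes 0..HIGHEST_IDX are loaded, so at least
   one byte is always touched.  Assumption: indices >= 15 load everything.  */
static uint32_t
model_vll_bytes (uint32_t highest_idx)
{
  return highest_idx >= VEC_BYTES - 1 ? VEC_BYTES : highest_idx + 1;
}

/* Model of a len_load expanded as "vll with LEN - 1".  For LEN == 0 the
   unsigned subtraction wraps around to UINT32_MAX.  */
static uint32_t
model_len_load_bytes (uint32_t len)
{
  return model_vll_bytes (len - 1);
}
```

Under these assumptions model_len_load_bytes (1) touches 1 byte and
model_len_load_bytes (16) touches 16, but model_len_load_bytes (0) also
touches 16 instead of none, which is exactly the case that must not occur.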
There are some regressions that I haven't fully analyzed yet, but whether
and when to actually enable this feature could be a backend decision
with the necessary middle-end checks already in place.
Any ideas on how to properly check for the zero condition and exclude
the cases that cause it? Kewen suggested enriching the len_load optabs
with a separate parameter.
Regards
Robin