On 02/10/2025 13:33, Andrew Stubbs wrote:
> If I change the gather/scatter such that it only accepts unsigned
> offsets, would the middle-end adapt, or would it just give up? I do not
> want the vectorizer to fail. We have too many places where performance
> drops off a cliff because the vectorizer just says "no" already. :(
If I change the predicate such that the define_expand only accepts
unsigned offsets (i.e. the natural ones for the real instruction), then
the vectorizer generates the bad code already observed.
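The bad code itself is not reproduced in this message. As a general illustration of why offset signedness matters, here is a minimal sketch in C. It assumes a 64-bit base address plus a 32-bit per-lane offset; the helper names are hypothetical and it does not model any particular instruction.

```c
#include <stdint.h>

/* Hypothetical model: the 32-bit offset is zero-extended before the add. */
static uint64_t addr_unsigned_offset(uint64_t base, uint32_t off)
{
  return base + (uint64_t) off;
}

/* Hypothetical model: the 32-bit offset is sign-extended before the add. */
static uint64_t addr_signed_offset(uint64_t base, int32_t off)
{
  return base + (uint64_t) (int64_t) off;
}
```

For non-negative offsets the two agree. For an offset of -8 the signed form steps back 8 bytes from the base, while the unsigned form sees 0xFFFFFFF8 and lands almost 4 GiB past it, so a negative-stride access cannot be passed through an unsigned offset unchanged.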
If I change the predicate such that it only accepts signed offsets
(*not* the natural choice) then the vectorizer produces good code:
  vect__1.9_25 = MEM <vector(32) float> [(float *)&a + 276B];
  vect__1.11_28 = MEM <vector(32) float> [(float *)&a + 148B];
  vect__1.13_30 = VEC_PERM_EXPR <vect__1.9_25, vect__1.11_28, { 31, 29, ..
... but it has given up on the gather/scatter and uses contiguous loads
with a permute, which is probably not the best option.
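To make the two strategies concrete, here is a minimal scalar sketch in C of a reversed, stride-2 read done both ways. The 8-lane width, the function names, and the access pattern are assumptions for illustration (the dump above uses 32 lanes and its selector is truncated); `permute2` follows the VEC_PERM_EXPR convention of indexing the concatenation of its two inputs.

```c
enum { LANES = 8 };  /* illustrative width, not the 32 lanes in the dump */

/* Gather: lane i reads base[offset[i]]; offsets are signed element indices. */
static void gather(float *dst, const float *base, const int *offset)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = base[offset[i]];
}

/* Two-input permute: sel[i] indexes the concatenation of v0 and v1. */
static void permute2(float *dst, const float *v0, const float *v1,
                     const int *sel)
{
  for (int i = 0; i < LANES; i++)
    {
      int s = sel[i] % (2 * LANES);
      dst[i] = s < LANES ? v0[s] : v1[s - LANES];
    }
}

/* Reads p[0], p[-2], ..., p[-14] with one gather and negative offsets. */
static void reversed_stride2_gather(float *dst, const float *p)
{
  int offset[LANES];
  for (int i = 0; i < LANES; i++)
    offset[i] = -2 * i;
  gather(dst, p, offset);
}

/* Same result from two contiguous loads and a permute.  */
static void reversed_stride2_permute(float *dst, const float *p)
{
  float hi[LANES], lo[LANES];
  int sel[LANES];
  for (int i = 0; i < LANES; i++)
    {
      hi[i] = p[i - (LANES - 1)];      /* contiguous load of p[-7..0]   */
      lo[i] = p[i - (2 * LANES - 1)];  /* contiguous load of p[-15..-8] */
      /* Selector is { 7, 5, 3, 1, 15, 13, 11, 9 } for LANES == 8.  */
      sel[i] = (i < LANES / 2 ? LANES : 2 * LANES) - 1 - 2 * (i % (LANES / 2));
    }
  permute2(dst, hi, lo, sel);
}
```

Both functions produce the same lanes. The permute version loads twice as many elements as it keeps, which is one reason it may not be the best option when a working gather is available.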
(Aside: why does it have to invert the vector? It would work
perfectly well reversed, but I suppose that's harder to reason about in
the general case?)
Andrew