On 30/10/2025 13:19, Robin Dapp wrote:
Hi,
This patch adjusts vect_gather_scatter_fn_p to always check an offset
type with swapped signedness (vs. the original offset argument).
If the target supports the gather/scatter with the new offset type as
well as the conversion of the offset we now emit an explicit offset
conversion before the actual gather/scatter.
The relaxation is only done for the IFN path of gather/scatter and the
general idea roughly looks like:
- vect_gather_scatter_fn_p builds a list of all offset vector types
that the target supports for the current vectype. Then it goes
through that list, trying direct support first and sign-swapped
offset types next, taking precision requirements into account.
If successful it sets supported_offset_vectype to the type that actually
worked while offset_vectype_out is the type that was requested.
- vect_check_gather_scatter works as before but uses the relaxed
vect_gather_scatter_fn_p.
- get_load_store_type sets ls_data->supported_offset_vectype if the
requested type wasn't supported but another one was.
- check_load_store_for_partial_vectors uses the
supported_offset_vectype in order to validate what get_load_store_type
determined.
- vectorizable_load/store emit a conversion if
ls_data->supported_offset_vectype is nonzero and cost it.
The offset type is either of pointer size (if we started with a signed
offset) or twice the size of the original offset (when that one was
unsigned).
Main change from v2 is Richi's idea of a rework of vect_gather_scatter_fn_p.
It now first collects target supported offset vectypes. Afterwards these
vectypes are filtered by direct and indirect (sign-swap) support. I think
the logic is clearer and more straightforward now.
I guess it's debatable whether to include IFN and else value in the
configuration or not as they don't usually change per offset...
Bootstrapped on x86 and power10. Regtested on rv64gcv_zvl512b
(with and without implicit target support for zero-extension) and aarch64.
I was hoping that this patch might help me fix pr121393 (on which I'm
still stuck), if I just disabled the unsigned gather/scatter patterns,
but no such luck: it just zero-extends the supposed-to-be-negative
offset and suffers exactly the same bug.
On the plus side, whereas previously if I disabled unsigned offsets I
got a total failure to vectorize, whereas with your patch it now
vectorizes just fine, albeit with the bug.
Assuming your patch is approved, should I continue to accept signed
offsets in the backend? The hardware instruction that accepts 32-bit
offsets only accepts unsigned offsets, so I'm supporting signed offsets
by calculating the absolute address explicitly. Of course, the backend
has no way to know if the range of possible offsets is actually safe to
treat it as unsigned, so is your way likely to produce better code?
The backend also accepts 64-bit offsets, but there's no hardware
instruction for that, so both variants are converted to absolute
addresses explicitly. This variant only exists because I found that the
SPEC HPC "lbm" benchmark fails to vectorize at all without it, but it
would be better if middle end didn't choose to use 64-bit offsets unless
absolutely necessary, but vectorizing is better than not.
Andrew