On Thu, Oct 2, 2025 at 11:14 AM Andrew Stubbs <[email protected]> wrote:
>
> On 02/10/2025 10:58, Richard Biener wrote:
> > On Fri, Sep 26, 2025 at 1:27 PM Andrew Stubbs <[email protected]> wrote:
> >>
> >> When testcases (such as libgomp.c-c++-common/for-11.c) use unsigned types
> >> for
> >> loop counters then the vectorizer will also use unsigned types for the
> >> offsets
> >> passed to gather/scatter instructions. This is a problem if the loop step
> >> is
> >> negative and the offset type is narrower than a pointer.
> >
> > Hmm, but why did we chose an unsigned type in the first place then? So,
> > isn't this an issue in vect_truncate_gather_scatter_offset or
> > vect_use_strided_gather_scatters_p?
>
> Here's the reproducer from PR121393:
>
> float a[100];
>
> int f()
> {
> unsigned int i; <----- unsigned comes from here
>
> #pragma omp target simd
> for (i=100; i != 0; i-=2)
> a[i] = a[i] + 5.5;
> }
>
> int main()
> {
> f();
> return 0;
> }
>
>
> The patch is not needed if "i" is either signed or long long.
So given other targets/code-paths are fine it seems like the gimple IL is
mis-interpreted somehow. Checking risc-v (also using stride load/store)
does
_27 = .SELECT_VL (ivtmp_25, POLY_INT_CST [4, 4]);
_8 = _27 * 18446744073709551608;
vect__1.7_17 = .MASK_LEN_STRIDED_LOAD (vectp_a.5_14, -8, { 0.0, ...
}, { -1, ... }, _16(D), _27, 0);
vect__2.8_19 = .COND_LEN_ADD ({ -1, ... }, vect__1.7_17, { 5.5e+0,
... }, _18(D), _27, 0);
_1 = a[i_10];
_2 = _1 + 5.5e+0;
.MASK_LEN_STRIDED_STORE (vectp_a.9_22, -8, vect__2.8_19, { -1, ... }, _27, 0);
i_7 = i_10 + 4294967294;
and in vect_use_strided_gather_scatters_p we end up with unsigned vector([4,4].
Your patch only changes how VEC_SERIES_EXPR is handled, you
make that signed. It seems to me that GCN eventually cannot do
a VnSImode series but you always build a VnDImode one and fail to
truncate? Or you claim you can handle VnSImode offset vectors for
gather/scatter but in reality require sign-/zero-extended VnDImode?
Note IIRC the gather/scatter optabs handle signed vs. unsigned
(aka zero vs. sign extension) via predicates. See
internal_gather_scatter_fn_supported_p.
So with unsigned we expect the target to zero-extend. Which might be
indeed an issue for negative step when we handle gather as
&a[100] + offset instead of &a[100 + index-offset]. The place you fix
is IMO still wrong.
For risc-v we end up using &a as base and we have (sizetype)i * 4 as offset,
so we shouldn't get to problematic values and instead have a series like
{396, 392, ... 0, -4U, ... } with the negative values being only operating on
masked lanes?
Is that what you see on GCN as well?
Richard.
> I have not observed issues in the other places, but I don't claim a full
> understanding of this code.
>
> Andrew
>
>
> >> This commit ensures that negative offsets are labelled as signed.
> >>
> >> gcc/ChangeLog:
> >>
> >> PR target/121393
> >> * tree-vect-stmts.cc (vect_get_strided_load_store_ops): Use signed
> >> offsets for negative steps.
> >> ---
> >>
> >> This fixes a problem observed on amdgcn, but there doesn't seem to be
> >> anything about it that is target-specific.
> >>
> >> OK for mainline?
> >>
> >> Andrew
> >>
> >>
> >> gcc/tree-vect-stmts.cc | 8 ++++++++
> >> 1 file changed, 8 insertions(+)
> >>
> >> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> >> index cfc4f323a22..254fe6e6c0f 100644
> >> --- a/gcc/tree-vect-stmts.cc
> >> +++ b/gcc/tree-vect-stmts.cc
> >> @@ -2969,6 +2969,14 @@ vect_get_strided_load_store_ops (stmt_vec_info
> >> stmt_info, slp_tree node,
> >> type of the vector instead. */
> >> tree offset_type = TREE_TYPE (offset_vectype);
> >>
> >> + if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST
> >> + && compare_step_with_zero (loop_vinfo, stmt_info) < 0
> >> + && TYPE_SIGN (offset_type) == UNSIGNED)
> >> + {
> >> + offset_type = signed_type_for (offset_type);
> >> + offset_vectype = signed_type_for (offset_vectype);
> >> + }
> >> +
> >> /* Calculate X = DR_STEP / SCALE and convert it to the appropriate
> >> type. */
> >> tree step = size_binop (EXACT_DIV_EXPR, unshare_expr (DR_STEP (dr)),
> >> ssize_int (SLP_TREE_GS_SCALE (node)));
> >> --
> >> 2.51.0
> >>
>