On Thu, Oct 2, 2025 at 1:33 PM Andrew Stubbs <[email protected]> wrote:
>
> On 02/10/2025 11:52, Richard Biener wrote:
> > On Thu, Oct 2, 2025 at 11:14 AM Andrew Stubbs <[email protected]> wrote:
> >>
> >> On 02/10/2025 10:58, Richard Biener wrote:
> >>> On Fri, Sep 26, 2025 at 1:27 PM Andrew Stubbs <[email protected]> wrote:
> >>>>
> >>>> When testcases (such as libgomp.c-c++-common/for-11.c) use unsigned 
> >>>> types for
> >>>> loop counters then the vectorizer will also use unsigned types for the 
> >>>> offsets
> >>>> passed to gather/scatter instructions.  This is a problem if the loop 
> >>>> step is
> >>>> negative and the offset type is narrower than a pointer.
> >>>
> >>> Hmm, but why did we chose an unsigned type in the first place then?  So,
> >>> isn't this an issue in vect_truncate_gather_scatter_offset or
> >>> vect_use_strided_gather_scatters_p?
> >>
> >> Here's the reproducer from PR121393:
> >>
> >> float a[100];
> >>
> >> int f()
> >> {
> >>     unsigned int i;                  <----- unsigned comes from here
> >>
> >> #pragma omp target simd
> >>     for (i=100; i != 0; i-=2)
> >>       a[i] = a[i] + 5.5;
> >> }
> >>
> >> int main()
> >> {
> >>     f();
> >>     return 0;
> >> }
> >>
> >>
> >> The patch is not needed if "i" is either signed or long long.
> >
> > So given other targets/code-paths are fine it seems like the gimple IL is
> > mis-interpreted somehow.  Checking risc-v (also using stride load/store)
> > does
> >
> >    _27 = .SELECT_VL (ivtmp_25, POLY_INT_CST [4, 4]);
> >    _8 = _27 * 18446744073709551608;
> >    vect__1.7_17 = .MASK_LEN_STRIDED_LOAD (vectp_a.5_14, -8, { 0.0, ...
> > }, { -1, ... }, _16(D), _27, 0);
> >    vect__2.8_19 = .COND_LEN_ADD ({ -1, ... }, vect__1.7_17, { 5.5e+0,
> > ... }, _18(D), _27, 0);
> >    _1 = a[i_10];
> >    _2 = _1 + 5.5e+0;
> >    .MASK_LEN_STRIDED_STORE (vectp_a.9_22, -8, vect__2.8_19, { -1, ... }, 
> > _27, 0);
> >    i_7 = i_10 + 4294967294;
> >
> > and in vect_use_strided_gather_scatters_p we end up with unsigned 
> > vector([4,4].
> >
> > Your patch only changes how VEC_SERIES_EXPR is handled, you
> > make that signed.  It seems to me that GCN eventually cannot do
> > a VnSImode series but you always build a VnDImode one and fail to
> > truncate?  Or you claim you can handle VnSImode offset vectors for
> > gather/scatter but in reality require sign-/zero-extended VnDImode?
> > Note IIRC the gather/scatter optabs handle signed vs. unsigned
> > (aka zero vs. sign extension) via predicates.  See
> > internal_gather_scatter_fn_supported_p.
>
> The hardware instruction accepts only 32-bit unsigned offsets, but the
> addresses are 64-bit so there's an implicit zero-extend hidden in there.
> Negative offsets do not work.
>
> When the offsets are signed then the define_expand does the sign-extend
> and 64-bit add explicitly, and uses the hardware instruction that takes
> a vector of absolute addresses.
>
> The problem in this case is that the middle-end is creating "unsigned"
> values that rely on 32-bit overflow to produce negative offsets, and it
> does not do the right thing.
>
> I recently added an insn variant that allows DImode offsets (because the
> SPEC HPC lbm benchmark doesn't vectorize without, for some reason), and
> that takes both signed and unsigned offsets and also does the offset
> calculations explicitly, but in this case the testcase uses "int" so the
> SImode variant is preferred, I think.
>
> > So with unsigned we expect the target to zero-extend.  Which might be
> > indeed an issue for negative step when we handle gather as
> > &a[100] + offset instead of &a[100 + index-offset].  The place you fix
> > is IMO still wrong.
>
> I have been unable to fix this in the backend (if it says its unsigned
> then we should treating it as such). I have spent a long time searching
> the vectorizer for a place where this "decision" is taken, and this
> seems to be the spot.
>
> If I change the gather/scatter such that it only accepts unsigned
> offsets, would the middle-end adapt, or would it just give up? I do not
> want the vectorizer to fail. We have too many places where performance
> drops off a cliff because the vectorizer just says "no" already. :(
>
> > For risc-v we end up using &a as base and we have (sizetype)i * 4 as offset,
> > so we shouldn't get to problematic values and instead have a series like
> > {396, 392, ... 0, -4U, ... } with the negative values being only operating 
> > on
> > masked lanes?
> >
> > Is that what you see on GCN as well?
>
> The base is set to &a[100], as far as I can tell (hmm, I have an
> out-of-bounds error in my testcase .. no matter).

I see - I think that's the problem.  This shifts the effective index and thus
we can no longer rely on zero/sign extension producing the same effect.
We somehow have to avoid doing this.  vect_check_gather_scatter
produces (sizetype)&a as base and i_10 as offset (used as a[i_10])
and a scale of 4 for riscv.

But of course if we then use strided access we do not base of that base
it seems but we fold in the offset of the initial access.  If we'd
re-use the gather/scatter base and instead built an offset vector
like { byte-offset + 0, byte_offset + 1 * stride, byte_offset + 2* stride, .. }
we'd avoid the issue as we know all of the resulting offsets are positive.
That is, where we do vect_create_data_ref_ptr for this case we should base
that of the gather/scatter base - same as if doing a gather scatter.

Does that sound sensible?

Richard.

>
> Andrew
>
>
> >
> > Richard.
> >
> >> I have not observed issues in the other places, but I don't claim a full
> >> understanding of this code.
> >>
> >> Andrew
> >>
> >>
> >>>> This commit ensures that negative offsets are labelled as signed.
> >>>>
> >>>> gcc/ChangeLog:
> >>>>
> >>>>           PR target/121393
> >>>>           * tree-vect-stmts.cc (vect_get_strided_load_store_ops): Use 
> >>>> signed
> >>>>           offsets for negative steps.
> >>>> ---
> >>>>
> >>>> This fixes a problem observed on amdgcn, but there doesn't seem to be
> >>>> anything about it that is target-specific.
> >>>>
> >>>> OK for mainline?
> >>>>
> >>>> Andrew
> >>>>
> >>>>
> >>>>    gcc/tree-vect-stmts.cc | 8 ++++++++
> >>>>    1 file changed, 8 insertions(+)
> >>>>
> >>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> >>>> index cfc4f323a22..254fe6e6c0f 100644
> >>>> --- a/gcc/tree-vect-stmts.cc
> >>>> +++ b/gcc/tree-vect-stmts.cc
> >>>> @@ -2969,6 +2969,14 @@ vect_get_strided_load_store_ops (stmt_vec_info 
> >>>> stmt_info, slp_tree node,
> >>>>         type of the vector instead.  */
> >>>>      tree offset_type = TREE_TYPE (offset_vectype);
> >>>>
> >>>> +  if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST
> >>>> +      && compare_step_with_zero (loop_vinfo, stmt_info) < 0
> >>>> +      && TYPE_SIGN (offset_type) == UNSIGNED)
> >>>> +    {
> >>>> +      offset_type = signed_type_for (offset_type);
> >>>> +      offset_vectype = signed_type_for (offset_vectype);
> >>>> +    }
> >>>> +
> >>>>      /* Calculate X = DR_STEP / SCALE and convert it to the appropriate 
> >>>> type.  */
> >>>>      tree step = size_binop (EXACT_DIV_EXPR, unshare_expr (DR_STEP (dr)),
> >>>>                             ssize_int (SLP_TREE_GS_SCALE (node)));
> >>>> --
> >>>> 2.51.0
> >>>>
> >>
>

Reply via email to