Re: [PATCH] vect: Use signed offsets for negative step (PR121393)

Andrew Stubbs Thu, 02 Oct 2025 04:34:18 -0700

On 02/10/2025 11:52, Richard Biener wrote:

On Thu, Oct 2, 2025 at 11:14 AM Andrew Stubbs <[email protected]> wrote:


On 02/10/2025 10:58, Richard Biener wrote:

On Fri, Sep 26, 2025 at 1:27 PM Andrew Stubbs <[email protected]> wrote:


When testcases (such as libgomp.c-c++-common/for-11.c) use unsigned types for
loop counters then the vectorizer will also use unsigned types for the offsets
passed to gather/scatter instructions.  This is a problem if the loop step is
negative and the offset type is narrower than a pointer.


Hmm, but why did we chose an unsigned type in the first place then?  So,
isn't this an issue in vect_truncate_gather_scatter_offset or
vect_use_strided_gather_scatters_p?


Here's the reproducer from PR121393:

float a[100];

int f()
{
    unsigned int i;                  <----- unsigned comes from here

#pragma omp target simd
    for (i=100; i != 0; i-=2)
      a[i] = a[i] + 5.5;
}

int main()
{
    f();
    return 0;
}


The patch is not needed if "i" is either signed or long long.


So given other targets/code-paths are fine it seems like the gimple IL is
mis-interpreted somehow.  Checking risc-v (also using stride load/store)
does

   _27 = .SELECT_VL (ivtmp_25, POLY_INT_CST [4, 4]);
   _8 = _27 * 18446744073709551608;
   vect__1.7_17 = .MASK_LEN_STRIDED_LOAD (vectp_a.5_14, -8, { 0.0, ...
}, { -1, ... }, _16(D), _27, 0);
   vect__2.8_19 = .COND_LEN_ADD ({ -1, ... }, vect__1.7_17, { 5.5e+0,
... }, _18(D), _27, 0);
   _1 = a[i_10];
   _2 = _1 + 5.5e+0;
   .MASK_LEN_STRIDED_STORE (vectp_a.9_22, -8, vect__2.8_19, { -1, ... }, _27, 
0);
   i_7 = i_10 + 4294967294;

and in vect_use_strided_gather_scatters_p we end up with unsigned vector([4,4].

Your patch only changes how VEC_SERIES_EXPR is handled, you
make that signed.  It seems to me that GCN eventually cannot do
a VnSImode series but you always build a VnDImode one and fail to
truncate?  Or you claim you can handle VnSImode offset vectors for
gather/scatter but in reality require sign-/zero-extended VnDImode?
Note IIRC the gather/scatter optabs handle signed vs. unsigned
(aka zero vs. sign extension) via predicates.  See
internal_gather_scatter_fn_supported_p.

The hardware instruction accepts only 32-bit unsigned offsets, but theaddresses are 64-bit so there's an implicit zero-extend hidden in there.Negative offsets do not work.

When the offsets are signed then the define_expand does the sign-extendand 64-bit add explicitly, and uses the hardware instruction that takesa vector of absolute addresses.

The problem in this case is that the middle-end is creating "unsigned"values that rely on 32-bit overflow to produce negative offsets, and itdoes not do the right thing.

I recently added an insn variant that allows DImode offsets (because theSPEC HPC lbm benchmark doesn't vectorize without, for some reason), andthat takes both signed and unsigned offsets and also does the offsetcalculations explicitly, but in this case the testcase uses "int" so theSImode variant is preferred, I think.

So with unsigned we expect the target to zero-extend.  Which might be
indeed an issue for negative step when we handle gather as
&a[100] + offset instead of &a[100 + index-offset].  The place you fix
is IMO still wrong.

I have been unable to fix this in the backend (if it says its unsignedthen we should treating it as such). I have spent a long time searchingthe vectorizer for a place where this "decision" is taken, and thisseems to be the spot.

If I change the gather/scatter such that it only accepts unsignedoffsets, would the middle-end adapt, or would it just give up? I do notwant the vectorizer to fail. We have too many places where performancedrops off a cliff because the vectorizer just says "no" already. :(

For risc-v we end up using &a as base and we have (sizetype)i * 4 as offset,
so we shouldn't get to problematic values and instead have a series like
{396, 392, ... 0, -4U, ... } with the negative values being only operating on
masked lanes?

Is that what you see on GCN as well?

The base is set to &a[100], as far as I can tell (hmm, I have anout-of-bounds error in my testcase .. no matter).


Andrew


Richard.

I have not observed issues in the other places, but I don't claim a full
understanding of this code.

Andrew

This commit ensures that negative offsets are labelled as signed.

gcc/ChangeLog:

          PR target/121393
          * tree-vect-stmts.cc (vect_get_strided_load_store_ops): Use signed
          offsets for negative steps.
---

This fixes a problem observed on amdgcn, but there doesn't seem to be
anything about it that is target-specific.

OK for mainline?

Andrew


   gcc/tree-vect-stmts.cc | 8 ++++++++
   1 file changed, 8 insertions(+)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cfc4f323a22..254fe6e6c0f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2969,6 +2969,14 @@ vect_get_strided_load_store_ops (stmt_vec_info 
stmt_info, slp_tree node,
        type of the vector instead.  */
     tree offset_type = TREE_TYPE (offset_vectype);

+  if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST
+      && compare_step_with_zero (loop_vinfo, stmt_info) < 0
+      && TYPE_SIGN (offset_type) == UNSIGNED)
+    {
+      offset_type = signed_type_for (offset_type);
+      offset_vectype = signed_type_for (offset_vectype);
+    }
+
     /* Calculate X = DR_STEP / SCALE and convert it to the appropriate type.  
*/
     tree step = size_binop (EXACT_DIV_EXPR, unshare_expr (DR_STEP (dr)),
                            ssize_int (SLP_TREE_GS_SCALE (node)));
--
2.51.0

Re: [PATCH] vect: Use signed offsets for negative step (PR121393)

Reply via email to