On 02/10/2025 11:52, Richard Biener wrote:
On Thu, Oct 2, 2025 at 11:14 AM Andrew Stubbs <[email protected]> wrote:

On 02/10/2025 10:58, Richard Biener wrote:
On Fri, Sep 26, 2025 at 1:27 PM Andrew Stubbs <[email protected]> wrote:

When testcases (such as libgomp.c-c++-common/for-11.c) use unsigned types for
loop counters then the vectorizer will also use unsigned types for the offsets
passed to gather/scatter instructions.  This is a problem if the loop step is
negative and the offset type is narrower than a pointer.

Hmm, but why did we chose an unsigned type in the first place then?  So,
isn't this an issue in vect_truncate_gather_scatter_offset or
vect_use_strided_gather_scatters_p?

Here's the reproducer from PR121393:

float a[100];

int f()
{
    unsigned int i;                  <----- unsigned comes from here

#pragma omp target simd
    for (i=100; i != 0; i-=2)
      a[i] = a[i] + 5.5;
}

int main()
{
    f();
    return 0;
}


The patch is not needed if "i" is either signed or long long.

So given other targets/code-paths are fine it seems like the gimple IL is
mis-interpreted somehow.  Checking risc-v (also using stride load/store)
does

   _27 = .SELECT_VL (ivtmp_25, POLY_INT_CST [4, 4]);
   _8 = _27 * 18446744073709551608;
   vect__1.7_17 = .MASK_LEN_STRIDED_LOAD (vectp_a.5_14, -8, { 0.0, ...
}, { -1, ... }, _16(D), _27, 0);
   vect__2.8_19 = .COND_LEN_ADD ({ -1, ... }, vect__1.7_17, { 5.5e+0,
... }, _18(D), _27, 0);
   _1 = a[i_10];
   _2 = _1 + 5.5e+0;
   .MASK_LEN_STRIDED_STORE (vectp_a.9_22, -8, vect__2.8_19, { -1, ... }, _27, 
0);
   i_7 = i_10 + 4294967294;

and in vect_use_strided_gather_scatters_p we end up with unsigned vector([4,4].

Your patch only changes how VEC_SERIES_EXPR is handled, you
make that signed.  It seems to me that GCN eventually cannot do
a VnSImode series but you always build a VnDImode one and fail to
truncate?  Or you claim you can handle VnSImode offset vectors for
gather/scatter but in reality require sign-/zero-extended VnDImode?
Note IIRC the gather/scatter optabs handle signed vs. unsigned
(aka zero vs. sign extension) via predicates.  See
internal_gather_scatter_fn_supported_p.

The hardware instruction accepts only 32-bit unsigned offsets, but the addresses are 64-bit so there's an implicit zero-extend hidden in there. Negative offsets do not work.

When the offsets are signed then the define_expand does the sign-extend and 64-bit add explicitly, and uses the hardware instruction that takes a vector of absolute addresses.

The problem in this case is that the middle-end is creating "unsigned" values that rely on 32-bit overflow to produce negative offsets, and it does not do the right thing.

I recently added an insn variant that allows DImode offsets (because the SPEC HPC lbm benchmark doesn't vectorize without, for some reason), and that takes both signed and unsigned offsets and also does the offset calculations explicitly, but in this case the testcase uses "int" so the SImode variant is preferred, I think.

So with unsigned we expect the target to zero-extend.  Which might be
indeed an issue for negative step when we handle gather as
&a[100] + offset instead of &a[100 + index-offset].  The place you fix
is IMO still wrong.

I have been unable to fix this in the backend (if it says its unsigned then we should treating it as such). I have spent a long time searching the vectorizer for a place where this "decision" is taken, and this seems to be the spot.

If I change the gather/scatter such that it only accepts unsigned offsets, would the middle-end adapt, or would it just give up? I do not want the vectorizer to fail. We have too many places where performance drops off a cliff because the vectorizer just says "no" already. :(

For risc-v we end up using &a as base and we have (sizetype)i * 4 as offset,
so we shouldn't get to problematic values and instead have a series like
{396, 392, ... 0, -4U, ... } with the negative values being only operating on
masked lanes?

Is that what you see on GCN as well?

The base is set to &a[100], as far as I can tell (hmm, I have an out-of-bounds error in my testcase .. no matter).

Andrew



Richard.

I have not observed issues in the other places, but I don't claim a full
understanding of this code.

Andrew


This commit ensures that negative offsets are labelled as signed.

gcc/ChangeLog:

          PR target/121393
          * tree-vect-stmts.cc (vect_get_strided_load_store_ops): Use signed
          offsets for negative steps.
---

This fixes a problem observed on amdgcn, but there doesn't seem to be
anything about it that is target-specific.

OK for mainline?

Andrew


   gcc/tree-vect-stmts.cc | 8 ++++++++
   1 file changed, 8 insertions(+)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cfc4f323a22..254fe6e6c0f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2969,6 +2969,14 @@ vect_get_strided_load_store_ops (stmt_vec_info 
stmt_info, slp_tree node,
        type of the vector instead.  */
     tree offset_type = TREE_TYPE (offset_vectype);

+  if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST
+      && compare_step_with_zero (loop_vinfo, stmt_info) < 0
+      && TYPE_SIGN (offset_type) == UNSIGNED)
+    {
+      offset_type = signed_type_for (offset_type);
+      offset_vectype = signed_type_for (offset_vectype);
+    }
+
     /* Calculate X = DR_STEP / SCALE and convert it to the appropriate type.  
*/
     tree step = size_binop (EXACT_DIV_EXPR, unshare_expr (DR_STEP (dr)),
                            ssize_int (SLP_TREE_GS_SCALE (node)));
--
2.51.0



Reply via email to