> The hardware instruction accepts only 32-bit unsigned offsets, but the 
> addresses are 64-bit so there's an implicit zero-extend hidden in there. 
> Negative offsets do not work.
>
> When the offsets are signed then the define_expand does the sign-extend 
> and 64-bit add explicitly, and uses the hardware instruction that takes 
> a vector of absolute addresses.
>
> The problem in this case is that the middle-end is creating "unsigned" 
> values that rely on 32-bit overflow to produce negative offsets, and it 
> does not do the right thing.
>
> I recently added an insn variant that allows DImode offsets (because the 
> SPEC HPC lbm benchmark doesn't vectorize without, for some reason), and 
> that takes both signed and unsigned offsets and also does the offset 
> calculations explicitly, but in this case the testcase uses "int" so the 
> SImode variant is preferred, I think.
>
>> So with unsigned we expect the target to zero-extend.  Which might be
>> indeed an issue for negative step when we handle gather as
>> &a[100] + offset instead of &a[100 + index-offset].  The place you fix
>> is IMO still wrong.
>
> I have been unable to fix this in the backend (if it says its unsigned 
> then we should treating it as such). I have spent a long time searching 
> the vectorizer for a place where this "decision" is taken, and this 
> seems to be the spot.
>
> If I change the gather/scatter such that it only accepts unsigned 
> offsets, would the middle-end adapt, or would it just give up? I do not 
> want the vectorizer to fail. We have too many places where performance 
> drops off a cliff because the vectorizer just says "no" already. :(

You mean changing your target's predicate for gather/scatter?  Right now
the vectorizer will give up but I'm working on something so it won't.
We have the same issue on riscv in that the HW zero extends but we accept 
everything in the predicate and handle the extension/scale ourselves.

>
>> For risc-v we end up using &a as base and we have (sizetype)i * 4 as offset,
>> so we shouldn't get to problematic values and instead have a series like
>> {396, 392, ... 0, -4U, ... } with the negative values being only operating on
>> masked lanes?
>> 
>> Is that what you see on GCN as well?
>
> The base is set to &a[100], as far as I can tell (hmm, I have an 
> out-of-bounds error in my testcase .. no matter).
>
> Andrew
>

Just to add a data-point but without having looked at code-gen yet, riscv 
without the strided optimization, i.e. using regular gather, also fails.  
There's no offset truncation happening either.

  _27 = .SELECT_VL (ivtmp_25, POLY_INT_CST [16, 16]);
  _8 = _27 * 18446744073709551608;
  vect__1.7_17 = .MASK_LEN_GATHER_LOAD (vectp_a.5_14, 64B, { 0, 4294967294, 
4294967292, ... }, 4, { 0.0, ... }, { -1, ... }, _16(D), _27, 0);
  vect__2.8_19 = .COND_LEN_ADD ({ -1, ... }, vect__1.7_17, { 5.5e+0, ... }, 
_18(D), _27, 0);
  .MASK_LEN_SCATTER_STORE (vectp_a.9_22, 64B, { 0, 4294967294, 4294967292, ... 
}, 4, vect__2.8_19, { -1, ... }, _27, 0);

I'll hope to have another more detailed look soon.

-- 
Regards
 Robin

Reply via email to