Akram Ahmad <akram.ah...@arm.com> writes:
> Ah whoops- I didn't see this before sending off V4 just now, my apologies.
> I'll try my best to get this implemented before the end of the day so that
> it doesn't miss the deadline.

No rush!  The delay here is entirely my fault, so no problem if the
patch lands early in stage 4.

> On 09/01/2025 23:04, Richard Sandiford wrote:
>> Akram Ahmad <akram.ah...@arm.com> writes:
>>> In the above example, subtraction replaces the adds with subs and the
>>> csinv with csel. The 32-bit case follows the same approach. Arithmetic
>>> with a constant operand is simplified further by directly storing the
>>> saturating limit in the temporary register, resulting in only three
>>> instructions being used. It is important to note that this only works
>>> when early-ra is disabled due to an early-ra bug which erroneously
>>> assigns FP registers to the operands; if early-ra is enabled, then the
>>> original behaviour (NEON instruction) occurs.
>> This can be fixed by changing:
>>
>>      case CT_REGISTER:
>>        if (REG_P (op) || SUBREG_P (op))
>>          return true;
>>        break;
>>
>> to:
>>
>>      case CT_REGISTER:
>>        if (REG_P (op) || SUBREG_P (op) || GET_CODE (op) == SCRATCH)
>>          return true;
>>        break;
>>
>> But I can test & post that as a follow-up if you prefer.
> Yes please, if that's not too much trouble- would that have to go into
> another patch?

Yeah.  But early-ra pessimisations are regressions, since early-ra was
new to GCC 14.  So that can go in during stage 4 as well.

>>> +
>>>   ;; Double vector modes.
>>>   (define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF V4BF])
>>>   
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>> new file mode 100644
>>> index 00000000000..2b72be7b0d7
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>> @@ -0,0 +1,79 @@
>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>> +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
>>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>>> +
>>> +/*
>>> +** uadd_lane: { xfail *-*-* }
>>> +** dup\tv([0-9]+).8b, w0
>>> +** uqadd\tb([0-9]+), (?:b\1, b0|b0, b\1)
>>> +** umov\tw0, v\2.b\[0\]
>>> +** ret
>>> +*/
>> What's the reason behind the xfail?  Is it the early-ra thing, or
>> something else?  (You might already have covered this, sorry.)
>>
>> xfailing is fine if it needs further optimisation, was just curious :)
> This is because of a missing pattern in match.pd (I've sent another patch
> upstream to add the missing pattern, although it may have gotten lost).
> Once that pattern is added though, this should be recognised as .SAT_SUB,
> and the new instructions will appear.

Ah, great!
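
For anyone following the thread, here is a minimal sketch of the kind of
scalar source idiom under discussion: branchless-looking unsigned
saturating add and subtract on 8-bit values.  The function names are
hypothetical and are not taken from the patch or its tests, and whether
a particular spelling is recognised as a saturating operation depends on
which match.pd patterns are present.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not from the patch.  */

/* Unsigned saturating add: clamp to UINT8_MAX if the sum wraps.  */
static uint8_t
sat_uadd8 (uint8_t a, uint8_t b)
{
  uint8_t sum = (uint8_t) (a + b);
  /* If the truncated sum is smaller than an operand, it wrapped.  */
  return sum < a ? UINT8_MAX : sum;
}

/* Unsigned saturating subtract: clamp to 0 if b exceeds a.  */
static uint8_t
sat_usub8 (uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : 0;
}
```

With these definitions, sat_uadd8 (200, 100) saturates to 255 and
sat_usub8 (5, 9) saturates to 0, while in-range inputs give the ordinary
sum or difference.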

Thanks,
Richard
