Akram Ahmad <akram.ah...@arm.com> writes:
> On 23/10/2024 12:20, Richard Sandiford wrote:
>> Thanks for doing this.  The approach looks good.  My main question is:
>> are we sure that we want to use the Advanced SIMD instructions for
>> signed saturating SI and DI arithmetic on GPRs?  E.g. for addition,
>> we only saturate at the negative limit if both operands are negative,
>> and only saturate at the positive limit if both operands are positive.
>> So for 32-bit values we can use:
>>
>>      asr     tmp, x or y, #31
>>      eor     tmp, tmp, #0x7fffffff
>>
>> to calculate the saturation value and:
>>
>>      adds    res, x, y
>>      csel    res, tmp, res, vs
>>
>> to calculate the full result.  That's the same number of instructions
>> as two fmovs for the inputs, the sqadd, and the fmov for the result,
>> but it should be more efficient.
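>>
>> In C terms, the idea is roughly this (illustrative, untested):
>>
>>      #include <stdint.h>
>>
>>      int32_t
>>      ssadd32 (int32_t x, int32_t y)
>>      {
>>        /* INT32_MAX if x >= 0, INT32_MIN if x < 0; signed >> is an
>>           arithmetic shift in GCC, matching the asr above.  */
>>        int32_t sat = (x >> 31) ^ 0x7fffffff;
>>        int32_t res;
>>        /* __builtin_add_overflow stands in for the V flag from ADDS.  */
>>        if (__builtin_add_overflow (x, y, &res))
>>          res = sat;
>>        return res;
>>      }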
>>
>> The reason for asking now, rather than treating it as a potential
>> future improvement, is that it would also avoid splitting the patterns
>> for signed and unsigned ops.  (The length of the split alternative can be
>> conservatively set to 16 even for the unsigned version, since nothing
>> should care in practice.  The split will have happened before
>> shorten_branches.)
>
> Hi Richard, thanks for looking over this.
>
> I might be misunderstanding your suggestion, but is there a way to
> efficiently check the sign of the second operand (say 'y') when it is
> in a register?  This is a problem we considered and couldn't solve
> post-reload, as we only have three registers (two of them operands) to
> work with.  (I might be wrong about how many registers are available.)
> AFAIK that's why we only use adds + csinv / subs + csel in the
> unsigned case.
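>
> That is, we emit roughly the following (SImode shown; DImode would
> use xzr):
>
>      adds    res, x, y
>      csinv   res, res, wzr, cc    // all-ones if the add carried out
>
> for addition, and:
>
>      subs    res, x, y
>      csel    res, res, wzr, cs    // zero if the subtraction borrowed
>
> for subtraction.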

Ah, ok.  For post-reload splits, we would need to add:

  (clobber (match_operand:GPI 3 "scratch_operand"))

then use "X" as the constraint for the Advanced SIMD alternative and
"&r" as the constraint in the GPR alternative.  But I suppose that
also sinks my dream of a unified pattern, since the unsigned case
wouldn't need the extra operand.

In both cases (signed and unsigned), the pattern should clobber CC_REGNUM,
since the split changes the flags.
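
Concretely, I was imagining something of this shape for the signed case
(completely untested; the name and predicates are only illustrative):

  (define_insn_and_split "ssadd<mode>3"
    [(set (match_operand:GPI 0 "register_operand" "=r,w")
          (ss_plus:GPI (match_operand:GPI 1 "register_operand" "r,w")
                       (match_operand:GPI 2 "register_operand" "r,w")))
     (clobber (match_scratch:GPI 3 "=&r,X"))
     (clobber (reg:CC CC_REGNUM))]
    ...)

with the first alternative splitting post-reload into the asr/eor +
adds/csel sequence above, and the second emitting sqadd directly.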

> [...]
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>> new file mode 100644
>>> index 00000000000..63eb21e438b
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>> @@ -0,0 +1,79 @@
>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>> +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
>>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>>> +
>>> +/*
>>> +** uadd_lane: { xfail *-*-* }
>> Just curious: why does this fail?  Is it a vector costing issue?
> This is due to a missing pattern in match.pd; I've sent another patch
> upstream to rectify this.  In essence, this function exposes a
> commutative form of an existing addition pattern, but that form isn't
> currently treated as commutative when it should be.  The uqsub tests
> are marked as xfail for a similar reason, so that same patch series
> contains a fix for the uqsub case too.

Ah, ok, thanks.

Richard
