Re: [PATCH v3 1/2] aarch64: Add SVE2 faminmax intrinsics

Richard Sandiford Tue, 08 Oct 2024 07:27:44 -0700

Saurabh Jha <saurabh....@arm.com> writes:
> Thanks for the review. Wanted to clarify your comment:
>
> On 10/8/2024 11:51 AM, Richard Sandiford wrote:
>> <saurabh....@arm.com> writes:
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c 
>>> b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c
>>> new file mode 100644
>>> index 00000000000..de4a6f8efaa
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c
>>> @@ -0,0 +1,312 @@
>>> [...]
>>> +/*
>>> +** amax_h4_f16_m_untied:
>>> +** mov     (z[0-9]+\.h), h4
>>> +** movprfx z0, z1
>>> +** famax   z0\.h, p0/m, z0\.h, \1
>>> +** ret
>>> +*/
>>> +TEST_UNIFORM_ZD (amax_h4_f16_m_untied, svfloat16_t, __fp16,
>>> +            z0 = svamax_n_f16_m (p0, z1, d4),
>>> +            z0 = svamax_m (p0, z1, d4))
>>> +
>>> +/*
>>> +** amax_2_f16_m:
>>> +** fmov    (z[0-9]+\.h), #2\.0(?:e\+0)?
>>> +** famax   z0\.h, p0/m, z0\.h, \1
>>> +** ret
>>> +*/
>>> +TEST_UNIFORM_Z (amax_2_f16_m, svfloat16_t,
>>> +           z0 = svamax_n_f16_m (p0, z0, 2),
>>> +           z0 = svamax_m (p0, z0, 2))
>> 
>> Rather than dropping the tests for 0 and 1, I think we should keep
>> them and verify that we don't try to use non-existent immediate forms.
>> (It would be easy for that to happen, if they were added to the wrong
>> iterators.)
>> 
>> Maybe:
>> 
>> /*
>> ** amax_0_f16_m_tied1:
>> **   ...
>> **   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
>> **   ret
>> */
>> TEST_UNIFORM_Z (amax_0_f16_m_tied1, svfloat16_t,
>>              z0 = svamax_n_f16_m (p0, z0, 0),
>>              z0 = svamax_m (p0, z0, 0))
>> 
>> /*
>> ** amax_1_f16_m_tied1:
>> **   ...
>> **   famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
>> **   ret
>> */
>> TEST_UNIFORM_Z (amax_1_f16_m_tied1, svfloat16_t,
>>              z0 = svamax_n_f16_m (p0, z0, 1),
>>              z0 = svamax_m (p0, z0, 1))
>> 
>> (untested).  Similarly for the other files, and for _z and _x.
>
> Right now, if we do this, we get an ICE like this
>
> unrecognizable insn:
> (insn 18 17 19 2 (set (reg:VNx8HF 107)
>          (unspec:VNx8HF [
>                  (reg:VNx8BI 108)
>                  (unspec:VNx8HF [
>                          (reg:VNx8BI 108)
>                          (const_int 1 [0x1])
>                          (reg:VNx8HF 109)
>                          (const_vector:VNx8HF repeat [
>                                  (const_double:HF 0.0 [0x0.0p+0])
>                              ])
>                      ] UNSPEC_COND_FAMAX)
>                  (reg:VNx8HF 109)
>              ] UNSPEC_SEL)) 
> "/home/saujha01/gnu-work/src/gcc/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c":54:1
>  
> -1
>       (nil))
> during RTL pass: vregs


Ah, yeah, that's the kind of failure that the tests above would
defend against.

> Most likely, ICE is not the right answer here. But if we want to have a 
> test for immediate forms, we would have to add support for it in 
> instruction patterns, isn't it? But these functions don't expect 
> immediate operands.
>
> Should we add support for immediate operands in the instruction patterns 
> or should we somehow write test so that we make sure we don't 
> inadvertently add support for immediate forms? I think you meant the 
> latter but not sure how could we add that test without adding support 
> for it in the compiler.

Yeah, I meant the latter.  From a user's point of view, this is a
similar sort of situation to:

/*
** amax_2_f32_m:
**      fmov    (z[0-9]+\.s), #2\.0(?:e\+0)?
**      famax   z0\.s, p0/m, z0\.s, \1
**      ret
*/
TEST_UNIFORM_Z (amax_2_f32_m, svfloat32_t,
                z0 = svamax_n_f32_m (p0, z0, 2),
                z0 = svamax_m (p0, z0, 2))

which we do handle correctly.  The idea is the same for 0 and 1:
we should force the constant into a register and use a pure
register FAMIN/FAMAX.

(In the tests above, I dropped matching the fmov part, because being
overly specific about ways of moving zeros into registers forced Tamar
to do a lot of make-work for one of his recent patches.)

I think at least one of the bugs is:

 (define_int_attr sve_pred_fp_rhs2_operand
   [(UNSPEC_COND_FADD "aarch64_sve_float_arith_with_sub_operand")
+   (UNSPEC_COND_FAMAX "aarch64_sve_float_maxmin_operand")
+   (UNSPEC_COND_FAMIN "aarch64_sve_float_maxmin_operand")
    (UNSPEC_COND_FDIV "register_operand")
    (UNSPEC_COND_FMAX "aarch64_sve_float_maxmin_operand")
    (UNSPEC_COND_FMAXNM "aarch64_sve_float_maxmin_operand")

this should use register_operand rather than aarch64_sve_float_maxmin_operand,
since FAMIN and FAMAX don't have the same immediate forms as FMIN and FMAX.

Thanks,
Richard

Re: [PATCH v3 1/2] aarch64: Add SVE2 faminmax intrinsics

Reply via email to