Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>> On 15 Nov 2024, at 10:20, Andrew Pinski <pins...@gmail.com> wrote:
>> 
>> On Thu, Nov 14, 2024 at 7:50 PM Soumya AR <soum...@nvidia.com> wrote:
>>> 
>>> The SVE SUBR instruction performs a reversed subtract from an immediate.
>>> 
>>> This patches enables the emission of SUBR for Neon modes and avoids the 
>>> need to
>>> materialise an explicit constant.
>>> 
>>> For example, the below test case:
>>> 
>>> typedef long long __attribute__ ((vector_size (16))) v2di;
>>> 
>>> v2di subr_v2di (v2di x)
>>> {
>>>        return 15 - x;
>>> }
>>> 
>>> compiles to:
>>> 
>>> subr_v2di:
>>>        mov     z31.d, #15
>>>        sub     v0.2d, v31.2d, v0.2d
>>>        ret
>>> 
>>> but can just be:
>>> 
>>> subr_v2di:
>>>        subr    z0.d, z0.d, #15
>>>        ret
>>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Soumya AR <soum...@nvidia.com>
>>> 
>>> gcc/ChangeLog:
>>> 
>>>        * config/aarch64/aarch64-simd.md:
>>>        (sub<mode>3<vczle><vczbe>): Extended the pattern to emit SUBR for SVE
>>>        targets if operand 1 is an immediate.
>>>        * config/aarch64/predicates.md 
>>> (aarch64_sve_arith_imm_or_reg_operand):
>>>        New predicate that accepts aarch64_sve_arith_immediate in operand 1 
>>> but
>>>        only for TARGET_SVE.
>> 
>> I think this might cause wrong code with:
>> ```
>> #include <arm_neon.h>
>> uint32x4_t foo_sub_u32 (uint32x2_t a, uint32x2_t b)
>> {
>>  uint32x2_t zeros = vcreate_u32 (0);
>>  b = vdup_n_u32 (15);
>>  return vcombine_u32 (vsub_u32 (b, a), zeros);
>> }
>> ```
>> As now the elements that are supposed to be zero are now `15-x`.
>> This is due to the `<vczle><vczbe>` part of the pattern name.
>
> I think you’re right. This trick is only valid for 128-bit Advanced SIMD 
> modes.
> Maybe the way to go here is to not touch sub<mode>3<vczle><vczbe> and instead
> add a new sub pattern for VQ_I modes only that accepts 
> aarch64_sve_arith_immediate.

It should be ok for 64-bit modes too, since we don't maintain the
invariant that all 64-bit modes are zero-extended on read.  It's just
the automatic addition of the combined zero-extension that Andrew mentioned
that's the problem.

It would be good if we could make the define_subst result automatically
disable "sve" alternatives.  I don't know whether that's possible as
things stand (too much stage 1 stuff!) but it semes like a useful thing
to have, and would be more robust than trying to remember about this
every time we add an SVE variant.

IMO we might as well handle SUB at the same time as SUBR.

Thanks,
Richard

Reply via email to