On Wed, Jun 5, 2024 at 10:52 AM Li, Pan2 <[email protected]> wrote:
>
> Thanks for explaining. I see, cmov is well designed for such cases.
If the question is whether it is worth converting this to
__builtin_sub_overflow when the target doesn't provide a scalar
saturating optab, I think the answer is yes. For x86, the compare will
be eliminated.
Please consider this testcase:
--cut here--
unsigned int
__attribute__((noinline))
foo (unsigned int x, unsigned int y)
{
  return x > y ? x - y : 0;
}

unsigned int
__attribute__((noinline))
bar (unsigned int x, unsigned int y)
{
  unsigned int z;
  return __builtin_sub_overflow (x, y, &z) ? 0 : z;
}
--cut here--
This will compile to:
0000000000000000 <foo>:
0: 89 f8 mov %edi,%eax
2: 31 d2 xor %edx,%edx
4: 29 f0 sub %esi,%eax
6: 39 fe cmp %edi,%esi
8: 0f 43 c2 cmovae %edx,%eax
b: c3 ret
c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000000010 <bar>:
10: 29 f7 sub %esi,%edi
12: 72 03 jb 17 <bar+0x7>
14: 89 f8 mov %edi,%eax
16: c3 ret
17: 31 c0 xor %eax,%eax
19: c3 ret
Please note that the compare was eliminated in the latter function
(bar). So, if the target does not provide a saturating optab but does
provide __builtin_sub_overflow, I think it is worth emitting .SAT_SUB
via __builtin_sub_overflow (and in a similar way for saturating add).
Uros.
>
> Pan
>
> -----Original Message-----
> From: Uros Bizjak <[email protected]>
> Sent: Wednesday, June 5, 2024 4:46 PM
> To: Li, Pan2 <[email protected]>
> Cc: Richard Biener <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int
>
> On Wed, Jun 5, 2024 at 10:38 AM Li, Pan2 <[email protected]> wrote:
> >
> > > I see. x86 doesn't have scalar saturating instructions, so the scalar
> > > version indeed can't be converted.
> >
> > > I will amend x86 testcases after the vector part of your patch is
> > > committed.
> >
> > Thanks for the confirmation. Just curious: the scalar .SAT_SUB can
> > take several forms, such as the branch version below.
> >
> > .SAT_SUB (x, y) = x > y ? x - y : 0  // or via __builtin_sub_overflow
> >
> > Is it reasonable to implement the scalar .SAT_SUB for x86, given that
> > we can somehow eliminate the branch here?
>
> x86 will emit a cmov in the above case:
>
> movl %edi, %eax
> xorl %edx, %edx
> subl %esi, %eax
> cmpl %edi, %esi
> cmovnb %edx, %eax
>
> Maybe we can reuse flags from the subtraction here to avoid the compare.
>
> Uros.