On Wed, Jun 5, 2024 at 10:52 AM Li, Pan2 <pan2...@intel.com> wrote:
>
> Thanks for explaining. I see, cmove is well designed for such cases.

If the question is whether it is worth converting to
__builtin_sub_overflow here when the target doesn't provide a scalar
saturating optab, I think the answer is yes. On x86, the compare will
be eliminated.

Please consider this testcase:

--cut here--
unsigned int
__attribute__((noinline))
foo (unsigned int x, unsigned int y)
{
  return x > y ? x - y : 0;
}

unsigned int
__attribute__((noinline))
bar (unsigned int x, unsigned int y)
{
  unsigned int z;

  return __builtin_sub_overflow (x, y, &z) ? 0 : z;
}
--cut here--

This will compile to:

0000000000000000 <foo>:
  0:   89 f8                   mov    %edi,%eax
  2:   31 d2                   xor    %edx,%edx
  4:   29 f0                   sub    %esi,%eax
  6:   39 fe                   cmp    %edi,%esi
  8:   0f 43 c2                cmovae %edx,%eax
  b:   c3                      ret
  c:   0f 1f 40 00             nopl   0x0(%rax)

0000000000000010 <bar>:
 10:   29 f7                   sub    %esi,%edi
 12:   72 03                   jb     17 <bar+0x7>
 14:   89 f8                   mov    %edi,%eax
 16:   c3                      ret
 17:   31 c0                   xor    %eax,%eax
 19:   c3                      ret

Please note that the compare was eliminated in the latter function
(bar). So, if the target does not provide a saturating optab but does
provide __builtin_sub_overflow, I think it is worth emitting .SAT_SUB
via __builtin_sub_overflow (and in a similar way for saturating add).
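
For illustration, the saturating add counterpart mentioned above could
be written along the same lines. This is only a minimal sketch: the
name sat_add_u32 is hypothetical (it does not come from the patch), and
whether the compare is actually eliminated depends on the target and
the compiler version.

```c
#include <limits.h>

/* Hypothetical helper, not part of the patch under review:
   unsigned saturating add expressed via the overflow builtin.  */
unsigned int
sat_add_u32 (unsigned int x, unsigned int y)
{
  unsigned int z;

  /* __builtin_add_overflow returns true when x + y wraps around,
     in which case the result saturates to UINT_MAX.  */
  return __builtin_add_overflow (x, y, &z) ? UINT_MAX : z;
}
```

The shape mirrors bar above: the overflow flag of the arithmetic
operation itself selects between the saturated value and the result,
so no separate comparison of x and y is written in the source.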

Uros.


>
> Pan
>
> -----Original Message-----
> From: Uros Bizjak <ubiz...@gmail.com>
> Sent: Wednesday, June 5, 2024 4:46 PM
> To: Li, Pan2 <pan2...@intel.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
> Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int
>
> On Wed, Jun 5, 2024 at 10:38 AM Li, Pan2 <pan2...@intel.com> wrote:
> >
> > > I see. x86 doesn't have scalar saturating instructions, so the scalar
> > > version indeed can't be converted.
> >
> > > I will amend x86 testcases after the vector part of your patch is committed.
> >
> > Thanks for the confirmation. Just curious, the .SAT_SUB for scalar has
> > several forms, like the branch version below.
> >
> > .SAT_SUB (x, y) = x > y ? x - y : 0. // or leverage __builtin_sub_overflow here
> >
> > Is it reasonable to implement the scalar .SAT_SUB for x86, given that we
> > can somehow eliminate the branch here?
>
> x86 will emit cmove in the above case:
>
>        movl    %edi, %eax
>        xorl    %edx, %edx
>        subl    %esi, %eax
>        cmpl    %edi, %esi
>        cmovnb  %edx, %eax
>
> Maybe we can reuse flags from the subtraction here to avoid the compare.
>
> Uros.
