On Wed, Jun 5, 2024 at 10:52 AM Li, Pan2 <pan2...@intel.com> wrote:
>
> Thanks for explaining. I see, cmov is well designed for such cases.

If the question is whether it is worth converting via __builtin_sub_overflow
here when the target doesn't provide a scalar saturating optab, I think the
answer is yes. On x86, the separate compare is then eliminated. Please
consider this testcase:

--cut here--
unsigned int __attribute__((noinline))
foo (unsigned int x, unsigned int y)
{
  return x > y ? x - y : 0;
}

unsigned int __attribute__((noinline))
bar (unsigned int x, unsigned int y)
{
  unsigned int z;

  return __builtin_sub_overflow (x, y, &z) ? 0 : z;
}
--cut here--

This will compile to:

0000000000000000 <foo>:
   0:   89 f8          mov    %edi,%eax
   2:   31 d2          xor    %edx,%edx
   4:   29 f0          sub    %esi,%eax
   6:   39 fe          cmp    %edi,%esi
   8:   0f 43 c2       cmovae %edx,%eax
   b:   c3             ret
   c:   0f 1f 40 00    nopl   0x0(%rax)

0000000000000010 <bar>:
  10:   29 f7          sub    %esi,%edi
  12:   72 03          jb     17 <bar+0x7>
  14:   89 f8          mov    %edi,%eax
  16:   c3             ret
  17:   31 c0          xor    %eax,%eax
  19:   c3             ret

Please note that the compare was eliminated in the latter test. So, if the
target does not provide a saturating optab but does provide
__builtin_sub_overflow, I think it is worth emitting .SAT_SUB via
__builtin_sub_overflow (and similarly for saturating add).

Uros.

>
> Pan
>
> -----Original Message-----
> From: Uros Bizjak <ubiz...@gmail.com>
> Sent: Wednesday, June 5, 2024 4:46 PM
> To: Li, Pan2 <pan2...@intel.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org;
> juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
> Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned
> scalar int
>
> On Wed, Jun 5, 2024 at 10:38 AM Li, Pan2 <pan2...@intel.com> wrote:
> >
> > > I see. x86 doesn't have scalar saturating instructions, so the scalar
> > > version indeed can't be converted.
> > >
> > > I will amend x86 testcases after the vector part of your patch is
> > > committed.
> >
> > Thanks for the confirmation. Just curious: the scalar .SAT_SUB has
> > several forms, like the branch version below.
> >
> > .SAT_SUB (x, y) = x > y ? x - y : 0.
> > // or leverage __builtin_sub_overflow here
> >
> > Is it reasonable to implement the scalar .SAT_SUB for x86, given that we
> > can somehow eliminate the branch here?
>
> x86 will emit cmov in the above case:
>
>   movl    %edi, %eax
>   xorl    %edx, %edx
>   subl    %esi, %eax
>   cmpl    %edi, %esi
>   cmovnb  %edx, %eax
>
> Maybe we can reuse the flags from the subtraction here to avoid the compare.
>
> Uros.
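
For illustration only (a sketch, not code from the patch or from this
thread): the two scalar .SAT_SUB forms discussed above, plus what the
analogous saturating add via __builtin_add_overflow might look like, as
plain C. It assumes GCC or Clang, both of which provide
__builtin_sub_overflow and __builtin_add_overflow; the function names
sat_sub_cmp, sat_sub_ovf, sat_add_ovf and check_sat_sub_forms are
hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Compare form, same shape as foo () in the testcase:
   x > y ? x - y : 0.  */
unsigned int
sat_sub_cmp (unsigned int x, unsigned int y)
{
  return x > y ? x - y : 0;
}

/* Overflow-builtin form, same shape as bar () in the testcase.
   __builtin_sub_overflow stores the wrapped difference in z and returns
   true when x - y borrows (x < y), in which case the result is 0.  */
unsigned int
sat_sub_ovf (unsigned int x, unsigned int y)
{
  unsigned int z;
  return __builtin_sub_overflow (x, y, &z) ? 0 : z;
}

/* Hypothetical saturating add done "in a similar way": on unsigned
   overflow the result clamps to UINT_MAX.  */
unsigned int
sat_add_ovf (unsigned int x, unsigned int y)
{
  unsigned int z;
  return __builtin_add_overflow (x, y, &z) ? UINT_MAX : z;
}

/* Assert that both .SAT_SUB forms give the same result for (x, y).  */
void
check_sat_sub_forms (unsigned int x, unsigned int y)
{
  assert (sat_sub_cmp (x, y) == sat_sub_ovf (x, y));
}
```

The two subtraction forms are semantically identical for every input;
whether the separate compare disappears is a code generation question, so
it is worth inspecting the output (e.g. with objdump -d) for a given
compiler version and set of flags rather than assuming it.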