On Mon, May 23, 2022 at 9:40 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> It's not uncommon for GCC to convert between a (zero or one) Boolean
> value and a (zero or all ones) mask value, possibly of a wider type,
> using negation.
>
> Currently on x86_64, the following simple test case:
> __int128 foo(unsigned long x) { return -(__int128)x; }
>
> compiles with -O2 to:
>
>         movq    %rdi, %rax
>         xorl    %edx, %edx
>         negq    %rax
>         adcq    $0, %rdx
>         negq    %rdx
>         ret
>
> with this patch, which adds an additional peephole2 to i386.md,
> we instead generate the improved:
>
>         movq    %rdi, %rax
>         negq    %rax
>         sbbq    %rdx, %rdx
>         ret
>
> [and likewise for the (DImode) long long version using -m32.]
> A peephole2 is appropriate as the double word negation and the
> operation providing the xor are typically only split after combine.
>
> In fact, the new peephole2 sequence:
> ;; Convert:
> ;;   xorl %edx, %edx
> ;;   negl %eax
> ;;   adcl $0, %edx
> ;;   negl %edx
> ;; to:
> ;;   negl %eax
> ;;   sbbl %edx, %edx    // *x86_mov<mode>cc_0_m1
>
> is nearly identical to (and placed immediately after) the existing:
> ;; Convert:
> ;;   mov %esi, %edx
> ;;   negl %eax
> ;;   adcl $0, %edx
> ;;   negl %edx
> ;; to:
> ;;   xorl %edx, %edx
> ;;   negl %eax
> ;;   sbbl %esi, %edx
>
>
> One potential objection/concern is that "sbb? %reg,%reg" may possibly be
> incorrectly perceived as a false register dependency on older hardware,
> much like "xor? %reg,%reg" may be perceived as a false dependency on
> really old hardware.  This doesn't currently appear to be a concern
> for the i386 backend's *x86_move<mode>cc_0_m1 as shown by the following
> test code:
>
> int bar(unsigned int x, unsigned int y) {
>   return x > y ? -1 : 0;
> }
>
> which currently generates a "naked" sbb:
>         cmp     esi, edi
>         sbb     eax, eax
>         ret
>
> If anyone does potentially encounter a stall, it would easy to add
> a splitter or peephole2 controlled by a tuning flag to insert an additional
> xor to break the false dependency chain (when not optimizing for size),
> but I don't believe this is required on recent microarchitectures.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}, with
> no new failures.  Ok for mainline?
>
>
> 2022-05-23  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (peephole2): Convert xor;neg;adc;neg,
>         i.e. a double word negation of a zero extended operand, to
>         neg;sbb.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/neg-zext-1.c: New test case for ia32.
>         * gcc.target/i386/neg-zext-2.c: New test case for int128.

[1] suggests that sbb reg,reg breaks dependency chain only for AMD
targets (bulldozer/zen/bobcat/jaguar) and not for Intel. However, we
use "naked" sbb extensively and nobody complained about it, so I guess
the patch is OK.

[1] https://reviews.llvm.org/D116804

Thanks,
Uros.

>
> Thanks in advance,
> Roger
> --
>

Reply via email to