On Mon, May 23, 2022 at 9:40 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > It's not uncommon for GCC to convert between a (zero or one) Boolean > value and a (zero or all ones) mask value, possibly of a wider type, > using negation. > > Currently on x86_64, the following simple test case: > __int128 foo(unsigned long x) { return -(__int128)x; } > > compiles with -O2 to: > > movq %rdi, %rax > xorl %edx, %edx > negq %rax > adcq $0, %rdx > negq %rdx > ret > > with this patch, which adds an additional peephole2 to i386.md, > we instead generate the improved: > > movq %rdi, %rax > negq %rax > sbbq %rdx, %rdx > ret > > [and likewise for the (DImode) long long version using -m32.] > A peephole2 is appropriate as the double word negation and the > operation providing the xor are typically only split after combine. > > In fact, the new peephole2 sequence: > ;; Convert: > ;; xorl %edx, %edx > ;; negl %eax > ;; adcl $0, %edx > ;; negl %edx > ;; to: > ;; negl %eax > ;; sbbl %edx, %edx // *x86_mov<mode>cc_0_m1 > > is nearly identical to (and placed immediately after) the existing: > ;; Convert: > ;; mov %esi, %edx > ;; negl %eax > ;; adcl $0, %edx > ;; negl %edx > ;; to: > ;; xorl %edx, %edx > ;; negl %eax > ;; sbbl %esi, %edx > > > One potential objection/concern is that "sbb? %reg,%reg" may possibly be > incorrectly perceived as a false register dependency on older hardware, > much like "xor? %reg,%reg" may be perceived as a false dependency on > really old hardware. This doesn't currently appear to be a concern > for the i386 backend's *x86_move<mode>cc_0_m1 as shown by the following > test code: > > int bar(unsigned int x, unsigned int y) { > return x > y ? -1 : 0; > } > > which currently generates a "naked" sbb: > cmp esi, edi > sbb eax, eax > ret > > If anyone does potentially encounter a stall, it would easy to add > a splitter or peephole2 controlled by a tuning flag to insert an additional > xor to break the false dependency chain (when not optimizing for size), > but I don't believe this is required on recent microarchitectures. > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, with > no new failures. Ok for mainline? > > > 2022-05-23 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > * config/i386/i386.md (peephole2): Convert xor;neg;adc;neg, > i.e. a double word negation of a zero extended operand, to > neg;sbb. > > gcc/testsuite/ChangeLog > * gcc.target/i386/neg-zext-1.c: New test case for ia32. > * gcc.target/i386/neg-zext-2.c: New test case for int128.
[1] suggests that sbb reg,reg breaks dependency chain only for AMD targets (bulldozer/zen/bobcat/jaguar) and not for Intel. However, we use "naked" sbb extensively and nobody complained about it, so I guess the patch is OK. [1] https://reviews.llvm.org/D116804 Thanks, Uros. > > Thanks in advance, > Roger > -- >