On Sun, Jun 4, 2023 at 12:45 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
> This patch is the latest revision of my patch to add support for the
> STC (set carry flag), CLC (clear carry flag) and CMC (complement
> carry flag) instructions to the i386 backend, incorporating Uros'
> previous feedback.  The significant changes are (i) the inclusion
> of CMC, (ii) the use of UNSPECs for the patterns, and (iii) the use
> of a new X86_TUNE_SLOW_STC tuning flag to select alternate
> implementations on pentium4 (which has a notoriously slow STC) when
> not optimizing for size.
>
> An example of the use of the stc instruction is:
>
> unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
> {
>   return __builtin_ia32_addcarryx_u32 (1, a, b, c);
> }
>
> which previously generated:
>
>         movl    $1, %eax
>         addb    $-1, %al
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
>
> and with this patch now generates:
>
>         stc
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
>
> An example of the use of the cmc instruction (where the carry from
> a first adc is inverted/complemented as input to a second adc) is:
>
> unsigned int o1, o2;
>
> unsigned int bar (unsigned int a, unsigned int b,
>                   unsigned int c, unsigned int d)
> {
>   unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
>   return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
> }
>
> which previously generated:
>
>         movl    $1, %eax
>         addb    $-1, %al
>         adcl    %esi, %edi
>         setnc   %al
>         movl    %edi, o1(%rip)
>         addb    $-1, %al
>         adcl    %ecx, %edx
>         setc    %al
>         movl    %edx, o2(%rip)
>         movzbl  %al, %eax
>         ret
>
> and now generates:
>
>         stc
>         adcl    %esi, %edi
>         cmc
>         movl    %edi, o1(%rip)
>         adcl    %ecx, %edx
>         setc    %al
>         movl    %edx, o2(%rip)
>         movzbl  %al, %eax
>         ret
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2023-06-03  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>:
>         Use new x86_stc or negqi_ccc_1 instructions to set the carry flag.
>         * config/i386/i386.h (TARGET_SLOW_STC): New define.
>         * config/i386/i386.md (UNSPEC_CLC): New UNSPEC for clc.
>         (UNSPEC_STC): New UNSPEC for stc.
>         (UNSPEC_CMC): New UNSPEC for cmc.
>         (*x86_clc): New define_insn.
>         (*x86_clc_xor): New define_insn for pentium4 without -Os.
>         (x86_stc): New define_insn.
>         (define_split): Convert x86_stc into alternate implementation
>         on pentium4.
>         (x86_cmc): New define_insn.
>         (*x86_cmc_1): New define_insn_and_split to recognize cmc pattern.
>         (*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to
>         recognize (and eliminate) the carry flag being copied to itself.
>         (*setcc_qi_negqi_ccc_2_<mode>): Likewise.
>         (neg<mode>_ccc_1): Renamed from *neg<mode>_ccc_1 for gen function.
>         * config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/cmc-1.c: New test case.
>         * gcc.target/i386/stc-1.c: Likewise.
+;; Clear carry flag.
+(define_insn "*x86_clc"
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))]
+  "!TARGET_SLOW_STC || optimize_function_for_size_p (cfun)"
+  "clc"
+  [(set_attr "length" "1")
+   (set_attr "length_immediate" "0")
+   (set_attr "modrm" "0")])
+
+(define_insn "*x86_clc_xor"
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))
+   (clobber (match_scratch:SI 0 "=r"))]
+  "TARGET_SLOW_STC && !optimize_function_for_size_p (cfun)"
+  "xor{l}\t%0, %0"
+  [(set_attr "type" "alu1")
+   (set_attr "mode" "SI")
+   (set_attr "length_immediate" "0")])

I think the above would be better implemented as a peephole2 pattern
that triggers only when a register is available.  We should not waste
a register on a register-starved x86_32 just to clear the carry flag.
This can be implemented by putting

  [(match_scratch:SI 2 "r")

at the beginning of the peephole2 pattern that generates x86_clc_xor
(a rough sketch follows in the P.S.).  The pattern should be
constrained with "TARGET_SLOW_STC && !optimize_function_for_size_p ()",
and x86_clc_xor should be available only after reload (like e.g.
"*mov<mode>_xor").

+;; On Pentium 4, set the carry flag using mov $1,%al;neg %al.
+(define_split
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_STC))]
+  "TARGET_SLOW_STC
+   && !optimize_insn_for_size_p ()
+   && can_create_pseudo_p ()"
+  [(set (match_dup 0) (const_int 1))
+   (parallel
+     [(set (reg:CCC FLAGS_REG)
+           (unspec:CCC [(match_dup 0) (const_int 0)] UNSPEC_CC_NE))
+      (set (match_dup 0) (neg:QI (match_dup 0)))])]
+  "operands[0] = gen_reg_rtx (QImode);")

Same here: this should be a peephole2 that triggers only when a
register is available, again for "TARGET_SLOW_STC &&
!optimize_function_for_size_p ()" (see the second sketch in the
P.S.).  Is the new sequence "mov $1,%al; neg %al" any better than the
existing "mov $1,%al; addb $-1,%al", or is there another reason for
the changed sequence?

+(define_insn_and_split "*x86_cmc_1"
+  [(set (reg:CCC FLAGS_REG)
+        (unspec:CCC [(geu:QI (reg:CCC FLAGS_REG) (const_int 0))
+                     (const_int 0)] UNSPEC_CC_NE))]
+  "!TARGET_SLOW_STC || optimize_function_for_size_p (cfun)"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(reg:CCC FLAGS_REG)] UNSPEC_CMC))])

The above should itself be the RTL model of x86_cmc; there is no need
to split it to an unspec (see the third sketch in the P.S.).

Uros.
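
P.S. For concreteness, here is a rough, untested sketch of the clc
peephole2 shape suggested above.  The operand numbering and the exact
replacement pattern are illustrative only, not the final patch:

;; Sketch: when a scratch register is available (and clc is slow),
;; rewrite the plain clc insn into the xor form that clobbers a
;; register.  The input pattern matches *x86_clc's RTL; the output
;; parallel matches *x86_clc_xor's set-plus-clobber shape.
(define_peephole2
  [(match_scratch:SI 0 "r")
   (set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))]
  "TARGET_SLOW_STC && !optimize_function_for_size_p (cfun)"
  [(parallel [(set (reg:CCC FLAGS_REG)
                   (unspec:CCC [(const_int 0)] UNSPEC_CLC))
              (clobber (match_dup 0))])])

Because peephole2 runs after reload, the match_scratch only succeeds
when a free register really exists, so no register is tied up on
x86_32 just for flag setting.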
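The same idea applies to stc; a similarly hedged sketch (again
untested, operand numbering illustrative), reusing the mov/neg
replacement sequence from the define_split above:

;; Sketch: when a QImode scratch is available, rewrite stc into
;; "mov $1, %reg; neg %reg", which sets CF because neg of a nonzero
;; value sets the carry flag.
(define_peephole2
  [(match_scratch:QI 0 "r")
   (set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_STC))]
  "TARGET_SLOW_STC && !optimize_function_for_size_p (cfun)"
  [(set (match_dup 0) (const_int 1))
   (parallel [(set (reg:CCC FLAGS_REG)
                   (unspec:CCC [(match_dup 0) (const_int 0)] UNSPEC_CC_NE))
              (set (match_dup 0) (neg:QI (match_dup 0)))])])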
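And for cmc, what "the RTL model of x86_cmc" means is roughly the
following; the attributes are copied from the clc/stc patterns above,
so treat them as a guess:

;; Sketch: model cmc directly with the geu-based RTL, with no
;; intermediate UNSPEC_CMC and no define_insn_and_split.
(define_insn "x86_cmc"
  [(set (reg:CCC FLAGS_REG)
        (unspec:CCC [(geu:QI (reg:CCC FLAGS_REG) (const_int 0))
                     (const_int 0)] UNSPEC_CC_NE))]
  ""
  "cmc"
  [(set_attr "length" "1")
   (set_attr "length_immediate" "0")
   (set_attr "modrm" "0")])

With this shape, combine can match the carry-complement RTL directly
against the insn that emits cmc, instead of first recognizing a
splitter and then converting to an unspec.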