On Sun, Jun 4, 2023 at 12:45 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch is the latest revision of my patch to add support for the
> STC (set carry flag), CLC (clear carry flag) and CMC (complement
> carry flag) instructions to the i386 backend, incorporating Uros'
> previous feedback.  The significant changes are (i) the inclusion
> of CMC, (ii) the use of UNSPECs for the patterns, and (iii) the use
> of a new X86_TUNE_SLOW_STC tuning flag to select alternate
> implementations on pentium4 (which has a notoriously slow STC) when
> not optimizing for size.
>
> An example of the use of the stc instruction is:
> unsigned int foo (unsigned int a, unsigned int b, unsigned int *c) {
>   return __builtin_ia32_addcarryx_u32 (1, a, b, c);
> }
>
> which previously generated:
>         movl    $1, %eax
>         addb    $-1, %al
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
>
> with this patch now generates:
>         stc
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
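[For readers who don't know the intrinsic: it computes a + b + carry_in and
returns the carry-out bit, so a constant carry-in of 1 maps naturally onto
stc followed by adc. A portable C model of that semantics, for reference
only (the helper name is illustrative, not a GCC API):

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of __builtin_ia32_addcarryx_u32: compute a + b + carry_in,
   store the low 32 bits through *out, and return the carry-out bit.
   The helper name is illustrative; it is not part of GCC.  */
static unsigned char
model_addcarry_u32 (unsigned char carry_in, uint32_t a, uint32_t b,
                    uint32_t *out)
{
  uint64_t sum = (uint64_t) a + b + carry_in;
  *out = (uint32_t) sum;
  return (unsigned char) (sum >> 32);
}
```

With carry_in fixed at 1, as in foo above, the whole carry-in setup reduces
to setting CF, which is exactly what a single stc does.]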
>
> An example of the use of the cmc instruction (where the carry from
> a first adc is inverted/complemented as input to a second adc) is:
> unsigned int o1, o2;
> unsigned int bar (unsigned int a, unsigned int b,
>                   unsigned int c, unsigned int d)
> {
>   unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
>   return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
> }
>
> which previously generated:
>         movl    $1, %eax
>         addb    $-1, %al
>         adcl    %esi, %edi
>         setnc   %al
>         movl    %edi, o1(%rip)
>         addb    $-1, %al
>         adcl    %ecx, %edx
>         setc    %al
>         movl    %edx, o2(%rip)
>         movzbl  %al, %eax
>         ret
>
> and now generates:
>         stc
>         adcl    %esi, %edi
>         cmc
>         movl    %edi, o1(%rip)
>         adcl    %ecx, %edx
>         setc    %al
>         movl    %edx, o2(%rip)
>         movzbl  %al, %eax
>         ret
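[The c1 ^ 1 in bar complements the carry between the two additions; at the
flags level that is exactly one cmc. A portable model of the chain, again
with an illustrative stand-in for the intrinsic rather than a real GCC API:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for __builtin_ia32_addcarryx_u32.  */
static unsigned char
model_addcarry_u32 (unsigned char carry_in, uint32_t a, uint32_t b,
                    uint32_t *out)
{
  uint64_t sum = (uint64_t) a + b + carry_in;
  *out = (uint32_t) sum;
  return (unsigned char) (sum >> 32);
}

/* Model of bar: the carry-out of the first add-with-carry is complemented
   (c1 ^ 1) before feeding the second one, which is what cmc does to CF.  */
static unsigned int
model_bar (uint32_t a, uint32_t b, uint32_t c, uint32_t d,
           uint32_t *o1, uint32_t *o2)
{
  unsigned char c1 = model_addcarry_u32 (1, a, b, o1);
  return model_addcarry_u32 (c1 ^ 1, c, d, o2);
}
```
]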
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-06-03  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>:
>         Use new x86_stc or negqi_ccc_1 instructions to set the carry flag.
>         * config/i386/i386.h (TARGET_SLOW_STC): New define.
>         * config/i386/i386.md (UNSPEC_CLC): New UNSPEC for clc.
>         (UNSPEC_STC): New UNSPEC for stc.
>         (UNSPEC_CMC): New UNSPEC for cmc.
>         (*x86_clc): New define_insn.
>         (*x86_clc_xor): New define_insn for pentium4 without -Os.
>         (x86_stc): New define_insn.
>         (define_split): Convert x86_stc into alternate implementation
>         on pentium4.
>         (x86_cmc): New define_insn.
>         (*x86_cmc_1): New define_insn_and_split to recognize cmc pattern.
>         (*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to
>         recognize (and eliminate) the carry flag being copied to itself.
>         (*setcc_qi_negqi_ccc_2_<mode>): Likewise.
>         (neg<mode>_ccc_1): Renamed from *neg<mode>_ccc_1 for gen function.
>         * config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/cmc-1.c: New test case.
>         * gcc.target/i386/stc-1.c: Likewise.

+;; Clear carry flag.
+(define_insn "*x86_clc"
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))]
+  "!TARGET_SLOW_STC || optimize_function_for_size_p (cfun)"
+  "clc"
+  [(set_attr "length" "1")
+   (set_attr "length_immediate" "0")
+   (set_attr "modrm" "0")])
+
+(define_insn "*x86_clc_xor"
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))
+   (clobber (match_scratch:SI 0 "=r"))]
+  "TARGET_SLOW_STC && !optimize_function_for_size_p (cfun)"
+  "xor{l}\t%0, %0"
+  [(set_attr "type" "alu1")
+   (set_attr "mode" "SI")
+   (set_attr "length_immediate" "0")])

I think the above would be better implemented as a peephole2 pattern
that triggers when a register is available. We should not waste a
register on register-starved x86_32 just to set the carry flag. This
can be implemented with:

  [(match_scratch:SI 2 "r")

at the beginning of the peephole2 pattern that generates x86_clc_xor.
The pattern should be conditioned on "TARGET_SLOW_STC &&
!optimize_function_for_size_p (cfun)", and x86_clc_xor should be
available only after reload (like e.g. "*mov<mode>_xor").
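A sketch of what that peephole2 could look like (illustrative only;
the operand numbering and the exact replacement RTL would need to match
the final form of the x86_clc_xor pattern):

```
;; Sketch: use the xor form only when peephole2 finds a free register.
(define_peephole2
  [(match_scratch:SI 0 "r")
   (set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_CLC))]
  "TARGET_SLOW_STC && !optimize_function_for_size_p (cfun)"
  [(parallel [(set (reg:CCC FLAGS_REG)
                   (unspec:CCC [(const_int 0)] UNSPEC_CLC))
              (clobber (match_dup 0))])])
```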

+;; On Pentium 4, set the carry flag using mov $1,%al;neg %al.
+(define_split
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(const_int 0)] UNSPEC_STC))]
+  "TARGET_SLOW_STC
+   && !optimize_insn_for_size_p ()
+   && can_create_pseudo_p ()"
+  [(set (match_dup 0) (const_int 1))
+   (parallel
+     [(set (reg:CCC FLAGS_REG)
+       (unspec:CCC [(match_dup 0) (const_int 0)] UNSPEC_CC_NE))
+      (set (match_dup 0) (neg:QI (match_dup 0)))])]
+  "operands[0] = gen_reg_rtx (QImode);")

Same here, this should be a peephole2 so that it triggers only when a
register is available, again conditioned on "TARGET_SLOW_STC &&
!optimize_function_for_size_p (cfun)".

Is the new sequence "mov $1,%al; neg %al" any better than the existing
"mov $1,%al; addb $-1,%al", or is there another reason for the changed
sequence?
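For what it's worth, the two sequences are CF-equivalent for every 8-bit
value: addb $-1, %al carries out iff %al is nonzero, and neg sets CF iff
its operand is nonzero. A quick C check of the two flag definitions as
the SDM gives them (function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* CF after "addb $-1, %al": set on unsigned overflow of the 8-bit add.  */
static int
cf_addb_minus1 (uint8_t al)
{
  return (unsigned int) al + 0xffu > 0xffu;
}

/* CF after "neg %al": set iff the operand was nonzero.  */
static int
cf_neg (uint8_t al)
{
  return al != 0;
}
```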

+(define_insn_and_split "*x86_cmc_1"
+  [(set (reg:CCC FLAGS_REG)
+        (unspec:CCC [(geu:QI (reg:CCC FLAGS_REG) (const_int 0))
+             (const_int 0)] UNSPEC_CC_NE))]
+  "!TARGET_SLOW_STC || optimize_function_for_size_p (cfun)"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG) (unspec:CCC [(reg:CCC FLAGS_REG)] UNSPEC_CMC))])

The above should be the RTL model of x86_cmc. No need to split to an unspec.
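Concretely, x86_cmc could then carry that RTL directly, something like
(a sketch; the attributes just follow the style of the surrounding
single-byte flag instructions):

```
(define_insn "x86_cmc"
  [(set (reg:CCC FLAGS_REG)
        (unspec:CCC [(geu:QI (reg:CCC FLAGS_REG) (const_int 0))
                     (const_int 0)] UNSPEC_CC_NE))]
  ""
  "cmc"
  [(set_attr "length" "1")
   (set_attr "length_immediate" "0")
   (set_attr "modrm" "0")])
```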

Uros.
