On Fri, Jul 8, 2022 at 9:15 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch adds support for x86's single-byte encoded stc (set carry flag)
> and clc (clear carry flag) instructions to i386.md.
>
> The motivating example is the simple code snippet:
>
> unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
> {
>   return __builtin_ia32_addcarryx_u32 (1, a, b, c);
> }
>
> which uses the target built-in to generate an adc instruction, adding
> together A and B with the incoming carry flag already set.  Currently
> for this mainline GCC generates (with -O2):
>
>         movl    $1, %eax
>         addb    $-1, %al
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
>
> where the first two instructions (to load 1 into a byte register and
> then add 255 to it) are the idiom used to set the carry flag.  This
> is a little inefficient as x86 has a "stc" instruction for precisely
> this purpose.  With the attached patch we now generate:
>
>         stc
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
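
For readers unfamiliar with the idiom in the quoted listing, here is a
minimal C sketch of the arithmetic involved. The helper names add8_carry
and adc32 are hypothetical and are not part of GCC or of the patch; they
only model the flag behaviour in portable C, assuming the usual x86
semantics where an unsigned overflow out of the top bit sets the carry flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "movl $1, %eax; addb $-1, %al":
   an 8-bit add whose return value plays the role of the carry flag. */
static unsigned add8_carry(uint8_t x, uint8_t y, uint8_t *sum)
{
    assert(sum != NULL);
    unsigned wide = (unsigned)x + (unsigned)y; /* compute in a wider type */
    *sum = (uint8_t)wide;                      /* low 8 bits, like %al */
    return wide >> 8;                          /* bit 8 is the carry out */
}

/* Hypothetical model of a 32-bit adc: a + b + carry_in.
   The result goes to *out and the carry out is returned,
   mirroring the shape of the built-in used in foo above. */
static unsigned adc32(unsigned carry_in, uint32_t a, uint32_t b, uint32_t *out)
{
    assert(out != NULL);
    uint64_t wide = (uint64_t)a + (uint64_t)b + (carry_in ? 1u : 0u);
    *out = (uint32_t)wide;
    return (unsigned)(wide >> 32);
}
```

In this model, add8_carry(1, 0xFF, &s) computes 1 + 255 = 256, which does
not fit in 8 bits, so s becomes 0 and the carry out is 1. That carry is
what the following adc consumes, and it is the state that a single stc
would establish directly.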

Please note that STC/CLC is quite suboptimal on some older architectures.
For example, Pentium4 has a latency of 10 cycles due to a false dependency
on the flags [1].

[1] https://agner.org/optimize/instruction_tables.pdf

Uros.