On Fri, Jul 8, 2022 at 9:15 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch adds support for x86's single-byte encoded stc (set carry flag)
> and clc (clear carry flag) instructions to i386.md.
>
> The motivating example is the simple code snippet:
>
> unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
> {
>   return __builtin_ia32_addcarryx_u32 (1, a, b, c);
> }
>
> which uses the target built-in to generate an adc instruction, adding
> together A and B with the incoming carry flag already set.  Currently
> for this mainline GCC generates (with -O2):
>
>         movl    $1, %eax
>         addb    $-1, %al
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
>
> where the first two instructions (to load 1 into a byte register and
> then add 255 to it) are the idiom used to set the carry flag.  This
> is a little inefficient as x86 has a "stc" instruction for precisely
> this purpose.  With the attached patch we now generate:
>
>         stc
>         adcl    %esi, %edi
>         setc    %al
>         movl    %edi, (%rdx)
>         movzbl  %al, %eax
>         ret
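
As a side note on the quoted sequences, the arithmetic behind the
"movl $1 / addb $-1" idiom and the adc that consumes the flag can be
modelled in portable C. This is only a sketch: add8_carry_out and
adc32_model are hypothetical helper names made up for illustration, not
GCC built-ins, and they model the flag behaviour rather than emit the
instructions.

```c
#include <stdint.h>

/* Hypothetical helper: models an 8-bit add such as "addb" and returns
   the carry-out, i.e. the value CF would have afterwards. */
static int add8_carry_out(uint8_t x, uint8_t y, uint8_t *sum)
{
    unsigned int wide = (unsigned int)x + (unsigned int)y;
    *sum = (uint8_t)wide;      /* low 8 bits are kept in the register */
    return wide > 0xFFu;       /* a ninth bit means the carry is set */
}

/* Hypothetical model of "adc": computes a + b + carry_in, stores the
   low 32 bits in *out and returns the carry-out. */
static unsigned char adc32_model(unsigned char carry_in,
                                 uint32_t a, uint32_t b, uint32_t *out)
{
    uint64_t wide = (uint64_t)a + (uint64_t)b + (carry_in ? 1u : 0u);
    *out = (uint32_t)wide;
    return (unsigned char)(wide >> 32);
}
```

With x = 1 and y = 0xFF (that is, -1 as a byte), add8_carry_out returns 1
and leaves 0 in the byte, which is why the two-instruction idiom sets the
carry flag; stc reaches the same flag state without touching a register.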

Please note that STC/CLC perform quite poorly on some older
architectures. For example, on Pentium 4 they have a latency of 10
cycles due to a false dependency on the flags [1].

[1] https://agner.org/optimize/instruction_tables.pdf


Uros.
