On Tue, Jun 15, 2021 at 5:17 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch tackles PR46235 to improve the code generated for bit tests
> on x86_64 by making more use of the bt instruction.  Currently, GCC emits
> bt instructions when followed by conditional jumps (thanks to Uros'
> splitters).  This patch adds splitters in i386.md to catch the cases
> where bt is followed by a conditional move (as in the original report),
> or by a setc/setnc (as in comment 5 of the Bugzilla PR).
>
> With this patch, the motivating function in the original PR
>
> int foo(int a, int x, int y) {
>     if (a & (1 << x))
>        return a;
>     return 1;
> }
>
> which with -O2 on mainline generates:
>
> foo:    movl    %edi, %eax
>         movl    %esi, %ecx
>         sarl    %cl, %eax
>         testb   $1, %al
>         movl    $1, %eax
>         cmovne  %edi, %eax
>         ret
>
> now generates:
> foo:    btl     %esi, %edi
>         movl    $1, %eax
>         cmovc   %edi, %eax
>         ret
>
> Likewise, IsBitSet1 (from comment 5)
>
> bool IsBitSet1(unsigned char byte, int index) {
>     return (byte & (1<<index)) != 0;
> }
>
> Before:
>         movzbl  %dil, %eax
>         movl    %esi, %ecx
>         sarl    %cl, %eax
>         andl    $1, %eax
>         ret
>
> After:
>         movzbl  %dil, %edi
>         btl     %esi, %edi
>         setc    %al
>         ret
>
> [Identical code is generated for comment 5's IsBitSet2]
> bool IsBitSet2(unsigned char byte, int index) {
>     return (byte >> index) & 1;
> }
>
> And finally to demonstrate the corner cases also handled,
>
> int IsBitClr(long long dword, int index) {
>     return (dword & (1LL<<index)) == 0;
> }
>
> Before:
>         movq    %rdi, %rax
>         movl    %esi, %ecx
>         sarq    %cl, %rax
>         notq    %rax
>         andl    $1, %eax
>         ret
>
> After:
>         xorl    %eax, %eax
>         btq     %rsi, %rdi
>         setnc   %al
>         ret
>
> According to Agner Fog, SAR/SHR r,cl takes 2 cycles on Skylake,
> where BT r,r takes only one, so the performance improvements on
> recent hardware may be more significant than implied by just the
> reduced number of instructions.  I've avoided transforming cases
> (such as btsi_setcsi) where using bt sequences may not be a clear
> win (over sarq/andl).
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
> 2021-06-15  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR rtl-optimization/46235
>         * config/i386/i386.md: New define_split for bt followed by cmov.
>         (*bt<mode>_setcqi): New define_insn_and_split for bt followed by
>         setc.
>         (*bt<mode>_setncqi): New define_insn_and_split for bt then setnc.
>         (*bt<mode>_setnc<mode>): New define_insn_and_split for bt followed
>         by setnc with zero extension.
>
> gcc/testsuite/ChangeLog
>         PR rtl-optimization/46235
>         * gcc.target/i386/bt-5.c: New test.
>         * gcc.target/i386/bt-6.c: New test.
>         * gcc.target/i386/bt-7.c: New test.

OK.

Thanks,
Uros.
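
The quoted description treats IsBitSet1 and IsBitSet2 as equivalent bit tests, with IsBitClr as the negated form. A minimal C sketch of a source-level check follows. It is not part of the patch or its testsuite: count_mismatches is a hypothetical helper, and the check is restricted to byte values with bit indices 0..7 so that no shift is out of range.

```c
#include <assert.h>
#include <stdbool.h>

/* The two bit-set idioms from comment 5 of the PR, as quoted above. */
bool IsBitSet1(unsigned char byte, int index) {
    return (byte & (1 << index)) != 0;
}

bool IsBitSet2(unsigned char byte, int index) {
    return (byte >> index) & 1;
}

/* The bit-clear corner case from the patch description. */
int IsBitClr(long long dword, int index) {
    return (dword & (1LL << index)) == 0;
}

/* Hypothetical helper (not in the patch): counts the (value, index)
   pairs where the two "set" idioms disagree with each other, or where
   the "clear" idiom fails to be their negation. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int b = 0; b < 256; b++) {
        for (int i = 0; i < 8; i++) {
            unsigned char byte = (unsigned char)b;
            bool set1 = IsBitSet1(byte, i);
            bool set2 = IsBitSet2(byte, i);
            bool clr = IsBitClr((long long)b, i) != 0;
            if (set1 != set2 || set1 == clr)
                mismatches++;
        }
    }
    return mismatches;
}
```

Calling count_mismatches() from a small main and checking that it returns 0 confirms only that the idioms agree at the source level. Whether bt is actually emitted still has to be checked in the generated assembly, for example with gcc -O2 -S.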
