On Mon, Jul 1, 2024 at 3:20 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch adds an additional variation of the peephole2 used to convert
> bswaphisi2_lowpart into rotlhi3_1_slp, which converts xchgb %ah,%al into
> rotw if the flags register isn't live.  The motivating example is:
>
> void ext(int x);
> void foo(int x)
> {
>   ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
> }
>
> where GCC with -O2 currently produces:
>
> foo:    movl    %edi, %eax
>         rolw    $8, %ax
>         movl    %eax, %edi
>         jmp     ext
>
> The issue is that the original xchgb (bswaphisi2_lowpart) can only be
> performed in "Q" registers that allow the %?h register to be used, so
> reload generates the above two movl.  However, it's later in peephole2
> where we see that CC_FLAGS can be clobbered, so we can use a rotate word,
> which is more forgiving with register allocations.  With the additional
> peephole2 proposed here, we now generate:
>
> foo:    rolw    $8, %di
>         jmp     ext
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-07-01  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (bswaphisi2_lowpart peephole2): New
>         peephole2 variant to eliminate register shuffling.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/xchg-4.c: New test case.

OK.

Thanks,
Uros.

>
>
> Thanks again,
> Roger
> --
>
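
For readers following the motivating example, here is a minimal sketch (not part of the patch) of why the expression passed to ext() is a 16-bit rotate by 8 of the low word, which is what rolw $8 computes. The helper names swap_low_bytes_expr, swap_low_bytes_rot and check_equivalence are hypothetical, and the sketch uses uint32_t rather than int so that the shifts are well defined in portable C.

```c
#include <assert.h>
#include <stdint.h>

/* The expression from the motivating example, written on an unsigned
   type (an assumption of this sketch; the original uses int).  */
static uint32_t swap_low_bytes_expr(uint32_t x)
{
    return (x & ~0xffffu) | ((x >> 8) & 0xff) | ((x & 0xff) << 8);
}

/* The same operation spelled as a 16-bit rotate left by 8 of the low
   word, leaving the upper 16 bits untouched (the effect of rolw $8).  */
static uint32_t swap_low_bytes_rot(uint32_t x)
{
    uint16_t lo = (uint16_t)x;
    uint16_t rot = (uint16_t)((lo << 8) | (lo >> 8));
    return (x & 0xffff0000u) | rot;
}

/* Hypothetical self-check: both forms agree on a few sample values.  */
static void check_equivalence(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x000000ffu, 0x0000ff00u, 0x12345678u, 0xffffffffu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(swap_low_bytes_expr(samples[i]) == swap_low_bytes_rot(samples[i]));
}
```

Calling check_equivalence() from a small main() exercises both forms. For example, an input of 0x12345678 yields 0x12347856 from either function.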
