On Mon, Jul 1, 2024 at 3:20 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch adds an additional variation of the peephole2 used to convert
> bswaphisi2_lowpart into rotlhi3_1_slp, which converts xchgb %ah,%al into
> rotw if the flags register isn't live.  The motivating example is:
>
> void ext(int x);
> void foo(int x)
> {
>   ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
> }
>
> where GCC with -O2 currently produces:
>
> foo:    movl    %edi, %eax
>         rolw    $8, %ax
>         movl    %eax, %edi
>         jmp     ext
>
> The issue is that the original xchgb (bswaphisi2_lowpart) can only be
> performed in "Q" registers that allow the %?h register to be used, so
> reload generates the above two movl.  However, it's later in peephole2
> where we see that CC_FLAGS can be clobbered, so we can use a rotate word,
> which is more forgiving with register allocations.  With the additional
> peephole2 proposed here, we now generate:
>
> foo:    rolw    $8, %di
>         jmp     ext
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-07-01  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (bswaphisi2_lowpart peephole2): New
>         peephole2 variant to eliminate register shuffling.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/xchg-4.c: New test case.

OK.

Thanks,
Uros.

>
>
> Thanks again,
> Roger
> --
>
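
For anyone reading the thread later, here is a minimal C sketch of why a 16-bit rotate by 8 (rolw $8) can stand in for the byte exchange in the motivating example. It is not part of the patch or of xchg-4.c; the names swap_low_bytes_expr, swap_low_bytes_rot and self_check are made up for illustration, and unsigned types are assumed so that the right shift is well defined for all inputs.

```c
#include <assert.h>
#include <stdint.h>

/* The expression from the motivating example, written on unsigned
   values (an assumption of this sketch): keep the upper 16 bits and
   exchange the two low bytes. */
static uint32_t swap_low_bytes_expr(uint32_t x)
{
    return (x & ~0xffffu) | ((x >> 8) & 0xff) | ((x & 0xff) << 8);
}

/* The same result computed as a rotate of the low 16 bits by 8,
   which is what a rolw $8 on the low word does. */
static uint32_t swap_low_bytes_rot(uint32_t x)
{
    uint16_t lo = (uint16_t)x;
    lo = (uint16_t)((lo << 8) | (lo >> 8));
    return (x & 0xffff0000u) | lo;
}

/* Hypothetical helper: checks that both forms agree on a few samples. */
static void self_check(void)
{
    static const uint32_t samples[] = {
        0u, 0x12345678u, 0xffff00ffu, 0x0000ab00u, 0xdeadbeefu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(swap_low_bytes_expr(samples[i]) == swap_low_bytes_rot(samples[i]));
}
```

Since the rotate acts on a 16-bit register as a whole rather than on a high-byte register such as %ah, it is not tied to the "Q" register class, which is the constraint the quoted message identifies as the source of the extra movl pair.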