On Thu, Aug 3, 2023 at 12:18 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
> This patch is a conservative fix for PR target/110792, a wrong-code
> regression affecting doubleword rotations by BITS_PER_WORD, which
> effectively swap the highpart and lowpart words, when the source to be
> rotated resides in memory.  The issue is that if the register used to
> hold the lowpart of the destination is mentioned in the address of
> the memory operand, the current define_insn_and_split unintentionally
> clobbers it before reading the highpart.
>
> Hence, for the testcase, the incorrectly generated code looks like:
>
>         salq    $4, %rdi                // calculate address
>         movq    WHIRL_S+8(%rdi), %rdi   // accidentally clobber addr
>         movq    WHIRL_S(%rdi), %rbp     // load (wrong) lowpart
>
> Traditionally, the textbook way to fix this would be to add an
> explicit early clobber to the instruction's constraints:
>
>  (define_insn_and_split "<insn>32di2_doubleword"
> - [(set (match_operand:DI 0 "register_operand" "=r,r,r")
> + [(set (match_operand:DI 0 "register_operand" "=r,r,&r")
>         (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "0,r,o")
>                        (const_int 32)))]
>
> but unfortunately this currently generates significantly worse code,
> due to a strange choice of reloads (effectively a memcpy), which ends
> up looking like:
>
>         salq    $4, %rdi                // calculate address
>         movdqa  WHIRL_S(%rdi), %xmm0    // load the doubleword into an SSE reg
>         movaps  %xmm0, -16(%rsp)        // store the SSE reg back to the stack
>         movq    -8(%rsp), %rdi          // load highpart
>         movq    -16(%rsp), %rbp         // load lowpart
>
> Note that reload's "&" doesn't distinguish between the memory being
> early clobbered and the registers used in an addressing mode being
> early clobbered.
>
> The fix proposed in this patch is to remove the third alternative that
> allowed offsettable memory as an operand, forcing reload to place the
> operand into a register before the rotation.  This results in:
>
>         salq    $4, %rdi
>         movq    WHIRL_S(%rdi), %rax
>         movq    WHIRL_S+8(%rdi), %rdi
>         movq    %rax, %rbp
>
> I believe there's a more advanced solution, by swapping the order of
> the loads (if the first destination register is mentioned in the
> address), or by inserting a lea insn (if both destination registers
> are mentioned in the address), but this fix is a minimal "safe"
> solution that should hopefully be suitable for backporting.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2023-08-02  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/110792
>         * config/i386/i386.md (<any_rotate>ti3): For rotations by 64 bits,
>         place the operand in a register before gen_<insn>64ti2_doubleword.
>         (<any_rotate>di3): Likewise, for rotations by 32 bits, place the
>         operand in a register before gen_<insn>32di2_doubleword.
>         (<any_rotate>32di2_doubleword): Constrain operand to be in a register.
>         (<any_rotate>64ti2_doubleword): Likewise.
>
> gcc/testsuite/ChangeLog
>         PR target/110792
>         * g++.target/i386/pr110792.C: New 32-bit C++ test case.
>         * gcc.target/i386/pr110792.c: New 64-bit C test case.
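For context, a minimal sketch of the kind of source that maps onto the
64-bit doubleword rotate splitter discussed above.  This is not the
testcase added by the patch; the table name WHIRL_S, the function name,
and the exact shapes are illustrative.  The key ingredients are a
128-bit rotate by 64 whose source operand lives in memory at an address
computed from a register:

        /* Hypothetical reproducer sketch, not the committed pr110792 test.  */
        extern unsigned __int128 WHIRL_S[256];      /* 16-byte entries, index scaled by 16 */

        unsigned __int128
        swap_halves (unsigned int i)
        {
          /* Memory-sourced operand whose address uses the index register.  */
          unsigned __int128 t = WHIRL_S[i];
          /* Recognized as a TImode rotate by 64, i.e. swap the 64-bit halves.  */
          return (t << 64) | (t >> 64);
        }

Whether the miscompile actually materializes depends on the register
allocator picking the address register for the lowpart of the result,
as in the assembly quoted above.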
OK.

Thanks,
Uros.

> Thanks in advance,
> Roger
> --