On Mon, Nov 1, 2021 at 5:45 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > This simple patch improves the implementation of 128-bit (TImode) > rotations on x86_64 (a missed optimization opportunity spotted > during the recent V1TImode improvements). > > Currently, the function: > > unsigned __int128 rotrti3(unsigned __int128 x, unsigned int i) { > return (x >> i) | (x << (128-i)); > } > > produces: > > rotrti3: > movq %rsi, %r8 > movq %rdi, %r9 > movl %edx, %ecx > movq %rdi, %rsi > movq %r9, %rax > movq %r8, %rdx > movq %r8, %rdi > shrdq %r8, %rax > shrq %cl, %rdx > xorl %r8d, %r8d > testb $64, %cl > cmovne %rdx, %rax > cmovne %r8, %rdx > negl %ecx > andl $127, %ecx > shldq %r9, %rdi > salq %cl, %rsi > xorl %r9d, %r9d > testb $64, %cl > cmovne %rsi, %rdi > cmovne %r9, %rsi > orq %rdi, %rdx > orq %rsi, %rax > ret > > with this patch, GCC will now generate the much nicer: > rotrti3: > movl %edx, %ecx > movq %rdi, %rdx > shrdq %rsi, %rdx > shrdq %rdi, %rsi > andl $64, %ecx > movq %rdx, %rax > cmove %rsi, %rdx > cmovne %rsi, %rax > ret > > Even I wasn't expecting the optimizer's choice of the final three > instructions; a thing of beauty. For rotations larger than 64, > the lowpart and the highpart (%rax and %rdx) are transposed, and > it would be nice to have a conditional swap/exchange. The inspired > solution the compiler comes up with is to store/duplicate the same > value in both %rax/%rdx, and then use complementary conditional moves > to either update the lowpart or highpart, which cleverly avoids the > potential decode-stage pipeline stall (on some microarchitectures) > from having multiple instructions conditional on the same condition. > See X86_TUNE_ONE_IF_CONV_INSN, and notice there are two such stalls > in the original expansion of rot[rl]ti3. > > One quick question, does TARGET_64BIT (always) imply TARGET_CMOVE?
Yes, from i386-options.c: /* X86_ARCH_CMOV: Conditional move was added for pentiumpro. */ ~(m_386 | m_486 | m_PENT | m_LAKEMONT | m_K6), > This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap > and make -k check with no new failures. Interestingly the correct > behaviour is already tested by (amongst other tests) sse2-v1ti-shift-3.c > that confirms V1TImode rotates by constants match rotlti3/rotrti3. > > Ok for mainline? > > > 2021-11-01 Roger Sayle <ro...@nextmovesoftware.com> > > * config/i386/i386.md (<any_rotate>ti3): Provide expansion for > rotations by non-constant amounts on TARGET_CMOVE architectures. OK with a nit below. Thanks, Uros. + if (<CODE> == ROTATE) + { + emit_insn (gen_x86_64_shld (tmp_lo, src_hi, amount)); + emit_insn (gen_x86_64_shld (tmp_hi, src_lo, amount)); + } + else + { + emit_insn (gen_x86_64_shrd (tmp_lo, src_hi, amount)); + emit_insn (gen_x86_64_shrd (tmp_hi, src_lo, amount)); + } rtx (*shift) (rtx, rtx, rtx) = (<CODE> == ROTATE) ? gen_x86_64_shld : gen_x86_64_shrd; emit_insn (shift (tmp_lo, src_hi, amount)); emit_insn (shift (tmp_hi, src_lo, amount)); > > Thanks in advance, > Roger > -- >