On Fri, Jul 29, 2022 at 8:10 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch adds rot[lr]64ti2_doubleword patterns to the x86_64 backend,
> to move splitting of 128-bit TImode rotates by 64 bits after reload,
> matching what we now do for 64-bit DImode rotations by 32 bits with -m32.
>
> In theory, moving the point at which this rotation is split should have
> little influence on code generation, but in practice "reload" sometimes
> uses the increased flexibility to reduce both the number of registers
> used and the code size, by using xchg.
>
> For example:
>
> __int128 x;
> __int128 y;
> __int128 a;
> __int128 b;
>
> void foo()
> {
>   unsigned __int128 t = x;
>   t ^= a;
>   t = (t<<64) | (t>>64);
>   t ^= b;
>   y = t;
> }
>
> Before:
>         movq    x(%rip), %rsi
>         movq    x+8(%rip), %rdi
>         xorq    a(%rip), %rsi
>         xorq    a+8(%rip), %rdi
>         movq    %rdi, %rax
>         movq    %rsi, %rdx
>         xorq    b(%rip), %rax
>         xorq    b+8(%rip), %rdx
>         movq    %rax, y(%rip)
>         movq    %rdx, y+8(%rip)
>         ret
>
> After:
>         movq    x(%rip), %rax
>         movq    x+8(%rip), %rdx
>         xorq    a(%rip), %rax
>         xorq    a+8(%rip), %rdx
>         xchgq   %rdx, %rax
>         xorq    b(%rip), %rax
>         xorq    b+8(%rip), %rdx
>         movq    %rax, y(%rip)
>         movq    %rdx, y+8(%rip)
>         ret
>
> On some modern architectures this is a small win; on some older
> architectures it is a small loss. The decision about which code to
> generate is made in "reload", and could probably be tweaked by
> register preferencing. The much bigger win is that (eventually) all
> TImode shifts and rotates by constants will become potential
> candidates for TImode STV.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures. Ok for mainline?
>
>
> 2022-07-29  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (define_expand <any_rotate>ti3): For
>         rotations by 64 bits use new rot[lr]64ti2_doubleword pattern.
>         (rot[lr]64ti2_doubleword): New post-reload splitter.
OK. Thanks, Uros.
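
As context for the transformation approved above: a TImode rotate by 64
bits is semantically just an exchange of the two 64-bit halves, which is
why reload is free to realize it with a single xchgq instead of two moves.
A minimal standalone C sketch of that equivalence (the helper name rot64
and the test constants are illustrative, not part of the patch):

/* Illustrative only: shows that a 128-bit rotate by 64 swaps the
   64-bit halves.  Build with GCC on a 64-bit target, e.g.
   gcc -O2 rot64.c  */
#include <assert.h>
#include <stdint.h>

static unsigned __int128
rot64 (unsigned __int128 t)
{
  /* Same form as in foo() above.  */
  return (t << 64) | (t >> 64);
}

int
main (void)
{
  unsigned __int128 t = ((unsigned __int128) 0x1111111111111111ULL << 64)
                        | 0x2222222222222222ULL;
  unsigned __int128 r = rot64 (t);

  /* After the rotate, the high half holds the old low half and vice
     versa -- exactly what a single register exchange achieves.  */
  assert ((uint64_t) (r >> 64) == 0x2222222222222222ULL);
  assert ((uint64_t) r == 0x1111111111111111ULL);
  return 0;
}

Because the rotate is a pure register permutation, deferring the split
until after reload lets the register allocator pick whichever of the
two-move or xchg forms fits the surrounding allocation, as seen in the
Before/After assembly in the patch description.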