On Fri, Jul 29, 2022 at 8:10 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch adds rot[lr]64ti2_doubleword patterns to the x86_64 backend,
> to move splitting of 128-bit TImode rotates by 64 bits after reload,
> matching what we now do for 64-bit DImode rotations by 32 bits with -m32.
>
> In theory, changing when this rotation is split should have little
> influence on code generation, but in practice "reload" sometimes
> decides to make use of the increased flexibility to reduce the number
> of registers used, and the code size, by using xchg.
>
> For example:
> __int128 x;
> __int128 y;
> __int128 a;
> __int128 b;
>
> void foo()
> {
>     unsigned __int128 t = x;
>     t ^= a;
>     t = (t<<64) | (t>>64);
>     t ^= b;
>     y = t;
> }
>
> Before:
>         movq    x(%rip), %rsi
>         movq    x+8(%rip), %rdi
>         xorq    a(%rip), %rsi
>         xorq    a+8(%rip), %rdi
>         movq    %rdi, %rax
>         movq    %rsi, %rdx
>         xorq    b(%rip), %rax
>         xorq    b+8(%rip), %rdx
>         movq    %rax, y(%rip)
>         movq    %rdx, y+8(%rip)
>         ret
>
> After:
>         movq    x(%rip), %rax
>         movq    x+8(%rip), %rdx
>         xorq    a(%rip), %rax
>         xorq    a+8(%rip), %rdx
>         xchgq   %rdx, %rax
>         xorq    b(%rip), %rax
>         xorq    b+8(%rip), %rdx
>         movq    %rax, y(%rip)
>         movq    %rdx, y+8(%rip)
>         ret
>
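The single xchgq in the "After" listing is enough because rotating a 128-bit value by 64 bits only swaps its two 64-bit halves. A minimal C sketch of that equivalence follows; the helper names rot64_shifts and rot64_swap and the u128 typedef are hypothetical, chosen for illustration, and are not part of the patch. It assumes a compiler that provides unsigned __int128, as GCC does on x86_64.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Rotate by 64 written with shifts, as in foo() from the quoted example. */
static u128 rot64_shifts(u128 t)
{
    return (t << 64) | (t >> 64);
}

/* The same result obtained by swapping the low and high 64-bit halves,
   which is what a register exchange does after the value is split. */
static u128 rot64_swap(u128 t)
{
    uint64_t lo = (uint64_t)t;          /* low half  */
    uint64_t hi = (uint64_t)(t >> 64);  /* high half */
    return ((u128)lo << 64) | hi;
}
```

Both helpers should return the same value for any input, so a left and a right rotate by 64 are indistinguishable here.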
> On some modern architectures this is a small win; on some older
> architectures it is a small loss.  The decision of which code to
> generate is made in "reload", and could probably be tweaked by
> register preferencing.  The much bigger win is that (eventually) all
> TImode shifts and rotates by constants will become potential
> candidates for TImode STV.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?
>
>
> 2022-07-29  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (define_expand <any_rotate>ti3): For
>         rotations by 64 bits use new rot[lr]64ti2_doubleword pattern.
>         (rot[lr]64ti2_doubleword): New post-reload splitter.

OK.

Thanks,
Uros.
