On Sun, Nov 12, 2023 at 10:03 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch improves register pressure during reload, inspired by PR 97756.
> Normally, a double-word right-shift by a constant produces a double-word
> result, the high part of which is dead when followed by a truncation.
> The dead code calculating the high part gets cleaned up post-reload, so
> the issue isn't normally visible, except for the increased register
> pressure during reload, sometimes leading to odd register assignments.
> Providing a post-reload splitter, which clobbers a single wordmode
> result register instead of a doubleword result register, helps (a bit).
>
> An example demonstrating this effect is:
>
> #define MASK60 ((1ul << 60) - 1)
> unsigned long foo (__uint128_t n)
> {
>   unsigned long a = n & MASK60;
>   unsigned long b = (n >> 60);
>   b = b & MASK60;
>   unsigned long c = (n >> 120);
>   return a+b+c;
> }
>
> which currently with -O2 generates (13 instructions):
> foo:    movabsq $1152921504606846975, %rcx
>         xchgq   %rdi, %rsi
>         movq    %rsi, %rax
>         shrdq   $60, %rdi, %rax
>         movq    %rax, %rdx
>         movq    %rsi, %rax
>         movq    %rdi, %rsi
>         andq    %rcx, %rax
>         shrq    $56, %rsi
>         andq    %rcx, %rdx
>         addq    %rsi, %rax
>         addq    %rdx, %rax
>         ret
>
> with this patch, we generate one fewer mov (12 instructions):
> foo:    movabsq $1152921504606846975, %rcx
>         xchgq   %rdi, %rsi
>         movq    %rdi, %rdx
>         movq    %rsi, %rax
>         movq    %rdi, %rsi
>         shrdq   $60, %rdi, %rdx
>         andq    %rcx, %rax
>         shrq    $56, %rsi
>         addq    %rsi, %rax
>         andq    %rcx, %rdx
>         addq    %rdx, %rax
>         ret
>
> The significant difference is easier to see via diff:
> <       shrdq   $60, %rdi, %rax
> <       movq    %rax, %rdx
> ---
> >       shrdq   $60, %rdi, %rdx
>
>
> Admittedly a single "mov" isn't much of a saving on modern architectures,
> but as demonstrated by the PR, people still track the number of them.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-11-12  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (<insn><dwi>3_doubleword_lowpart): New
>         define_insn_and_split to optimize register usage of doubleword
>         right shifts followed by truncation.
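
[Editorial illustration, not part of the patch.] As a rough sketch of why only a single word-sized result register is needed, the low word of a double-word right shift by a constant can be computed directly from the two input halves, which is what a single shrd does. The helper names below (shr128_lowpart, shr128_then_truncate, check_lowpart_equivalence) are hypothetical and do not appear in i386.md; the sketch assumes a 64-bit target, GCC's __uint128_t extension, and a shift count in the range 1..63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: low 64 bits of the 128-bit value hi:lo shifted right
   by `count`, for 1 <= count <= 63.  Only one 64-bit result is produced,
   which mirrors what a single shrd instruction computes.  */
static inline uint64_t shr128_lowpart(uint64_t lo, uint64_t hi, unsigned count)
{
    return (lo >> count) | (hi << (64 - count));
}

/* Reference form: full double-word shift followed by a truncation.  The
   high word of the intermediate result is dead.  */
static inline uint64_t shr128_then_truncate(__uint128_t n, unsigned count)
{
    return (uint64_t)(n >> count);
}

/* Checks that both forms agree for the shift count 60 used in foo().  */
static inline void check_lowpart_equivalence(__uint128_t n)
{
    assert(shr128_lowpart((uint64_t)n, (uint64_t)(n >> 64), 60)
           == shr128_then_truncate(n, 60));
}
```

In foo() above, the value b before masking corresponds to shr128_lowpart(lo, hi, 60). The n >> 120 case falls outside this sketch, since a count of 64 or more reads only the high word (hi >> 56).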


+;; Split truncations of TImode right shifts into x86_64_shrd_1.
+;; Split truncations of DImode right shifts into x86_shrd_1.

You can just say

;; Split truncations of double word right shifts into x86_shrd_1.

OK with the above change.

Thanks,
Uros.

>
> Thanks in advance,
> Roger
> --
>
