On Thu, Oct 5, 2023 at 11:06 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch avoids long lea instructions for performing x<<2 and x<<3
> by splitting them into shorter sal and move (or xchg) instructions.
> Because this increases the number of instructions, but reduces the
> total size, it's suitable for -Oz (but not -Os).
>
> The impact can be seen in the new test case:
>
> int foo(int x) { return x<<2; }
> int bar(int x) { return x<<3; }
> long long fool(long long x) { return x<<2; }
> long long barl(long long x) { return x<<3; }
>
> where with -O2 we generate:
>
> foo:    lea    0x0(,%rdi,4),%eax        // 7 bytes
>         retq
> bar:    lea    0x0(,%rdi,8),%eax        // 7 bytes
>         retq
> fool:   lea    0x0(,%rdi,4),%rax        // 8 bytes
>         retq
> barl:   lea    0x0(,%rdi,8),%rax        // 8 bytes
>         retq
>
> and with -Oz we now generate:
>
> foo:    xchg   %eax,%edi                // 1 byte
>         shl    $0x2,%eax                // 3 bytes
>         retq
> bar:    xchg   %eax,%edi                // 1 byte
>         shl    $0x3,%eax                // 3 bytes
>         retq
> fool:   xchg   %rax,%rdi                // 2 bytes
>         shl    $0x2,%rax                // 4 bytes
>         retq
> barl:   xchg   %rax,%rdi                // 2 bytes
>         shl    $0x3,%rax                // 4 bytes
>         retq
>
> Over the entirety of the CSiBE code size benchmark this saves 1347
> bytes (0.037%) for x86_64, and 1312 bytes (0.036%) with -m32.
> Conveniently, there's already a backend function in i386.cc for
> deciding whether to split an lea into its component instructions,
> ix86_avoid_lea_for_addr; all that's required is an additional clause
> checking for -Oz (i.e. optimize_size > 1).
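
For readers following along, the decision described above can be
sketched roughly as below. This is a simplified illustration, not the
code in i386.cc: struct addr_parts and split_shift_lea_p are
hypothetical names, and only the x<<2 and x<<3 cases mentioned in the
patch description are modelled.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an address of the form
   base + index*scale + disp.  Not GCC's real data structure.  */
struct addr_parts
{
  bool has_base;
  bool has_index;
  int scale;
  long disp;
};

/* Illustrative only: return true when an lea computing this address
   is really a left shift by 2 or 3 (index*4 or index*8, no base,
   zero displacement) and we are at -Oz, modelled here as
   optimize_size > 1 as in the description above.  */
static bool
split_shift_lea_p (const struct addr_parts *a, int optimize_size)
{
  if (optimize_size <= 1)
    return false;
  if (a->has_base || !a->has_index || a->disp != 0)
    return false;
  return a->scale == 4 || a->scale == 8;
}
```

With these assumptions, an address of the form 0x0(,%rdi,4) is split
only when optimize_size is 2, and is left as an lea at -Os.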
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board='unix{-m32}'
> with no new failures.  Additional testing was performed by repeating
> these steps after removing the "optimize_size > 1" condition, so that
> suitable lea instructions were always split [-Oz is not heavily
> tested, so this invoked the new code during the bootstrap and
> regression testing], again with no regressions.  Ok for mainline?
>
>
> 2023-10-05  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.cc (ix86_avoid_lea_for_addr): Split LEAs used
>         to perform left shifts into shorter instructions with -Oz.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/lea-2.c: New test case.
>

OK, but ...

@@ -0,0 +1,7 @@
+/* { dg-do compile { target { ! ia32 } } } */

Is there a reason to avoid 32-bit targets? I'd expect that the
optimization also triggers on x86_32 for 32-bit integers.
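
One possible arrangement, as a sketch only: keep the long long
functions in the 64-bit-only test and move the int functions into a
second file without the target restriction. The file split is a
suggestion rather than something taken from the patch, and it has not
been checked that the scan is meaningful on ia32, where the arguments
arrive on the stack rather than in a register.

```c
/* { dg-do compile } */
/* { dg-options "-Oz" } */
/* 32-bit shifts only, so no { ! ia32 } restriction is assumed here.  */
int foo(int x) { return x<<2; }
int bar(int x) { return x<<3; }
/* { dg-final { scan-assembler-not "lea\[lq\]" } } */
```

If the stack-passed arguments make the scan pass trivially with -m32,
the test would need adjusting so that the shifted value starts out in a
register other than the result register.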

+/* { dg-options "-Oz" } */
+int foo(int x) { return x<<2; }
+int bar(int x) { return x<<3; }
+long long fool(long long x) { return x<<2; }
+long long barl(long long x) { return x<<3; }
+/* { dg-final { scan-assembler-not "lea\[lq\]" } } */

Uros.
