On 7/11/24 12:45 PM, Roger Sayle wrote:

This patch improves the speed of ARC's ashrsi3 and lshrsi3, on CPUs
without a barrel shifter, when not optimizing for size.  The current
implementations of right shifts by a constant are optimal for code
size, but at significant performance cost.  By emitting an extra
instruction or two, when not optimizing for size, we can improve
performance (sometimes dramatically).

[al]shrsi3 #5   Before 4 insns@12 cycles, after 5 insns@5 cycles

Without -mswap
[al]shrsi3 #29  Before 4 insns@60 cycles, after 5 insns@31 cycles

With -mswap
lshrsi3 #29     Before 4 insns@60 cycles, after 6 insns@16 cycles


This patch has been minimally tested by building a cross-compiler
to arc-linux hosted on x86_64-pc-linux-gnu where there are no new
failures from "make -k check" in the compile-only tests.
Ok for mainline (after 3rd-party testing)?


2024-07-11  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
         * config/arc/arc.cc (arc_split_ashr): When not optimizing for
         size: fully unroll ashr #5; on TARGET_SWAP, for shifts between
         19 and 29, perform ashr #16 using two instructions then
         recursively perform the remaining shift; and for shifts by
         odd amounts, perform a single shift then the remainder
         of the shift using a loop doing two bits per iteration.
         (arc_split_lshr): Likewise.
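
For readers following the ChangeLog entry, the three strategies can be modelled in plain C as below.  This is only a rough sketch of the arithmetic, not the RTL that arc_split_ashr or arc_split_lshr emit.  The function and helper names are hypothetical, and pairing a half-word swap with a 16-bit sign or zero extension as the "two instructions" for a shift by 16 is an assumption, not something stated in the patch description.

```c
#include <assert.h>
#include <stdint.h>

/* One-bit right shift on a 32-bit pattern; without a barrel shifter this
   is the basic shift step.  arith != 0 keeps the sign bit.  */
static uint32_t shr1(uint32_t x, int arith)
{
    uint32_t sign = arith ? (x & 0x80000000u) : 0u;
    return (x >> 1) | sign;
}

/* Shift right by 16 modelled as two steps (assumed pairing): exchange the
   16-bit halves, then sign- or zero-extend the low half.  */
static uint32_t shr16_via_swap(uint32_t x, int arith)
{
    uint32_t swapped = (x << 16) | (x >> 16);   /* half-word swap */
    uint32_t low = swapped & 0xFFFFu;           /* zero extension */
    if (arith && (low & 0x8000u))
        low |= 0xFFFF0000u;                     /* sign extension */
    return low;
}

/* Hypothetical model of the split strategy for a constant shift count n. */
uint32_t shift_right_sketch(uint32_t x, unsigned n, int arith, int have_swap)
{
    assert(n < 32);

    if (n == 5) {
        /* Fully unrolled: five single-bit shifts, no loop.  */
        x = shr1(x, arith);
        x = shr1(x, arith);
        x = shr1(x, arith);
        x = shr1(x, arith);
        x = shr1(x, arith);
        return x;
    }

    if (have_swap && n >= 19 && n <= 29) {
        /* Shift by 16 first, then handle the remaining 3..13 bits.  */
        x = shr16_via_swap(x, arith);
        return shift_right_sketch(x, n - 16, arith, have_swap);
    }

    if (n & 1) {
        /* Odd count: peel off one bit so the loop count is even.  */
        x = shr1(x, arith);
        n--;
    }
    for (; n != 0; n -= 2)
        x = shr1(shr1(x, arith), arith);        /* two bits per iteration */
    return x;
}
```

For any count from 0 to 31, shift_right_sketch(x, n, 0, have_swap) should agree with the unsigned expression x >> n, with or without the swap path, which gives a quick sanity check of the decomposition.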

Claudiu should have the last say here.  I did throw this into my tester, which didn't report any problems.  Note, however, that my tester doesn't have a simulator for arc-elf, so all the execution tests are assumed to pass.

jeff
