On 7/11/24 12:45 PM, Roger Sayle wrote:
This patch improves the speed of ARC's ashrsi3 and lshrsi3, on CPUs
without a barrel shifter, when not optimizing for size. The current
implementations of right shifts by a constant are optimal for code
size, but at significant performance cost. By emitting an extra
instruction or two, when not optimizing for size, we can improve
performance (sometimes dramatically).
[al]shrsi3 #5 Before 4 insns@12 cycles, after 5 insns@5 cycles
Without -mswap
[al]shrsi3 #29 Before 4 insns@60 cycles, after 5 insns@31 cycles
With -mswap
lshrsi3 #29 Before 4 insns@60 cycles, after 6 insns@16 cycles
This patch has been minimally tested by building a cross-compiler
to arc-linux hosted on x86_64-pc-linux-gnu where there are no new
failures from "make -k check" in the compile-only tests.
Ok for mainline (after 3rd-party testing)?
2024-07-11 Roger Sayle <ro...@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc.cc (arc_split_ashr): When not optimizing for
size: fully unroll ashr #5; on TARGET_SWAP, for shifts between
19 and 29, perform ashr #16 using two instructions then
recursively perform the remaining shift; and for shifts by
odd amounts, perform a single shift then the remainder
of the shift using a loop doing two bits per iteration.
(arc_split_lshr): Likewise.
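
For readers following the ChangeLog entry above, here is a rough functional sketch in C of the lshr decomposition it describes. It is not the code in arc.cc: the helper names lsr1, lsr16_swap and lshr_model are made up for illustration, the swap-then-mask modelling of the 16-bit shift is an assumption about how the two-instruction sequence behaves, and the model only checks that the pieces compose to the right result; it says nothing about instruction or cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a single-bit logical right shift, the only
   shift assumed to be available without a barrel shifter.  */
static uint32_t lsr1(uint32_t x)
{
    return x >> 1;
}

/* Hypothetical helper: logical right shift by 16 modelled as two
   steps, swapping the 16-bit halves and then keeping the low half.
   This is an assumed reading of "two instructions" with -mswap.  */
static uint32_t lsr16_swap(uint32_t x)
{
    uint32_t swapped = (x << 16) | (x >> 16);
    return swapped & 0xFFFFu;
}

/* Illustrative model of the split for a constant shift count n.
   have_swap stands in for TARGET_SWAP.  */
static uint32_t lshr_model(uint32_t x, unsigned n, int have_swap)
{
    assert(n < 32);

    /* With swap, counts from 19 to 29 first shift by 16, then the
       remaining count is handled by the steps below.  */
    if (have_swap && n >= 19 && n <= 29) {
        x = lsr16_swap(x);
        n -= 16;
    }

    /* Odd counts: peel off a single shift so the rest is even.  */
    if (n & 1) {
        x = lsr1(x);
        n -= 1;
    }

    /* Remaining even count: a loop doing two bits per iteration.  */
    for (unsigned i = 0; i < n / 2; i++) {
        x = lsr1(x);
        x = lsr1(x);
    }
    return x;
}
```

Calling lshr_model(x, 29, 1) takes the 16-bit step, one single shift and six loop iterations, and should agree with x >> 29 for any 32-bit x. The model does not distinguish the fully unrolled #5 case from the loop form, since both compute the same value.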
Claudiu should have the last say here, but I did throw this into my
tester, which didn't report any problems. Note that for arc-elf my
tester doesn't have a simulator, so all the execution tests are assumed
to pass.
jeff