On 7/12/23 14:59, Jivan Hakobyan via Gcc-patches wrote:
Accessing local arrays element turned into load form (fp + (index <<
C1)) + C2 address. In the case when access is in the loop we got loop
invariant computation. For some reason, moving out that part cannot
be done in loop-invariant passes. But we can handle that in
target-specific hook (legitimize_address). That provides an
opportunity to rewrite memory access more suitable for the target
architecture.

This patch solves the mentioned case by rewriting mentioned case to
((fp + C2) + (index << C1)) I have evaluated it on SPEC2017 and got
an improvement on leela (over 7b instructions, .39% of the dynamic
count) and dwarfs the regression for gcc (14m instructions, .0012% of
the dynamic count).


gcc/ChangeLog: * config/riscv/riscv.cc (riscv_legitimize_address):
Handle folding. (mem_shadd_or_shadd_rtx_p): New predicate.
So I still need to give the new version a review. But a high level question -- did you re-run the benchmarks with this version to verify that we still saw the same nice improvement in leela?

The reason I ask is when I use this on Ventana's internal tree I don't see any notable differences in the dynamic instruction counts. And probably the most critical difference between the upstream tree and Ventana's tree in this space is Ventana's internal tree has an earlier version of the fold-mem-offsets work from Manolis.

It may ultimately be the case that this work and Manolis's f-m-o patch have a lot of overlap in terms of their final effect on code generation. Manolis's pass runs much later (after register allocation), so it's not going to address the loop-invariant-code-motion issue that originally got us looking into this space. But his pass is generic enough that it helps other targets. So we may ultimately want both.

Anyway, just wanted to verify if this variant is still showing the nice improvement on leela that the prior version did.

Jeff

ps.  I know you're on PTO.  No rush on responding -- enjoy the time off.

Reply via email to