On 6/14/24 07:10, user202...@protonmail.com wrote:
This patch was inspired from PR 110137. It reduces the amount of stack spilling
by ensuring that more values are constant across a pure function call.
It does not add any new flag; rather, it makes the optimizer generate more
optimal code.
For the added test file, the change is the following. As can be seen, the number of
memory operations is cut in half (almost, because rbx = rdi also need to be saved in the
"after" version).
Before:
_Z2ggO7MyClass:
.LFB653:
.cfi_startproc
sub rsp, 72
.cfi_def_cfa_offset 80
movdqu xmm1, XMMWORD PTR [rdi]
movdqu xmm0, XMMWORD PTR [rdi+16]
movaps XMMWORD PTR [rsp+16], xmm1
movaps XMMWORD PTR [rsp], xmm0
call _Z1fv
movdqa xmm1, XMMWORD PTR [rsp+16]
movdqa xmm0, XMMWORD PTR [rsp]
lea rdx, [rsp+32]
movaps XMMWORD PTR [rsp+32], xmm1
movaps XMMWORD PTR [rsp+48], xmm0
add rsp, 72
.cfi_def_cfa_offset 8
ret
.cfi_endproc
After:
_Z2ggO7MyClass:
.LFB653:
.cfi_startproc
push rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
mov rbx, rdi
sub rsp, 32
.cfi_def_cfa_offset 48
call _Z1fv
movdqu xmm0, XMMWORD PTR [rbx]
movaps XMMWORD PTR [rsp], xmm0
movdqu xmm0, XMMWORD PTR [rbx+16]
movaps XMMWORD PTR [rsp+16], xmm0
add rsp, 32
.cfi_def_cfa_offset 16
pop rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
As explained in PR 110137, the reason I modify the RTL pass instead of the
GIMPLE pass is that currently the code that handle the optimization is in the
IRA.
The optimization involved is: rewrite
definition: a = something;
...
use a;
to move the definition statement right before the use statement, provided none of the
statements inbetween modifies "something".
The existing code only handle the case where "something" is a memory reference
with a fixed address. The patch modifies the logic to also allow memory reference whose
address is not changed by the statements inbetween.
In order to do that the only way I can think of is to modify
"validate_equiv_mem" to also validate the equivalence of the address, which may
consist of a pseudo-register.
Nevertheless, reviews and suggestions to improve the code/explain how to
implement it in GIMPLE phase would be appreciated.
Bootstrapped and regression tested on x86_64-pc-linux-gnu.
I think the test passes but there are some spurious failures with some
scan-assembler-* tests, looking through it it doesn't appear to pessimize the
code.
I think this also fix PR 103541 as a side effect, but I'm not sure if the
generated code is optimal (it loads from the global variable twice, but then
there's no readily usable caller-saved register so you need an additional
memory operation anyway)
Sorry for the big delay with the review, I missed your patch. It is
better to CC a patch to maintainers of the related code to avoid such
situation. Fortunately, Jeff Law pointed me out your patch recently. It
is also a good practice to ping the patch if there is no response for
one week. Sometimes people do several pings to get an attention for
their patches.
The patch looks ok to me. The only thing I found is that the test case
should be in g++.target/i386 dir, not in gcc.target/i386. I would also
add -std=c++11 (or something more), as the test in g++.target will be
run with different -std options and for -std=c++98 the test will not pass.
Also it is better to test such non-trivial patches on all major
targets. You can use compiler farm for this if you have no own
available machines. Otherwise, you should pay attention to new
testsuite regressions on other targets after submitting the patch.
So the patch can be committed with the test change I wrote above.
And thank you to find the opportunity to generate a better code and
implementing it.