On 7/15/24 7:53 AM, Vladimir Makarov wrote:
On 6/14/24 07:10, user202...@protonmail.com wrote:
This patch was inspired from PR 110137. It reduces the amount of stack
spilling by ensuring that more values are constant across a pure
function call.
It does not add any new flag; rather, it makes the optimizer generate
more optimal code.
For the added test file, the change is the following. As can be seen,
the number of memory operations is cut in half (almost, because rbx =
rdi also need to be saved in the "after" version).
Before:
_Z2ggO7MyClass:
.LFB653:
.cfi_startproc
sub rsp, 72
.cfi_def_cfa_offset 80
movdqu xmm1, XMMWORD PTR [rdi]
movdqu xmm0, XMMWORD PTR [rdi+16]
movaps XMMWORD PTR [rsp+16], xmm1
movaps XMMWORD PTR [rsp], xmm0
call _Z1fv
movdqa xmm1, XMMWORD PTR [rsp+16]
movdqa xmm0, XMMWORD PTR [rsp]
lea rdx, [rsp+32]
movaps XMMWORD PTR [rsp+32], xmm1
movaps XMMWORD PTR [rsp+48], xmm0
add rsp, 72
.cfi_def_cfa_offset 8
ret
.cfi_endproc
After:
_Z2ggO7MyClass:
.LFB653:
.cfi_startproc
push rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
mov rbx, rdi
sub rsp, 32
.cfi_def_cfa_offset 48
call _Z1fv
movdqu xmm0, XMMWORD PTR [rbx]
movaps XMMWORD PTR [rsp], xmm0
movdqu xmm0, XMMWORD PTR [rbx+16]
movaps XMMWORD PTR [rsp+16], xmm0
add rsp, 32
.cfi_def_cfa_offset 16
pop rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
As explained in PR 110137, the reason I modify the RTL pass instead of
the GIMPLE pass is that currently the code that handle the
optimization is in the IRA.
The optimization involved is: rewrite
definition: a = something;
...
use a;
to move the definition statement right before the use statement,
provided none of the statements inbetween modifies "something".
The existing code only handle the case where "something" is a memory
reference with a fixed address. The patch modifies the logic to also
allow memory reference whose address is not changed by the statements
inbetween.
In order to do that the only way I can think of is to modify
"validate_equiv_mem" to also validate the equivalence of the address,
which may consist of a pseudo-register.
Nevertheless, reviews and suggestions to improve the code/explain how
to implement it in GIMPLE phase would be appreciated.
Bootstrapped and regression tested on x86_64-pc-linux-gnu.
I think the test passes but there are some spurious failures with some
scan-assembler-* tests, looking through it it doesn't appear to
pessimize the code.
I think this also fix PR 103541 as a side effect, but I'm not sure if
the generated code is optimal (it loads from the global variable
twice, but then there's no readily usable caller-saved register so you
need an additional memory operation anyway)
Sorry for the big delay with the review, I missed your patch. It is
better to CC a patch to maintainers of the related code to avoid such
situation. Fortunately, Jeff Law pointed me out your patch recently. It
is also a good practice to ping the patch if there is no response for
one week. Sometimes people do several pings to get an attention for
their patches.
The patch looks ok to me. The only thing I found is that the test case
should be in g++.target/i386 dir, not in gcc.target/i386. I would also
add -std=c++11 (or something more), as the test in g++.target will be
run with different -std options and for -std=c++98 the test will not pass.
Also it is better to test such non-trivial patches on all major
targets. You can use compiler farm for this if you have no own
available machines. Otherwise, you should pay attention to new
testsuite regressions on other targets after submitting the patch.
So the patch can be committed with the test change I wrote above.
And thank you to find the opportunity to generate a better code and
implementing it.
Note this patch is causing the compiler to hang on the compile/pr43415.c
testcase on visium-elf. So there's no way for it to go forward without
some additional testing and bugfixing.
For the submitter. If you configure the compiler with:
<path-to-gcc-sources>/configure --target=visium-elf
Then build with
make all-gcc
Then test with
cd gcc; make check-gcc RUNTESTFLAGS=compile.exp=pr43415.c
You should see the failure (after a long wait for the dejagnu timeout to
kick in).
jeff