On 6/14/24 07:10, user202...@protonmail.com wrote:
This patch was inspired from PR 110137. It reduces the amount of stack spilling 
by ensuring that more values are constant across a pure function call.

It does not add any new flag; rather, it makes the optimizer generate more 
optimal code.

For the added test file, the change is the following. As can be seen, the number of 
memory operations is cut in half (almost, because rbx = rdi also need to be saved in the 
"after" version).

Before:

        _Z2ggO7MyClass:
        .LFB653:
                .cfi_startproc
                sub     rsp, 72
                .cfi_def_cfa_offset 80
                movdqu  xmm1, XMMWORD PTR [rdi]
                movdqu  xmm0, XMMWORD PTR [rdi+16]
                movaps  XMMWORD PTR [rsp+16], xmm1
                movaps  XMMWORD PTR [rsp], xmm0
                call    _Z1fv
                movdqa  xmm1, XMMWORD PTR [rsp+16]
                movdqa  xmm0, XMMWORD PTR [rsp]
                lea     rdx, [rsp+32]
                movaps  XMMWORD PTR [rsp+32], xmm1
                movaps  XMMWORD PTR [rsp+48], xmm0
                add     rsp, 72
                .cfi_def_cfa_offset 8
                ret
                .cfi_endproc

After:

        _Z2ggO7MyClass:
        .LFB653:
                .cfi_startproc
                push    rbx
                .cfi_def_cfa_offset 16
                .cfi_offset 3, -16
                mov     rbx, rdi
                sub     rsp, 32
                .cfi_def_cfa_offset 48
                call    _Z1fv
                movdqu  xmm0, XMMWORD PTR [rbx]
                movaps  XMMWORD PTR [rsp], xmm0
                movdqu  xmm0, XMMWORD PTR [rbx+16]
                movaps  XMMWORD PTR [rsp+16], xmm0
                add     rsp, 32
                .cfi_def_cfa_offset 16
                pop     rbx
                .cfi_def_cfa_offset 8
                ret
                .cfi_endproc

As explained in PR 110137, the reason I modify the RTL pass instead of the 
GIMPLE pass is that currently the code that handle the optimization is in the 
IRA.

The optimization involved is: rewrite

definition: a = something;
...
use a;

to move the definition statement right before the use statement, provided none of the 
statements inbetween modifies "something".

The existing code only handle the case where "something" is a memory reference 
with a fixed address. The patch modifies the logic to also allow memory reference whose 
address is not changed by the statements inbetween.

In order to do that the only way I can think of is to modify 
"validate_equiv_mem" to also validate the equivalence of the address, which may 
consist of a pseudo-register.

Nevertheless, reviews and suggestions to improve the code/explain how to 
implement it in GIMPLE phase would be appreciated.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

I think the test passes but there are some spurious failures with some 
scan-assembler-* tests, looking through it it doesn't appear to pessimize the 
code.

I think this also fix PR 103541 as a side effect, but I'm not sure if the 
generated code is optimal (it loads from the global variable twice, but then 
there's no readily usable caller-saved register so you need an additional 
memory operation anyway)

Sorry for the big delay with the review, I missed your patch.  It is better to CC a patch to maintainers of the related code to avoid such situation.  Fortunately, Jeff Law pointed me out your patch recently. It is also a good practice to ping the patch if there is no response for one week.  Sometimes people do several pings to get an attention for their patches.

The patch looks ok to me.  The only thing I found is that the test case should be in g++.target/i386 dir, not in gcc.target/i386.  I would also add -std=c++11 (or something more), as the test in g++.target will be run with different -std options and for -std=c++98 the test will not pass.

Also it is better to test such non-trivial patches on all major targets.  You can use compiler farm for this if you have no own available machines.  Otherwise, you should pay attention to new testsuite regressions on other targets after submitting the patch.

So the patch can be committed with the test change I wrote above.

And thank you to find the opportunity to generate a better code and implementing it.



Reply via email to