Re: [PATCH] Improve optimizer to avoid stack spill across pure function call

Vladimir Makarov Mon, 15 Jul 2024 06:54:05 -0700


On 6/14/24 07:10, user202...@protonmail.com wrote:

This patch was inspired from PR 110137. It reduces the amount of stack spilling 
by ensuring that more values are constant across a pure function call.

It does not add any new flag; rather, it makes the optimizer generate more 
optimal code.

For the added test file, the change is the following. As can be seen, the number of 
memory operations is cut in half (almost, because rbx = rdi also need to be saved in the 
"after" version).

Before:

        _Z2ggO7MyClass:
        .LFB653:
                .cfi_startproc
                sub     rsp, 72
                .cfi_def_cfa_offset 80
                movdqu  xmm1, XMMWORD PTR [rdi]
                movdqu  xmm0, XMMWORD PTR [rdi+16]
                movaps  XMMWORD PTR [rsp+16], xmm1
                movaps  XMMWORD PTR [rsp], xmm0
                call    _Z1fv
                movdqa  xmm1, XMMWORD PTR [rsp+16]
                movdqa  xmm0, XMMWORD PTR [rsp]
                lea     rdx, [rsp+32]
                movaps  XMMWORD PTR [rsp+32], xmm1
                movaps  XMMWORD PTR [rsp+48], xmm0
                add     rsp, 72
                .cfi_def_cfa_offset 8
                ret
                .cfi_endproc

After:

        _Z2ggO7MyClass:
        .LFB653:
                .cfi_startproc
                push    rbx
                .cfi_def_cfa_offset 16
                .cfi_offset 3, -16
                mov     rbx, rdi
                sub     rsp, 32
                .cfi_def_cfa_offset 48
                call    _Z1fv
                movdqu  xmm0, XMMWORD PTR [rbx]
                movaps  XMMWORD PTR [rsp], xmm0
                movdqu  xmm0, XMMWORD PTR [rbx+16]
                movaps  XMMWORD PTR [rsp+16], xmm0
                add     rsp, 32
                .cfi_def_cfa_offset 16
                pop     rbx
                .cfi_def_cfa_offset 8
                ret
                .cfi_endproc

As explained in PR 110137, the reason I modify the RTL pass instead of the 
GIMPLE pass is that currently the code that handle the optimization is in the 
IRA.

The optimization involved is: rewrite

definition: a = something;
...
use a;

to move the definition statement right before the use statement, provided none of the 
statements inbetween modifies "something".

The existing code only handle the case where "something" is a memory reference 
with a fixed address. The patch modifies the logic to also allow memory reference whose 
address is not changed by the statements inbetween.

In order to do that the only way I can think of is to modify 
"validate_equiv_mem" to also validate the equivalence of the address, which may 
consist of a pseudo-register.

Nevertheless, reviews and suggestions to improve the code/explain how to 
implement it in GIMPLE phase would be appreciated.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

I think the test passes but there are some spurious failures with some 
scan-assembler-* tests, looking through it it doesn't appear to pessimize the 
code.

I think this also fix PR 103541 as a side effect, but I'm not sure if the 
generated code is optimal (it loads from the global variable twice, but then 
there's no readily usable caller-saved register so you need an additional 
memory operation anyway)

Sorry for the big delay with the review, I missed your patch. It isbetter to CC a patch to maintainers of the related code to avoid suchsituation. Fortunately, Jeff Law pointed me out your patch recently. Itis also a good practice to ping the patch if there is no response forone week. Sometimes people do several pings to get an attention fortheir patches.

The patch looks ok to me. The only thing I found is that the test caseshould be in g++.target/i386 dir, not in gcc.target/i386. I would alsoadd -std=c++11 (or something more), as the test in g++.target will berun with different -std options and for -std=c++98 the test will not pass.

Also it is better to test such non-trivial patches on all majortargets. You can use compiler farm for this if you have no ownavailable machines. Otherwise, you should pay attention to newtestsuite regressions on other targets after submitting the patch.


So the patch can be committed with the test change I wrote above.

And thank you to find the opportunity to generate a better code andimplementing it.

Re: [PATCH] Improve optimizer to avoid stack spill across pure function call

Reply via email to