Re: [PATCH] Improve optimizer to avoid stack spill across pure function call

Jeff Law Wed, 17 Jul 2024 09:20:19 -0700



On 7/15/24 7:53 AM, Vladimir Makarov wrote:

On 6/14/24 07:10, user202...@protonmail.com wrote:
This patch was inspired by PR 110137. It reduces the amount of stack spilling by ensuring that more values are constant across a pure function call.
It does not add any new flag; rather, it makes the optimizer generate more optimal code.
For the added test file, the change is the following. As can be seen, the number of memory operations is cut in half (almost, because rbx = rdi also needs to be saved in the "after" version).
Before:

    _Z2ggO7MyClass:
    .LFB653:
        .cfi_startproc
        sub    rsp, 72
        .cfi_def_cfa_offset 80
        movdqu    xmm1, XMMWORD PTR [rdi]
        movdqu    xmm0, XMMWORD PTR [rdi+16]
        movaps    XMMWORD PTR [rsp+16], xmm1
        movaps    XMMWORD PTR [rsp], xmm0
        call    _Z1fv
        movdqa    xmm1, XMMWORD PTR [rsp+16]
        movdqa    xmm0, XMMWORD PTR [rsp]
        lea    rdx, [rsp+32]
        movaps    XMMWORD PTR [rsp+32], xmm1
        movaps    XMMWORD PTR [rsp+48], xmm0
        add    rsp, 72
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

After:

    _Z2ggO7MyClass:
    .LFB653:
        .cfi_startproc
        push    rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        mov    rbx, rdi
        sub    rsp, 32
        .cfi_def_cfa_offset 48
        call    _Z1fv
        movdqu    xmm0, XMMWORD PTR [rbx]
        movaps    XMMWORD PTR [rsp], xmm0
        movdqu    xmm0, XMMWORD PTR [rbx+16]
        movaps    XMMWORD PTR [rsp+16], xmm0
        add    rsp, 32
        .cfi_def_cfa_offset 16
        pop    rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

As explained in PR 110137, the reason I modify the RTL pass instead of the GIMPLE pass is that currently the code that handles the optimization is in IRA.
The optimization involved is: rewrite

definition: a = something;
...
use a;
to move the definition statement right before the use statement, provided none of the statements in between modifies "something".
The existing code only handles the case where "something" is a memory reference with a fixed address. The patch modifies the logic to also allow a memory reference whose address is not changed by the statements in between.
In order to do that, the only way I can think of is to modify "validate_equiv_mem" to also validate the equivalence of the address, which may consist of a pseudo-register.
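
As a rough source-level illustration of the rewrite described above (this is not the patch's test file, which is not reproduced in this thread), the sketch below shows the two shapes in C++. The names MyClass and f are taken from the mangled symbols in the assembly (_Z2ggO7MyClass is gg(MyClass&&), _Z1fv is f()); the field layout, the body of f, and the helper names gg_before and gg_after are assumptions made only for illustration. Whether GCC emits exactly the "before" and "after" assembly for these functions is not claimed here.

```cpp
#include <cassert>

// Hypothetical 32-byte class; the real definition is not shown in the thread.
struct MyClass {
    long v[4];
};

// Stand-in for the pure function f(). A pure function may read memory but
// does not write it, so it cannot change *src below. The body is an assumption.
__attribute__((pure, noinline)) int f() { return 1; }

// "Before" shape: the definition (load from *src) precedes the call, so the
// loaded value has to stay live across the call.
long gg_before(MyClass&& src) {
    MyClass a = src;              // definition: a = something
    int r = f();                  // statement in between; modifies neither *src nor src
    return a.v[0] + a.v[3] + r;   // use a
}

// "After" shape: the definition is moved right before the use. Only the
// address (src) has to stay live across the call, not the 32 bytes loaded from it.
long gg_after(MyClass&& src) {
    int r = f();
    MyClass a = src;              // definition moved next to its use
    return a.v[0] + a.v[3] + r;
}
```

Both functions compute the same value; the move is only valid because neither the memory behind src nor the address itself changes between the original definition and the use, which is the condition the extended validate_equiv_mem check is meant to establish.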
Nevertheless, reviews and suggestions to improve the code or explain how to implement it in the GIMPLE phase would be appreciated.
Bootstrapped and regression tested on x86_64-pc-linux-gnu.
I think the tests pass, but there are some spurious failures with some scan-assembler-* tests; looking through them, the patch doesn't appear to pessimize the code.
I think this also fixes PR 103541 as a side effect, but I'm not sure if the generated code is optimal (it loads from the global variable twice, but then there's no readily usable caller-saved register, so you need an additional memory operation anyway).

Sorry for the big delay with the review, I missed your patch. It is better to CC a patch to the maintainers of the related code to avoid such a situation. Fortunately, Jeff Law pointed your patch out to me recently. It is also good practice to ping the patch if there is no response for one week. Sometimes people do several pings to get attention for their patches.
The patch looks ok to me. The only thing I found is that the test case should be in the g++.target/i386 dir, not in gcc.target/i386. I would also add -std=c++11 (or something more), as the test in g++.target will be run with different -std options and for -std=c++98 the test will not pass.
Also it is better to test such non-trivial patches on all major targets. You can use the compiler farm for this if you have no machines of your own available. Otherwise, you should pay attention to new testsuite regressions on other targets after submitting the patch.
So the patch can be committed with the test change I wrote above.
And thank you for finding the opportunity to generate better code and implementing it.

Note this patch is causing the compiler to hang on the compile/pr43415.c testcase on visium-elf. So there's no way for it to go forward without some additional testing and bugfixing.


For the submitter.  If you configure the compiler with:

<path-to-gcc-sources>/configure --target=visium-elf

Then build with

make all-gcc

Then test with

cd gcc; make check-gcc RUNTESTFLAGS=compile.exp=pr43415.c

You should see the failure (after a long wait for the dejagnu timeout to kick in).


jeff
