On 7/15/24 7:53 AM, Vladimir Makarov wrote:

On 6/14/24 07:10, user202...@protonmail.com wrote:
This patch was inspired from PR 110137. It reduces the amount of stack spilling by ensuring that more values are constant across a pure function call.

It does not add any new flag; rather, it makes the optimizer generate more optimal code.

For the added test file, the change is the following. As can be seen, the number of memory operations is cut in half (almost, because rbx = rdi also need to be saved in the "after" version).

Before:

    _Z2ggO7MyClass:
    .LFB653:
        .cfi_startproc
        sub    rsp, 72
        .cfi_def_cfa_offset 80
        movdqu    xmm1, XMMWORD PTR [rdi]
        movdqu    xmm0, XMMWORD PTR [rdi+16]
        movaps    XMMWORD PTR [rsp+16], xmm1
        movaps    XMMWORD PTR [rsp], xmm0
        call    _Z1fv
        movdqa    xmm1, XMMWORD PTR [rsp+16]
        movdqa    xmm0, XMMWORD PTR [rsp]
        lea    rdx, [rsp+32]
        movaps    XMMWORD PTR [rsp+32], xmm1
        movaps    XMMWORD PTR [rsp+48], xmm0
        add    rsp, 72
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

After:

    _Z2ggO7MyClass:
    .LFB653:
        .cfi_startproc
        push    rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        mov    rbx, rdi
        sub    rsp, 32
        .cfi_def_cfa_offset 48
        call    _Z1fv
        movdqu    xmm0, XMMWORD PTR [rbx]
        movaps    XMMWORD PTR [rsp], xmm0
        movdqu    xmm0, XMMWORD PTR [rbx+16]
        movaps    XMMWORD PTR [rsp+16], xmm0
        add    rsp, 32
        .cfi_def_cfa_offset 16
        pop    rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

As explained in PR 110137, the reason I modify the RTL pass instead of the GIMPLE pass is that currently the code that handle the optimization is in the IRA.

The optimization involved is: rewrite

definition: a = something;
...
use a;

to move the definition statement right before the use statement, provided none of the statements inbetween modifies "something".

The existing code only handle the case where "something" is a memory reference with a fixed address. The patch modifies the logic to also allow memory reference whose address is not changed by the statements inbetween.

In order to do that the only way I can think of is to modify "validate_equiv_mem" to also validate the equivalence of the address, which may consist of a pseudo-register.

Nevertheless, reviews and suggestions to improve the code/explain how to implement it in GIMPLE phase would be appreciated.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

I think the test passes but there are some spurious failures with some scan-assembler-* tests, looking through it it doesn't appear to pessimize the code.

I think this also fix PR 103541 as a side effect, but I'm not sure if the generated code is optimal (it loads from the global variable twice, but then there's no readily usable caller-saved register so you need an additional memory operation anyway)

Sorry for the big delay with the review, I missed your patch.  It is better to CC a patch to maintainers of the related code to avoid such situation.  Fortunately, Jeff Law pointed me out your patch recently. It is also a good practice to ping the patch if there is no response for one week.  Sometimes people do several pings to get an attention for their patches.

The patch looks ok to me.  The only thing I found is that the test case should be in g++.target/i386 dir, not in gcc.target/i386.  I would also add -std=c++11 (or something more), as the test in g++.target will be run with different -std options and for -std=c++98 the test will not pass.

Also it is better to test such non-trivial patches on all major targets.  You can use compiler farm for this if you have no own available machines.  Otherwise, you should pay attention to new testsuite regressions on other targets after submitting the patch.

So the patch can be committed with the test change I wrote above.

And thank you to find the opportunity to generate a better code and implementing it.
Note this patch is causing the compiler to hang on the compile/pr43415.c testcase on visium-elf. So there's no way for it to go forward without some additional testing and bugfixing.

For the submitter.  If you configure the compiler with:

<path-to-gcc-sources>/configure --target=visium-elf

Then build with

make all-gcc

Then test with
cd gcc; make check-gcc RUNTESTFLAGS=compile.exp=pr43415.c

You should see the failure (after a long wait for the dejagnu timeout to kick in).

jeff


Reply via email to