https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081

--- Comment #13 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #10)
> (In reply to Hongtao Liu from comment #9)
> > (In reply to Hongtao Liu from comment #8)
> > > (In reply to H.J. Lu from comment #7)
> > > > Created attachment 60350 [details]
> > > > ira: Don't increase callee-saved register cost by 1000x
> > > 
> > > NOTE, r15-1619-g3b9b8d6cfdf593 improved 500.perlbench_r on many different
> > > platforms, let me help verify the patch with SPEC2017.
> > 
> > There're 5% regression on alderlake for 511.povray_r.
> > With the patch, there're more PUSH/POPs for callee saved registers.(Those
> > PUSH/POPs  have been eliminated by  r15-1619-g3b9b8d6cfdf593)
> 
> We need testcases to show that.  Without them, we can't be sure that the
> improvement won't go away.
I think the testcase in PR111673 demonstrates it:

int f(int);

int advance(int dz)
{
    if (dz > 0)
        return (dz + dz) * dz;
    else
        return dz * f(dz);
}


Before r15-1619-g3b9b8d6cfdf593

advance(int):
        push    rbx
        mov     ebx, edi
        test    edi, edi
        jle     .L2
        imul    ebx, edi
        lea     eax, [rbx+rbx]
        pop     rbx
        ret
.L2:
        call    f(int)
        imul    eax, ebx
        pop     rbx
        ret

After

advance(int):
        test    edi, edi
        jle     .L2
        imul    edi, edi
        lea     eax, [rdi+rdi]
        ret
.L2:
        sub     rsp, 24
        mov     DWORD PTR [rsp+12], edi
        call    f(int)
        imul    eax, DWORD PTR [rsp+12]
        add     rsp, 24
        ret

Unlike the testcase in #c6 (call in both the if and else branches), there's no
call in the if branch here, so it's not optimal to push rbx at the entry of the
function; the save can be sunk into the else branch (as sub + mov). When
jle .L2 is not taken, that saves one push instruction (and the matching pop).
And that's why 511.povray_r is improved.
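
To make the contrast concrete, here is a minimal sketch that puts the PR111673
testcase next to a variant with a call on both paths. The variant
advance_both is hypothetical and only illustrates the shape described for
#c6; it is not the actual #c6 testcase. The body of f is also an assumption,
added only so the file links and runs; the original testcase just declares f.

```c
/* Stand-in body for f (assumption: the original only declares
   `int f(int);`, which also keeps the call from being inlined). */
int f(int x)
{
    return x + 1;
}

/* Testcase from PR111673: dz is live across a call only on the else
   path, so the if path needs no callee-saved register or stack slot. */
int advance(int dz)
{
    if (dz > 0)
        return (dz + dz) * dz;
    else
        return dz * f(dz);
}

/* Hypothetical variant: dz is live across a call on both paths, so
   sinking the save into one branch gains nothing. */
int advance_both(int dz)
{
    if (dz > 0)
        return (dz + dz) * f(dz);
    else
        return dz * f(dz);
}
```

To compare code generation rather than just run it, f should probably stay an
external declaration as in the original testcase, with something like
gcc -O2 -S -masm=intel used to dump the assembly before and after
r15-1619-g3b9b8d6cfdf593.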
