On 04/28/2012 11:31 AM, Bernd Schmidt wrote:
This patch allows us to recognize that even if the argument to memcpy lives across the call, we can allocate it to a call-used register by reusing the return value of the function.

First, the patch sets the existing "fn spec" attribute for memcpy/memmove. This is translated to a new form of CALL_INSN_FUNCTION_USAGE, a (set (returnreg) (argreg)). This is recognized by IRA to adjust costs, and for communicating to caller-save that the register can be restored cheaply.

The optimization only triggers if the argument is passed in a register, which should be the case in the majority of sane ABIs. The effect on the new testcase:

    pushq   %rbx              |    subq    $8, %rsp
    movslq  %edx, %rdx             movslq  %edx, %rdx
    movq    %rdi, %rbx        <
    call    memcpy                 call    memcpy
    movq    %rbx, %rax        |    addq    $8, %rsp
    popq    %rbx              <
    ret                            ret
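
The testcase itself is not quoted in this message. As a rough illustration only, a function of the following shape would be consistent with the assembly above; the name copy_and_return and the exact signature are assumptions, not taken from the patch. Because memcpy returns its first argument, the value of dst is available in the return register after the call, so it does not need a callee-saved register.

```c
#include <string.h>

/* Hypothetical reconstruction, not the testcase from the patch.
   dst is live across the call because it is returned afterwards.
   The int length would account for the sign extension (movslq)
   before the call. */
void *copy_and_return(void *dst, const void *src, int n)
{
    memcpy(dst, src, n);   /* memcpy returns dst */
    return dst;            /* same value the call already left in the return register */
}
```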

Bernd, sorry for some delay.  I needed to think about the patch.

It is a pretty interesting and original idea. My only major objection to the patch is about the treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM. I think it should be accumulated in the same way as ALLOCNO_CALLS_CROSSED_NUM. Otherwise, I am afraid you will get a degradation in many cases instead of an improvement.

IRA is a regional allocator. First it tries to do coloring in the whole function (seeing the whole picture), then it tries to improve the allocation in subregions. If ALLOCNO_CHEAP_CALLS_CROSSED_NUM is not accumulated (that is, subregions are not taken into account), you mislead allocation in the region containing the subregions.

For example, if the only call is in a subregion, ALLOCNO_CHEAP_CALLS_CROSSED_NUM will be 1 for the subregion allocno but 0 for the whole-function allocno. RA for the whole function will then tend to allocate a callee-saved hard register, while RA in the subregion will tend to allocate a caller-saved hard register. That increases the chance of creating additional shuffle insns on the subregion borders and, as a consequence, degrades the code.
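
To make the accumulation concrete, here is a toy model in C. It is only a sketch: struct toy_allocno and toy_propagate are hypothetical names invented for illustration and are not IRA's data structures or functions. The point is that both counters get added into the allocno of the enclosing region, so the whole-function allocno sees the same call information as the subregion allocno.

```c
#include <stddef.h>

/* Hypothetical toy model, not IRA code. One allocno per pseudo per region. */
struct toy_allocno {
    int calls_crossed_num;        /* calls this pseudo lives across */
    int cheap_calls_crossed_num;  /* those calls that also return its value */
    struct toy_allocno *parent;   /* same pseudo in the enclosing region, or NULL */
};

/* Add a subregion allocno's counters into its parent-region allocno.
   Both counters are handled identically, which is the change requested here. */
static void toy_propagate(struct toy_allocno *a)
{
    struct toy_allocno *p = a->parent;

    if (p == NULL)
        return;  /* already the outermost region */
    p->calls_crossed_num += a->calls_crossed_num;
    p->cheap_calls_crossed_num += a->cheap_calls_crossed_num;
}
```

With a single call inside the subregion, propagating gives the whole-function allocno a count of 1 for both fields instead of 1 and 0, so both levels are costed from the same information.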

I don't expect this micro-optimization to improve SPEC2000, but it will improve some tests, so it is good to have. It would still be really interesting to see its impact on SPEC2000. I think I'll measure that myself in a week.

So the IRA part of the patch is OK with me if you change the treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM to match ALLOCNO_CALLS_CROSSED_NUM (where upper-region allocnos accumulate the values from the corresponding allocnos in their subregions).
