On 04/28/2012 11:31 AM, Bernd Schmidt wrote:
This patch allows us to recognize that even if the argument to memcpy
lives across the call, we can allocate it to a call-used register by
reusing the return value of the function.
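For reference, the C standard specifies that memcpy returns its first argument (dest). A function shaped like the one below is a hypothetical reconstruction; the actual testcase is not quoted here. It keeps the destination alive across the call only in order to return it, and that is exactly the value memcpy already leaves in the return register:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example: 'dst' is live across the call only because it is
   returned afterwards.  memcpy returns its first argument, so a compiler
   may take the result from the return register instead of keeping 'dst'
   in a callee-saved register across the call. */
void *copy_and_return(void *dst, const void *src, int n)
{
  memcpy(dst, src, n);   /* 'n' is sign-extended to size_t here */
  return dst;
}

/* Equivalent source-level form that uses memcpy's return value directly. */
void *copy_and_return_explicit(void *dst, const void *src, int n)
{
  return memcpy(dst, src, n);
}
```

The two functions behave identically. The optimization lets the compiler produce code for the first form that is as cheap as the code for the second.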
First, the patch sets the existing "fn spec" attribute for
memcpy/memmove. This is translated to a new form of
CALL_INSN_FUNCTION_USAGE, a (set (returnreg) (argreg)). IRA recognizes
this to adjust costs, and it is also used to tell caller-save that the
register can be restored cheaply.
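The decision that the (set (returnreg) (argreg)) annotation enables can be modelled in a few lines. This is a toy model under stated assumptions and not GCC's data structures: `struct call_info`, `cheap_restore_p` and the register enum are all hypothetical names introduced here. A value in a call-clobbered register normally needs a save/restore pair around the call. If that value is the argument the callee returns, a register-to-register move from the return register can stand in for the restore.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register names, for illustration only. */
enum hard_reg { REG_RAX, REG_RDI, REG_RSI, REG_RDX };

/* Hypothetical per-call record: the equivalent of (set (returnreg) (argreg)).
   copied_arg_reg is -1 when the callee returns no argument, or when that
   argument is not passed in a register (the optimization needs a register). */
struct call_info {
  int ret_reg;
  int copied_arg_reg;
};

/* True if a value that was passed to the call in register 'arg_reg' can be
   recovered after the call by copying from the return register, instead of
   being saved to and reloaded from a stack slot. */
bool cheap_restore_p(const struct call_info *call, int arg_reg)
{
  return call->copied_arg_reg >= 0 && arg_reg == call->copied_arg_reg;
}
```

For a memcpy call on x86-64 the record would be `{ REG_RAX, REG_RDI }`. The destination pointer passed in %rdi is then cheap to restore, while the source pointer in %rsi is not.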
The optimization only triggers if the argument is passed in a
register, which should be the case in the majority of sane ABIs. The
effect on the new testcase (before on the left, after on the right):
    pushq  %rbx              |  subq   $8, %rsp
    movslq %edx, %rdx           movslq %edx, %rdx
    movq   %rdi, %rbx        <
    call   memcpy               call   memcpy
    movq   %rbx, %rax        |  addq   $8, %rsp
    popq   %rbx              <
    ret                         ret
Bernd, sorry for the delay. I needed to think about the patch.
It is a pretty interesting and original idea. My only major objection
to the patch is about the treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM.
I think it should be accumulated in the same way as
ALLOCNO_CALLS_CROSSED_NUM. Otherwise, I am afraid you will get a
degradation instead of an improvement in many cases.
IRA is a regional allocator. First it tries to color the whole
function (seeing the whole picture), then it tries to improve the
allocation in subregions. If you do not accumulate
ALLOCNO_CHEAP_CALLS_CROSSED_NUM (i.e. you do not take subregions into
account), you mislead the allocation in any region containing
subregions.
For example, if the single call is in a subregion,
ALLOCNO_CHEAP_CALLS_CROSSED_NUM for the subregion allocno will be 1,
but for the allocno of the whole function it will be 0. RA for the
whole function will tend to allocate a callee-saved hard register,
while RA in the subregion will tend to allocate a caller-saved hard
register. That increases the likelihood of creating additional shuffle
insns on the subregion borders and, as a consequence, degrades the
code.
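The inconsistency described above can be shown with a deliberately simplified preference rule. This is a hypothetical sketch and not IRA's actual cost computation: `struct allocno_counts` and `prefers_callee_saved` are illustrative names. The rule is that an allocno leans towards a callee-saved register whenever it crosses a call that cannot be undone cheaply. The scenario is a single cheap call inside a subregion. The parent's ALLOCNO_CALLS_CROSSED_NUM is accumulated to 1, but its cheap count is left at 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical per-allocno counters (not IRA's real structs). */
struct allocno_counts {
  int calls_crossed;        /* cf. ALLOCNO_CALLS_CROSSED_NUM */
  int cheap_calls_crossed;  /* cf. ALLOCNO_CHEAP_CALLS_CROSSED_NUM */
};

/* Toy rule: crossing at least one call whose clobber cannot be repaired
   cheaply makes a callee-saved hard register the better choice. */
bool prefers_callee_saved(struct allocno_counts a)
{
  return a.calls_crossed > a.cheap_calls_crossed;
}
```

With `sub = {1, 1}` the subregion prefers a caller-saved register. The non-accumulated parent `{1, 0}` prefers a callee-saved one, so the two regions disagree and a move is needed on the border. Accumulating the cheap count gives the parent `{1, 1}`, and both regions then make the same choice.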
I don't expect this micro-optimization to improve SPEC2000, but it
will improve some tests, so it is good to have. It would be really
interesting to see its impact on SPEC2000. I think I'll measure it
myself within a week.
So the IRA part of the patch is OK with me if you change the treatment
of ALLOCNO_CHEAP_CALLS_CROSSED_NUM to match that of
ALLOCNO_CALLS_CROSSED_NUM, where upper-region allocnos accumulate the
values from the corresponding allocnos of their subregions.
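A minimal sketch of the requested accumulation follows. It assumes a hypothetical region tree in which each node is the allocno representing one pseudo in one region. `struct region_allocno`, its `own_*` fields and `accumulate_calls` are illustrative names, not IRA's code. A post-order walk adds each subregion's totals into its parent and treats both counters identically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical region-tree node: the allocno for one pseudo in one region.
   own_* count only calls located directly in this region, outside any
   subregion; the other two fields receive the accumulated totals. */
struct region_allocno {
  int own_calls;
  int own_cheap_calls;
  int calls_crossed;        /* cf. ALLOCNO_CALLS_CROSSED_NUM */
  int cheap_calls_crossed;  /* cf. ALLOCNO_CHEAP_CALLS_CROSSED_NUM */
  struct region_allocno *children[4];
  size_t n_children;
};

/* Post-order walk: an upper-region allocno accumulates the values of the
   corresponding allocnos of its subregions, for both counters alike. */
void accumulate_calls(struct region_allocno *a)
{
  a->calls_crossed = a->own_calls;
  a->cheap_calls_crossed = a->own_cheap_calls;
  for (size_t i = 0; i < a->n_children; i++) {
    struct region_allocno *child = a->children[i];
    accumulate_calls(child);
    a->calls_crossed += child->calls_crossed;
    a->cheap_calls_crossed += child->cheap_calls_crossed;
  }
}
```

The point of the design is that both counters obey the same rule. As a result, the ratio of cheap calls to all calls seen at any level of the region tree covers every call that level's allocno actually crosses.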