On 03/08/2011 06:36 AM, Georg-Johann Lay wrote:
Georg-Johann Lay wrote:
In current trunk (r170704), 4.4-branch and 4.5-branch I observe the
following optimization issue in IRA: It saves regs in the frame
instead of in callee-saved registers which would be much smarter.
In the following C source, foo2 is compiled as desired (push/pop r17
to save r24). In foo1 and foo3 r24 is saved in the frame. The old
lreg/greg allocator of 4.3-branch generates fine code for all functions.
Saving a reg in the frame should only be done if running out of hard
registers because setting up frame(pointer) and accessing frame is
very expensive on avr.
Maybe someone can give me a hint what's going wrong.
gcc configured
/gnu/source/gcc.gnu.org/trunk/configure --target=avr --prefix=...
--enable-languages=c,c++ --disable-libssp --disable-libada
--disable-nls --disable-shared
and sources compiled
with -Os -mmcu=atmega8 -c -dp -da -fira-verbose=100
/*****************************************************/
void bar0 (void);
void bar1 (char);
void foo1 (char x)
{
    bar0();
    bar1(x);
}
char foo2 (char x)
{
    bar1(x);
    return x;
}
char foo3 (char x)
{
    bar0();
    return x;
}
/*****************************************************/
FYI, I attached IRA dumps and asm output
As far as I can see, target avr gives appropriate costs for memory and
register moves.
IRA printout is as follows:
When the target returns memory move costs 4 times higher (8 instead of 2)
for QI memory moves, IRA/reload generates code as expected.
Is there any other way than lying about the costs?
IRA does not take into account the costs implied by generating new stack
slots and setting up the frame pointer. AFAIK there is no hook to
influence that. How can a target describe the costs generated by setting
up stack slots and accessing them?
First of all, defining equal costs for moving into memory and into a
register (2 in both cases) is the wrong way to direct IRA.
In case of test1, pseudo 42 is in two insns involving hard register 24.
Therefore its cost is decreased in ira-costs.c. After that it is
increased on the same value because the pseudo intersects one call and
the cost of ld/st (for save/restore) is the same as for moving its value
into a general register. Because r24 is the first in the hard register
allocation order, IRA chooses r24.
So making ld/st a bit more costly than a hard register move should solve
the problem. Unfortunately, it does not, because the two move insns
involving p42 and hard register 24 are taken into account twice (once in
ira-costs.c and once in ira-conflicts.c:process_regs_for_copy).
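To make the arithmetic above concrete, here is a toy model of the decision for p42. It is only a sketch: the function name, the cost formula and the zero baseline for the callee-saved register are my simplifying assumptions, not IRA's actual data structures or computation. It credits r24 for the two copy insns (once or twice) and charges one store plus one load for the call the pseudo crosses.

```c
#include <assert.h>

/* Toy model only: these names and the formula are illustrative assumptions,
   not IRA's data structures or its real cost computation. */
enum { CALL_CLOBBERED_R24 = 24, CALLEE_SAVED_R17 = 17 };

/* Returns the hard register the toy allocator picks for the pseudo.
   copies_counted is 1 if the two copy insns are credited once, 2 if twice. */
static int choose_hard_reg(int reg_move_cost, int mem_move_cost,
                           int copies_counted)
{
    const int copy_insns = 2;    /* insns moving between the pseudo and r24 */
    const int calls_crossed = 1; /* the pseudo is live across one call */

    assert(reg_move_cost > 0 && mem_move_cost > 0 && copies_counted > 0);

    /* r24: credit for the copies, penalty for one store plus one load
       around each call. */
    int cost_r24 = -copies_counted * copy_insns * reg_move_cost
                   + calls_crossed * 2 * mem_move_cost;
    /* r17: assumed baseline of 0 (prologue/epilogue push/pop is ignored). */
    int cost_r17 = 0;

    /* r24 comes first in the allocation order, so it wins ties. */
    return cost_r17 < cost_r24 ? CALLEE_SAVED_R17 : CALL_CLOBBERED_R24;
}
```

In this model, equal costs (2 and 2) give a tie and r24 wins. A slightly higher memory cost (3) selects r17 only if the copies are credited once; with double counting r24 still wins, and it takes a much higher memory cost such as 8 to tip the choice to r17.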
The following patch would solve the problem. I'll submit it when gcc is
in stage 1. The patch looks harmless but changes in ira-costs.c
frequently trigger reload failures. Therefore the patch needs thorough
testing, and stage 1 is the best time to do it.
Thanks for pointing out the problem. It helped to find a cost
calculation pitfall which was probably introduced by an IRA change in
ira-costs.c inconsistent with the code in ira-conflicts.c.
Index: ira-conflicts.c
===================================================================
--- ira-conflicts.c (revision 170786)
+++ ira-conflicts.c (working copy)
@@ -432,8 +432,7 @@ process_regs_for_copy (rtx reg1, rtx reg
   rclass = REGNO_REG_CLASS (allocno_preferenced_hard_regno);
   mode = ALLOCNO_MODE (a);
   cover_class = ALLOCNO_COVER_CLASS (a);
-  if (only_regs_p && insn != NULL_RTX
-      && reg_class_size[rclass] <= (unsigned) CLASS_MAX_NREGS (rclass, mode))
+  if (only_regs_p && insn != NULL_RTX)
     /* It is already taken into account in ira-costs.c. */
     return false;
   index =
     ira_class_hard_reg_index[cover_class][allocno_preferenced_hard_regno];
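
For readers who want to see the effect of the changed condition in isolation, the two predicates can be sketched as standalone C. The function names and the plain parameters are hypothetical stand-ins for only_regs_p, insn != NULL_RTX, reg_class_size[rclass] and CLASS_MAX_NREGS (rclass, mode); this is not code from ira-conflicts.c.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the condition in process_regs_for_copy.
   "Skipping" corresponds to the early "return false" in the patch context. */

/* Before the patch: the copy is skipped only when the register class is no
   larger than the number of hard registers the mode needs. */
static bool skips_copy_before_patch(bool only_regs_p, bool has_insn,
                                    unsigned class_size, unsigned regs_needed)
{
    return only_regs_p && has_insn && class_size <= regs_needed;
}

/* After the patch: any register-to-register copy that comes from an insn is
   skipped, because ira-costs.c has already accounted for it. */
static bool skips_copy_after_patch(bool only_regs_p, bool has_insn)
{
    return only_regs_p && has_insn;
}
```

With illustrative values of a 32-register class and a mode needing 1 register, the old predicate does not skip the copy, so it is counted a second time; the new predicate skips it.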