2011/10/25 Georg-Johann Lay <a...@gjlay.de>:
> With the following, small C test program
>
> typedef struct
> {
>     unsigned char a, b, c, d;
> } s_t;
>
> unsigned char func1 (s_t *x, s_t *y, s_t *z)
> {
>     unsigned char s = 0;
>     s += x->a;
>     s += y->a;
>     s += z->a;
>
>     s += x->b;
>     s += y->b;
>     s += z->b;
>
>     s += x->c;
>     s += y->c;
>     s += z->c;
>
>     return s;
> }
>
> there is a frame pointer set up for no apparent reason.
>
> The machine for which this code is compiled (AVR) has just a few pointer
> registers, and taking away one of them to use it as frame pointer leads to
> severe performance degradation in many real-world programs: moving from/to
> memory is more expensive than moving around registers, setting up a frame
> is expensive, and taking away 1 of 2 address registers is expensive.
>
> What I tried and what did not fix it:
>
> - increase targetm.memory_move_cost (up to insane value)
> - play around with targetm.class_likely_spilled_p
>
> The program is compiled with
>
> $ avr-gcc in.c -S -Os -fdump-rtl-ira-details -fdump-rtl-postreload-details
> -mmcu=avr4 -mstrict-X
>
> with avr-gcc from current trunk SVN r180399.
>
> The issue is that AVR has only 3 pointer registers X, Y, and Z with the
> following addressing capabilities:
>
> *X, *X++, *--X             (R27:R26, call-clobbered)
> *Y, *Y++, *--Y, *(Y+const) (R29:R28, call-saved, frame pointer)
> *Z, *Z++, *--Z, *(Z+const) (R31:R30, call-clobbered)
>
> Older versions of the compiler, prior to 4.7 trunk r179993, allowed a fake
> addressing mode *(X+const) and emulated it by emitting an appropriate
> instruction sequence like
>
>     X = X + const
>     r = *X
>     X = X - const
>
> which was only a rare corner case in the old register allocator, but in
> the new allocator this sequence is seen very often, leading to code bloat
> of +50% for some real-world functions.
>
> This is the reason why the command line option -mstrict-X has been added
> to the AVR backend, see PR46278.
>
> This option denies fake *(X+const) addressing but leads to the mentioned
> spills from the register allocator and to code even worse than without
> setting -mstrict-X, i.e. the register allocator sabotages a smart usage of
> the address registers.
>
> All I see is that reload1.c:alter_reg() generates the spill because
> ira_conflicts_p is true.
>
> With the option -morder1 turned on (affects ADJUST_REG_ALLOC_ORDER) there
> is still a frame set up even though it is never accessed.
>
> Can anyone give me some advice how to proceed with this issue?
>
> Can it be said whether this is a target issue or an IRA/reload flaw?

It's not a cost-related problem.
I think I can explain the problem, and I think it's an IRA bug.

> Spilling for insn 11.
> Using reg 26 for reload 0
> Spilling for insn 17.
> Using reg 30 for reload 0
> Spilling for insn 23.
> Using reg 30 for reload 0
> Try Assign 60(a6), cost=16000

The wrong thing starts here...
ira-color.c:4120 allocno_reload_assign (a, forbidden_regs);

> changing reg in insn 2
> changing reg in insn 9
> changing reg in insn 13
> changing reg in insn 19
> Assigning 60(freq=4000) a new slot 0
> Register 60 now on stack.

Call trace:
allocno_reload_assign() -> assign_hard_reg() -> get_conflict_profitable_regs()

`get_conflict_profitable_regs' calculates a wrong `profitable_regs[1]'.

(Especially for Vladimir)
AVR is an 8-bit microcontroller.
The AVR has only 3 pointer registers X, Y, and Z with the following
addressing capabilities:

*X, *X++, *--X             (R27:R26, call-clobbered)
*Y, *Y++, *--Y, *(Y+const) (R29:R28, call-saved, frame pointer)
*Z, *Z++, *--Z, *(Z+const) (R31:R30, call-clobbered)

Also, all modes larger than 8 bits should start in an even register.

So, `get_conflict_profitable_regs' tries to calculate two sets:
- profitable_regs[0] for the first word of register 60(a6)
- profitable_regs[1] for the second word of register 60(a6)

Values of `profitable_regs':

(gdb) p print_hard_reg_set (stderr,profitable_regs[0] , 01)
 0-2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
$63 = void
(gdb) p print_hard_reg_set (stderr,profitable_regs[1] , 01)
 0-2 4 6 8 10 12 14 16 18 20 22 24 26 28 30

They are equal!
That's wrong, because the second word of register 60(a6) must be allocated
to an odd register.

This is the wrong place in `get_conflict_profitable_regs':

  ...
  nwords = ALLOCNO_NUM_OBJECTS (a);
  for (i = 0; i < nwords; i++)
    {
      obj = ALLOCNO_OBJECT (a, i);
      COPY_HARD_REG_SET (conflict_regs[i],
                         OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
      if (retry_p)
        {
          COPY_HARD_REG_SET (profitable_regs[i],
                             reg_class_contents[ALLOCNO_CLASS (a)]);
          AND_COMPL_HARD_REG_SET (profitable_regs[i],
                                  ira_prohibited_class_mode_regs
                                  [ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]);
-----------------------------------------------------^^^^^^^^^^^^^^^^^^
        }

ALLOCNO_MODE (a) is the right mode for the first word (word = 8-bit
register), but it's the wrong mode for the second word of the allocno.
Moreover, ALLOCNO_MODE (a) is the right mode only for the whole allocno.
If we want to spill/load/store separate parts (IRA objects) of an allocno,
we must use the mode of each part (object).

`ira_prohibited_class_mode_regs' is derived only from HARD_REGNO_MODE_OK.
So the second word of 60(a6) should be permitted in any register that
directly follows a register permitted for the first word of 60(a6).
For AVR: profitable_regs[1] = profitable_regs[0] << 1

Also, I have a question about the following fields of `ira_allocno':

  /* The number of objects tracked in the following array.  */
  int num_objects;
  /* An array of structures describing conflict information and live
     ranges for each object associated with the allocno.  There may be
     more than one such object in cases where the allocno represents a
     multi-word register.  */
  ira_object_t objects[2];
----------------------^^^

SImode on AVR consists of 4 words, but there are only 2 objects in the
allocno structure.
Is this right?

Denis.