Vladimir Makarov wrote:
> On 02/15/2012 09:21 AM, Georg-Johann Lay wrote:
>> This is a question on SUBREGs generated by lower-subreg.c and whether
>> the register allocator is supposed to handle them efficiently.
>>
>> Suppose the following small function compiled for AVR.
>> Remember AVR is an 8-bit machine with int = HImode and
>> UNITS_PER_WORD = 1.
>>
>> int add (int val)
>> {
>>     return val + 1;
>> }
>>
>> The addition can be performed in one insn; val and return value are
>> passed in HI:24 as you can see in the .ira dump:
>>
>>
>> (insn 6 3 19 2 (parallel [
>>             (set (reg:HI 45)
>>                 (plus:HI (reg:HI 24 r24 [ val ])
>>                     (const_int 1 [0x1])))
>>             (clobber (scratch:QI))
>>         ]) add.c:3 42 {addhi3_clobber}
>>      (expr_list:REG_DEAD (reg:HI 24 r24 [ val ])
>>         (nil)))
>>
>> (insn 19 6 20 2 (set (reg:QI 24 r24)
>>         (subreg:QI (reg:HI 45) 0)) add.c:4 18 {movqi_insn}
>>      (nil))
>>
>> (insn 20 19 14 2 (set (reg:QI 25 r25 [+1 ])
>>         (subreg:QI (reg:HI 45) 1)) add.c:4 18 {movqi_insn}
>>      (expr_list:REG_DEAD (reg:HI 45)
>>         (nil)))
>>
>> (insn 14 20 0 2 (use (reg/i:HI 24 r24)) add.c:4 -1
>>      (nil))
>>
>> IRA writes:
>>
>>       Pushing a0(r45,l0)(cost 0)
>>       Popping a0(r45,l0) -- assign reg 18
>> Disposition:
>>     0:r45 l0 18
>>
>> i.e. it assigns pseudo HI:45 to hard register HI:18 and thus causes
>> inefficient code because it happily moves values around without need.
>>
>> .reload generates additional move insns to satisfy the constraints of
>> addhi3 which are basically "=r, %0, rn" i.e. addition is a 2-operand
>> insn where op0 and op1 must be in the same hard register:
>>
>> (insn 23 3 6 2 (set (reg:HI 18 r18 [45])
>>         (reg:HI 24 r24 [ val ])) add.c:3 22 {*movhi}
>>      (nil))
>>
>> (insn 6 23 19 2 (parallel [
>>             (set (reg:HI 18 r18 [45])
>>                 (plus:HI (reg:HI 18 r18 [45])
>>                     (const_int 1 [0x1])))
>>             (clobber (scratch:QI))
>>         ]) add.c:3 42 {addhi3_clobber}
>>      (nil))
>>
>> (insn 19 6 20 2 (set (reg:QI 24 r24)
>>         (reg:QI 18 r18 [45])) add.c:4 18 {movqi_insn}
>>      (nil))
>>
>> (insn 20 19 14 2 (set (reg:QI 25 r25 [+1 ])
>>         (reg:QI 19 r19 [+1 ])) add.c:4 18 {movqi_insn}
>>      (nil))
>>
>>
>> However, the machine could just as well do the addition in HI:24
>> directly like so:
>>
>>
>> (parallel [(set (reg:HI 24 r24)
>>                 (plus:HI (reg:HI 24)
>>                          (const_int 1)))
>>            (clobber (scratch:QI))]) {addhi3_clobber}
>>
>>
>> Question: Is IRA supposed to detect SUBREGs like above and avoid code
>> bloat?
>> Sequences like
>>
>>
>> (insn 19 6 20 2 (set (reg:QI 24 r24)
>>         (subreg:QI (reg:HI 45) 0)) add.c:4 18 {movqi_insn}
>>      (nil))
>>
>> (insn 20 19 14 2 (set (reg:QI 25 r25 [+1 ])
>>         (subreg:QI (reg:HI 45) 1)) add.c:4 18 {movqi_insn}
>>      (expr_list:REG_DEAD (reg:HI 45)
>>         (nil)))
>>
>> obviously generate some early-clobber situation for IRA that prevents
>> HI:45 from being allocated to HI:24.
>>
>> Is IRA a school book implementation that does not know anything about
>> SUBREGs?
>
> No, it is not a school book implementation.
>
>> Or should IRA be smart enough to detect and allocate SUBREGs
>> efficiently by some "subreg fusion" mechanism?
>
> No, it is not smart enough.
>
> IRA deals well with subregs of multi-register pseudos but not with
> subregs of one-register pseudos.
>
> By the way, the old register allocator did not deal with subregs at all.
>> The code above is just a small example to show the problem, but the
>> issue also occurs with more complex code and not only for return and
>> parameter registers.
>
> Thanks for reporting this.
> I might work on this. But I don't know when I can start.
> This platform is not on my high priority list.

Thanks for improving the situation. It's already good news to hear it is
feasible and not a "no-go because too much rework in register allocator".

Filed it as PR52278 for reference:

http://gcc.gnu.org/PR52278

Johann
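
As a footnote to the dumps quoted above, here is a minimal C sketch of the two allocations being compared. It is a toy model only, not GCC code: the names regfile, get_hi, set_hi, add_via_r18 and add_in_place are made up for illustration, and it assumes the AVR layout in which HI:n occupies r[n] (low byte, subreg byte 0) and r[n+1] (high byte, subreg byte 1). It counts RTL insns as they appear in the .reload dump, not machine instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the AVR register file: 32 byte-wide registers r0..r31.
   Hypothetical type, for illustration only. */
typedef struct {
    uint8_t r[32];
    int insns;          /* number of RTL insns "executed" so far */
} regfile;

/* Read the HImode value held in HI:n, i.e. r[n] (low) and r[n+1] (high). */
static uint16_t get_hi(const regfile *rf, int n)
{
    assert(n >= 0 && n < 31);
    return (uint16_t)(rf->r[n] | (rf->r[n + 1] << 8));
}

/* Write an HImode value into HI:n. */
static void set_hi(regfile *rf, int n, uint16_t v)
{
    assert(n >= 0 && n < 31);
    rf->r[n] = (uint8_t)(v & 0xff);
    rf->r[n + 1] = (uint8_t)(v >> 8);
}

/* Shape of the post-reload code when pseudo 45 lives in HI:18. */
static void add_via_r18(regfile *rf)
{
    /* insn 23: *movhi, copy val from HI:24 to HI:18 */
    set_hi(rf, 18, get_hi(rf, 24));
    rf->insns++;
    /* insn 6: addhi3_clobber, 2-operand add in HI:18 */
    set_hi(rf, 18, (uint16_t)(get_hi(rf, 18) + 1));
    rf->insns++;
    /* insn 19: movqi_insn, subreg byte 0 back to r24 */
    rf->r[24] = rf->r[18];
    rf->insns++;
    /* insn 20: movqi_insn, subreg byte 1 back to r25 */
    rf->r[25] = rf->r[19];
    rf->insns++;
}

/* Shape of the code if pseudo 45 were allocated to HI:24 instead. */
static void add_in_place(regfile *rf)
{
    /* single addhi3_clobber operating directly on HI:24 */
    set_hi(rf, 24, (uint16_t)(get_hi(rf, 24) + 1));
    rf->insns++;
}
```

Loading the same value into HI:24 of two regfile instances and running one function on each leaves the same result in HI:24, with the insns counter at 4 for add_via_r18 and 1 for add_in_place. That difference is the three extra moves from the .reload dump.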