> -----Original Message-----
> From: Vladimir Makarov [mailto:vmaka...@redhat.com]
> Sent: 08 September 2013 17:51
> To: Matthew Fortune
> Cc: gcc@gcc.gnu.org; ber...@codesourcery.com
> Subject: Re: mips16 LRA vs reload - Excess reload registers
>
> On 13-08-23 5:26 AM, Matthew Fortune wrote:
> > Hi Vladimir,
> >
> > I've been working on code size improvements for mips16 and have been
> > pleased to see some improvement when switching to use LRA instead of
> > classic reload. At the same time though I have also seen some
> > differences between reload and LRA in terms of how efficiently reload
> > registers are reused.
> >
> > The trigger for LRA to underperform compared with classic reload is
> > when IRA allocates inappropriate registers and thus puts a lot of
> > stress on reloading. Mips16 showed this because it can only access a
> > small subset of the MIPS registers for general instructions. The
> > remaining MIPS registers are still available as they can be accessed
> > by some special instructions and used via move instructions as
> > temporaries. In the current mips16 backend, register move costings
> > lead IRA to determine that although the preferred class for most
> > pseudos is M16_REGS, the allocno class ends up as GR_REGS. IRA then
> > resorts to allocating registers outside of M16_REGS more and more as
> > register pressure increases, even though this is fairly stupid.
> >
> > When using classic reload the inappropriate register allocations are
> > effectively reverted as the reload pseudos that get invented tend to
> > all converge on the same hard register completely removing the
> > original pseudo. For LRA the reloads tend to diverge and different
> > hard registers are assigned to the reload pseudos leaving us with two
> > new pseudos and the original. Two extra move instructions and two
> > extra hard registers used. While I'm not saying it is LRA's fault for
> > not fixing this situation perfectly it does seem that classic reload
> > is better at it.
> >
> > I have found a potential solution to the original IRA register
> > allocation problem but I think there may still be something to
> > address in LRA to improve this scenario anyway. My proposed solution
> > to the IRA problem for mips16 is to adjust register move costings
> > such that the total of moving between M16_REGS and GR_REGS and back
> > is more expensive than memory, but moving from GR_REGS to GR_REGS is
> > cheaper than memory (even though this is a bit weird as you have to
> > go through an M16_REG to move from one GR_REG to another GR_REG).
> >
> > GR_REGS to GR_REGS has to be cheaper than memory as it needs to be a
> > candidate pressure class but the additional cost for M16->GR->M16
> > means that IRA does not use GR_REGS as an alternative class and the
> > allocno class is just M16_REGS as desired. This feels a bit like a
> > hack but may be the best solution. The hard register costings used
> > when allocating registers from an allocno class just don't seem to be
> > strong enough to prevent poor register allocation in this case, I
> > don't know if the hard register costs are supposed to resolve this
> > issue or if they are just about fine tuning.
> >
> > With the fix in place, LRA outperforms classic reload which is
> > fantastic!
> >
> > I have a small(ish) test case for this and dumps for IRA, LRA and
> > classic reload along with the patch to enable LRA for mips16. I can
> > also provide the fix to register costing that effectively
> > avoids/hides this problem for mips16. Should I post them here or put
> > them in a bugzilla ticket?
> >
> > Any advice on which area needs fixing would be welcome and I am quite
> > happy to work on this given some direction. I suspect these issues
> > are relevant for any architecture that is not 100% orthogonal which
> > is pretty much all and particularly important for compressed
> > instruction sets.
> >
> Sorry again that I did not find time to answer you earlier, Matt.
>
> Your hack could work.
> And I guess it is always worth posting the patch publicly, with
> examples of the generated code before and after the patch. Maybe the
> collective mind will help figure out what to do with the patch.

I'll post that shortly.

> But I guess there is still a thing to do. After constraining
> allocation only to MIPS16 regs we still could use non-MIPS16 GR_REGS
> for storing values of less frequently used pseudos (as storing them in
> non-MIPS16 GR_REGS is better than in memory). E.g. x86-64 LRA can use
> SSE regs for storing values of less frequently used pseudos requiring
> GENERAL_REGS.
> Please look at spill_class target hook and its implementation for
> x86-64.

I have indeed implemented that for mips16. Not only does it help to
enable the non-mips16 registers as spill_class registers, but including
the mips16 call-clobbered registers is also worthwhile. It seems that the
spill_class logic is able to find some instances where spilled pseudos
could actually have been colored, which effectively eliminates the
reload.

My original post was trying to point out an instance where LRA is not
performing as well as reload. Although I can avoid this for mips16, it
may well occur in other circumstances and simply be less noticeable. Is
this something worth pursuing?

Regards,
Matthew
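
For anyone following the spill_class discussion above, here is a rough
standalone sketch of the decision such a hook makes. It is a toy model
only, not the actual mips16 patch and not GCC backend code: the enum, the
function name toy_spill_class, the TOY_SPILL_REGS class and the two
boolean parameters are all made up for illustration. The only thing taken
from the real hook is the idea that it returns a register class to spill
into, or "no registers" when only memory should be used.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for register classes.  These are hypothetical and do
   not match the real definitions in the MIPS backend.  */
enum toy_reg_class
{
  TOY_NO_REGS,     /* Spill to memory only.  */
  TOY_M16_REGS,    /* Registers usable by general mips16 instructions.  */
  TOY_SPILL_REGS   /* Assumed class: non-mips16 GPRs plus the mips16
                      call-clobbered registers.  */
};

/* Toy model of a spill_class-style decision.  For a pseudo that wants
   RCLASS, return the class of hard registers it may be spilled into
   instead of a stack slot, or TOY_NO_REGS to fall back to memory.
   MIPS16_P and WORD_MODE_P are assumed inputs standing in for the
   target-mode and machine-mode checks a real hook would perform.  */
enum toy_reg_class
toy_spill_class (enum toy_reg_class rclass, bool mips16_p, bool word_mode_p)
{
  /* Reject values outside the toy enum.  */
  assert (rclass == TOY_NO_REGS || rclass == TOY_M16_REGS
          || rclass == TOY_SPILL_REGS);

  /* Only word-sized pseudos that need a mips16 register are worth
     parking in the otherwise hard-to-use registers.  */
  if (mips16_p && word_mode_p && rclass == TOY_M16_REGS)
    return TOY_SPILL_REGS;

  return TOY_NO_REGS;
}
```

In this sketch a pseudo that needs M16_REGS in mips16 mode is offered the
wider spill class, and everything else goes to memory as before. Which
registers actually belong in the spill class, and which modes qualify, is
exactly the part that needs measuring on real code.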