Ian Bolton wrote: > Ian Bolton wrote: > > Vladimir Makarov wrote: > > > Ian Bolton wrote: > > > > Yesterday, I wrote: > > > > > > > >> BTW, today I have achieved some good results with existing IRA > by > > > doing > > > >> the following: > > > >> > > > >> 1) Changed the REG_ALLOC_ORDER so that TOP_CREGS are given out > > > before > > > >> BOTTOM_REGS. My previous hacked version worked by increasing > the > > > >> priority of those that wanted BOTTOM_REGS, so they got first > pick; > > > this > > > >> new version makes them wait their turn, but ensures those with > > > higher > > > >> priority take TOP_CREGS before BOTTOM_REGS. > > > >> > > > >> The analogy, I think, is of giving out meals on an airplane. > Most > > > >> people will eat meat or vegetarian meals, whereas vegetarians > only > > > >> want vegetarian meals. My hacked version was effectively > allowing > > > >> the vegetarians to push to the front of the queue and get their > > > meals; > > > >> this new version works by encouraging the meat eaters to eat the > > > meaty > > > >> meals first, leaving more suitable meals for the vegetarians > > further > > > >> down the plane. > > > >> > > > >> 2) I have forced propagation of allocnos to parent regions with > > the > > > >> following hack in find_allocno_class_costs(): > > > >> > > > >> { > > > >> /* Propagate costs to upper levels in the region > > > >> tree. */ > > > >> parent_a_num = ALLOCNO_NUM (parent_a); > > > >> for (k = 0; k < cost_classes_num; k++) > > > >> COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k] > > > >> += COSTS_OF_ALLOCNO (total_costs, a_num)->cost[k]; > > > >> COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost > > > >> += COSTS_OF_ALLOCNO (total_costs, a_num)->mem_cost; > > > >> /* BEGIN IGB-IRA CHANGE 2 */ > > > >> /* Force total_costs to propagate upwards, by setting > > > >> allocno_costs to be total_costs */ > > > >> for (k = 0; k < cost_classes_num; k++) > > > >> COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->cost[k] > > > >> = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k]; > > > >> COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->mem_cost > > > >> = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost; > > > >> /* END IGB-IRA CHANGE 2 */ > > > >> } > > > >> > > > >> I don't know why propagation isn't happening normally, but > > > >> this total_costs hack achieves the same thing for me at the > > > >> moment. Is there any information I could provide to help > > > >> someone tell me why propagation isn't happening? > > > >> > > > > > > > > Good news! I have been able to remove my "total_costs" hack > > > > above by instead commenting out the following line from > ira_build() > > > > in ira-build.c: > > > > > > > > remove_unnecessary_regions (false); > > > > > > > > For my situation at least, this function is preventing the > > > > propagation of useful allocno information from region 1 to > > > > region 0. Without my change, the allocation for a pseudo in > > > > region 0 is not aware of how that pseudo will be used inside > > > > a loop in region 1; the real impact of this is that we need > > > > to then do a register move *inside the loop* into a valid > > > > register class for the instruction in region 1. > > > > > > > > I do not believe this will impact anyone with a regular > > > > register set, but for any architecture where some instructions > > > > can only use a subset of the register bank, I believe that > > > > all regions are always necessary, since cost information > > > > for each allocno is relevant and important. > > > > > > > > I still need to do some more testing in regards this "fix", > > > > but I wanted to put my findings out there as soon as possible > > > > for comment from the experts. > > > > > > > The function (remove_unnecessary_regions) is used to increase RA > > speed > > > by decreasing number of regions. The region is removed if the > > register > > > pressure is not high in the region in other words if the > probability > > to > > > spill pseudos on the region border to improve RA in the region is > > > small. > > > > > > But when the region is removed and correspondingly allocnos (see > > > function remove_unecessary_allocnos) in the region the info > including > > > hard register costs is propagated to the corresponding allocnos in > > the > > > parent region (see function propagate_some_info_from_allocno). > > > > > > I've just checked the code it looks ok to me. Ian, it would be > nice > > if > > > you figure out why the propagation does not happen. As Jeff > wrote, > > > fixing that would be important practically for any target. > > > > (I'm in the middle of a hefty compile at the moment, so I can't do > any > > more debugging yet, but I figured I'd think out loud on this list in > > the > > mean time.) > > > > At first, I thought the problem was that > > propagate_some_info_from_allocno() > > appears to only pass up information about the cover class, as opposed > > to every cost_class (which was what my total_costs hack did). > However, > > I see > > that the propagate_allocno_info() function, which is now being called > > when > > I comment out remove_unnecessary_regions(), also only passes up > > information > > about the cover class, so I don't think it's propagation that's the > > issue. > > > > Looking at ira_build(), I see that there is a call to create_caps() > > just > > after the call to propagate_allocno_info() - both of which were > guarded > > by > > the condition more_one_region_p(), which is only true when I comment > > out > > the call to remove_unnecessary_regions(). Looking at create_caps(), > I > > see > > that a Cap will be created for each of my region 1 allocnos. Looking > > further > > down in the ira_build() function, there is a call to > > ira_build_conflicts(), > > which makes a lot of references to Caps. Perhaps it's the existence > of > > Caps > > that matters? > > > > I see in my benchmark that there are some unfortunate allocnos that > > require > > BOTTOM_REGS, but they only live in Region 1. Am I right in thinking > > that, > > without the caps, there is no way for them to get their voice heard > > during > > the allocation for Region 0? > > > > Maybe, when an irregular register set exists, whether there is high > > pressure > > in the region or not, the Caps should be made to ensure that you > don't > > end > > up with move instructions happening within the loop? > > I've just changed my code to turn back on the > remove_unnecessary_regions > function and ... I fear I have taken us all on a wild goose chase! > > It seems that my alternate REG_ALLOC_ORDER, where TOP_CREGS are given > out > before BOTTOM_REGS, is enough on its own to get the allocation right!
I have discovered a regression in the Huffman benchmark. The new REG_ALLOC_ORDER gives out the upper regs first, as intended. But then one of these upper regs becomes dead and could be re-used for more work, except that it can't be re-used because subsequent instructions want a BOTTOM_REG. As we have to spill and restore for each callee-save register, this costs two instructions; previously, the old REG_ALLOC_ORDER gave a BOTTOM_REG out, which then got re-used for more work when it became dead. I expect this kind of thing will happen anywhere that register pressure is low and lead to regressions all over the place. The question is: how to fix this? I have initially put the REG_ALLOC_ORDER back to how it was and changed the operand constraints in our MD file, so each of the "apathetic" instructions, will either take a 't' (TOP_CREG) or '?b' (BOTTOM_REG). The '?' shows that this alternative is slightly more costly than using 't'. On the benchmark that benefitted the most from the new REG_ALLOC_ORDER, these constraints are almost achieving the same thing. It is only "almost" there because I am struggling with how to show two alternatives for loads and stores, which currently have an 'm' constraint. I want to show that it is slightly more costly to have the address in a 'b', as opposed to a 't', but I can't see how to show this in the .md file. Maybe it isn't possible? Even if I make an alternate memory constraint (e.g. '?k'), I'm not sure how to show that this one only works with BOTTOM_REGS as its base register. Another approach for fixing is to keep my new REG_ALLOC_ORDER and investigate why IRA does not realise it is more costly to adhere to the REG_ALLOC_ORDER (due to having to save/restore one extra register) versus using a BOTTOM_REG instead that could be re-used once the initial use becomes dead. The hard register costs when assigning for the first use don't appear to distinguish between any of the callee-save registers; is it possible for IRA to realise it will be able to re-use a BOTTOM_REG (and eliminate a save/restore) and thereby increase the cost for using an upper reg? The final approach I have thought of is for IRA to dynamically select the REG_ALLOC_ORDER based on how many callee-save registers it thinks it will need. For example, if it thinks it doesn't need all of them, it can use the original REG_ALLOC_ORDER, so we can limit the number of save/restore instructions; if it is going to use all of them, then we need to make sure we hold back the BOTTOM_REGS for those that need them. Perhaps I could force an extra pass so that IRA tries both of the REG_ALLOC_ORDERs and keeps the one that did the best? As always, your thoughts and advice would be much appreciated!