Ian Bolton wrote:
> Ian Bolton wrote:
> > Vladimir Makarov wrote:
> > > Ian Bolton wrote:
> > > > Yesterday, I wrote:
> > > >
> > > >> BTW, today I have achieved some good results with existing IRA
> by
> > > doing
> > > >> the following:
> > > >>
> > > >> 1) Changed the REG_ALLOC_ORDER so that TOP_CREGS are given out
> > > before
> > > >> BOTTOM_REGS.  My previous hacked version worked by increasing
> the
> > > >> priority of those that wanted BOTTOM_REGS, so they got first
> pick;
> > > this
> > > >> new version makes them wait their turn, but ensures those with
> > > higher
> > > >> priority take TOP_CREGS before BOTTOM_REGS.
> > > >>
> > > >> The analogy, I think, is of giving out meals on an airplane.
> Most
> > > >> people will eat meat or vegetarian meals, whereas vegetarians
> only
> > > >> want vegetarian meals.  My hacked version was effectively
> allowing
> > > >> the vegetarians to push to the front of the queue and get their
> > > meals;
> > > >> this new version works by encouraging the meat eaters to eat the
> > > meaty
> > > >> meals first, leaving more suitable meals for the vegetarians
> > further
> > > >> down the plane.
> > > >>
> > > >> 2) I have forced propagation of allocnos to parent regions with
> > the
> > > >> following hack in find_allocno_class_costs():
> > > >>
> > > >> {
> > > >>   /* Propagate costs to upper levels in the region
> > > >>      tree.  */
> > > >>   parent_a_num = ALLOCNO_NUM (parent_a);
> > > >>   for (k = 0; k < cost_classes_num; k++)
> > > >>     COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k]
> > > >>       += COSTS_OF_ALLOCNO (total_costs, a_num)->cost[k];
> > > >>   COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost
> > > >>     += COSTS_OF_ALLOCNO (total_costs, a_num)->mem_cost;
> > > >>   /* BEGIN IGB-IRA CHANGE 2 */
> > > >>   /* Force total_costs to propagate upwards, by setting
> > > >>      allocno_costs to be total_costs */
> > > >>   for (k = 0; k < cost_classes_num; k++)
> > > >>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->cost[k]
> > > >>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k];
> > > >>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->mem_cost
> > > >>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost;
> > > >>   /* END IGB-IRA CHANGE 2 */
> > > >> }
> > > >>
> > > >> I don't know why propagation isn't happening normally, but
> > > >> this total_costs hack achieves the same thing for me at the
> > > >> moment.  Is there any information I could provide to help
> > > >> someone tell me why propagation isn't happening?
> > > >>
> > > >
> > > > Good news!  I have been able to remove my "total_costs" hack
> > > > above by instead commenting out the following line from
> ira_build()
> > > > in ira-build.c:
> > > >
> > > >   remove_unnecessary_regions (false);
> > > >
> > > > For my situation at least, this function is preventing the
> > > > propagation of useful allocno information from region 1 to
> > > > region 0.  Without my change, the allocation for a pseudo in
> > > > region 0 is not aware of how that pseudo will be used inside
> > > > a loop in region 1; the real impact of this is that we need
> > > > to then do a register move *inside the loop* into a valid
> > > > register class for the instruction in region 1.
> > > >
> > > > I do not believe this will impact anyone with a regular
> > > > register set, but for any architecture where some instructions
> > > > can only use a subset of the register bank, I believe that
> > > > all regions are always necessary, since cost information
> > > > for each allocno is relevant and important.
> > > >
> > > > I still need to do some more testing in regards this "fix",
> > > > but I wanted to put my findings out there as soon as possible
> > > > for comment from the experts.
> > > >
> > > The function (remove_unnecessary_regions) is used to increase RA
> > speed
> > > by decreasing number of regions.  The region is removed if the
> > register
> > > pressure is not high in the region in other words if the
> probability
> > to
> > > spill pseudos on the region border to improve RA in the region is
> > > small.
> > >
> > > But when the region is removed and correspondingly allocnos (see
> > > function remove_unecessary_allocnos) in the region the info
> including
> > > hard register costs is propagated to the  corresponding allocnos in
> > the
> > > parent region (see function propagate_some_info_from_allocno).
> > >
> > > I've just checked the code it looks ok to me.  Ian, it would be
> nice
> > if
> > > you figure out why the propagation does not happen.   As Jeff
> wrote,
> > > fixing that would be important practically for any target.
> >
> > (I'm in the middle of a hefty compile at the moment, so I can't do
> any
> > more debugging yet, but I figured I'd think out loud on this list in
> > the
> > mean time.)
> >
> > At first, I thought the problem was that
> > propagate_some_info_from_allocno()
> > appears to only pass up information about the cover class, as opposed
> > to every cost_class (which was what my total_costs hack did).
> However,
> > I see
> > that the propagate_allocno_info() function, which is now being called
> > when
> > I comment out remove_unnecessary_regions(), also only passes up
> > information
> > about the cover class, so I don't think it's propagation that's the
> > issue.
> >
> > Looking at ira_build(), I see that there is a call to create_caps()
> > just
> > after the call to propagate_allocno_info() - both of which were
> guarded
> > by
> > the condition more_one_region_p(), which is only true when I comment
> > out
> > the call to remove_unnecessary_regions().  Looking at create_caps(),
> I
> > see
> > that a Cap will be created for each of my region 1 allocnos.  Looking
> > further
> > down in the ira_build() function, there is a call to
> > ira_build_conflicts(),
> > which makes a lot of references to Caps.  Perhaps it's the existence
> of
> > Caps
> > that matters?
> >
> > I see in my benchmark that there are some unfortunate allocnos that
> > require
> > BOTTOM_REGS, but they only live in Region 1.  Am I right in thinking
> > that,
> > without the caps, there is no way for them to get their voice heard
> > during
> > the allocation for Region 0?
> >
> > Maybe, when an irregular register set exists, whether there is high
> > pressure
> > in the region or not, the Caps should be made to ensure that you
> don't
> > end
> > up with move instructions happening within the loop?
> 
> I've just changed my code to turn back on the
> remove_unnecessary_regions
> function and ... I fear I have taken us all on a wild goose chase!
> 
> It seems that my alternate REG_ALLOC_ORDER, where TOP_CREGS are given
> out
> before BOTTOM_REGS, is enough on its own to get the allocation right!

I have discovered a regression in the Huffman benchmark.  The new
REG_ALLOC_ORDER gives out the upper regs first, as intended.  But then
one of these upper regs becomes dead and could be re-used for more work,
except that it can't be re-used because subsequent instructions want
a BOTTOM_REG.  As we have to spill and restore for each callee-save
register, this costs two instructions; previously, the old REG_ALLOC_ORDER
gave a BOTTOM_REG out, which then got re-used for more work when it became
dead.  I expect this kind of thing will happen anywhere that register
pressure is low and lead to regressions all over the place.

The question is: how to fix this?  I have initially put the REG_ALLOC_ORDER
back to how it was and changed the operand constraints in our MD file,
so each of the "apathetic" instructions, will either take a 't' (TOP_CREG)
or '?b' (BOTTOM_REG).  The '?' shows that this alternative is slightly more
costly than using 't'.  On the benchmark that benefitted the most from
the new REG_ALLOC_ORDER, these constraints are almost achieving the same
thing.  It is only "almost" there because I am struggling with how to show
two alternatives for loads and stores, which currently have an 'm'
constraint.

I want to show that it is slightly more costly to have the address in a 'b',
as opposed to a 't', but I can't see how to show this in the .md file.  Maybe
it isn't possible?  Even if I make an alternate memory constraint (e.g. '?k'),
I'm not sure how to show that this one only works with BOTTOM_REGS as its
base register.

Another approach for fixing is to keep my new REG_ALLOC_ORDER and
investigate why IRA does not realise it is more costly to adhere to the
REG_ALLOC_ORDER (due to having to save/restore one extra register) versus
using a BOTTOM_REG instead that could be re-used once the initial use becomes
dead.  The hard register costs when assigning for the first use don't appear
to distinguish between any of the callee-save registers; is it possible for
IRA to realise it will be able to re-use a BOTTOM_REG (and eliminate a
save/restore) and thereby increase the cost for using an upper reg?

The final approach I have thought of is for IRA to dynamically select
the REG_ALLOC_ORDER based on how many callee-save registers it thinks it
will need.  For example, if it thinks it doesn't need all of them, it can
use the original REG_ALLOC_ORDER, so we can limit the number of save/restore
instructions; if it is going to use all of them, then we need to make sure
we hold back the BOTTOM_REGS for those that need them.  Perhaps I could
force an extra pass so that IRA tries both of the REG_ALLOC_ORDERs and keeps
the one that did the best?

As always, your thoughts and advice would be much appreciated!

Reply via email to