https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #17 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> On “CALL_FREQ grows much quicker than BB_FREQ”: for r104, the
> ALLOCNO_FREQ ought in principle to be fixed for a given loop iteration
> count.  It shouldn't grow or shrink based on the value of SPILLED.
> That's because every execution of the loop body involves exactly one
> reference to r104.  SPILLED specifies the probability that that single
> reference is the “call” use rather than the “non-call” use, but it doesn't
> change the total number of references per iteration.
> 
> So I think the only reason we see the different ALLOCNO_FREQs in:
> 
>    ALLOCNO_FREQ 989, …
> 
> vs:
> 
>    ALLOCNO_FREQ 990, …
> 
> is round-off error.  If the values had more precision, I think we'd
> have a fixed ALLOCNO_FREQ and a varying ALLOCNO_CALL_FREQ.

Yeah, that's plausible.  As far as I can tell the FREQs are always scaled by
REG_FREQ_FROM_EDGE_FREQ into [0, BB_FREQ_MAX], and that indeed does an
integer division.  The general problem is that the IPA frequencies don't
really seem to have any bounded range, so we always need to scale.

So I think you're always going to have this error one way or another, and it
may or may not work to your advantage on any given program.
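To make that concrete, here's a small standalone sketch (not GCC code; the
ranges, step and total are made up, only chosen so the output happens to mimic
the 989-vs-990 figures quoted above) of how scaling the call and non-call
parts of a fixed reference count independently with truncating integer
division makes their sum wobble by one even though the true total is constant:

#include <stdio.h>

/* Stand-ins for the unbounded source range and the bounded target
   range (BB_FREQ_MAX / REG_FREQ_MAX style); purely illustrative.  */
#define SRC_MAX 10000
#define DST_MAX 1000

/* REG_FREQ_FROM_EDGE_FREQ-style scaling: the truncating division
   is where the round-off comes from.  */
static int
scale (int freq)
{
  return freq * DST_MAX / SRC_MAX;
}

int
main (void)
{
  /* A fixed total number of references per iteration, split between
     the "call" use and the "non-call" use in varying proportions
     (the SPILLED probability in the discussion above).  */
  int total = 9905;

  for (int call_part = 0; call_part <= total; call_part += 1237)
    {
      int noncall_part = total - call_part;
      /* The true total never changes, but the scaled sum does.  */
      printf ("split %4d/%4d -> scaled sum %d\n",
              call_part, noncall_part,
              scale (call_part) + scale (noncall_part));
    }
  return 0;
}

Depending on the split, the scaled sum comes out as either 989 or 990, which
is the same kind of wobble as the two ALLOCNO_FREQ values above.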

Maybe we need a way to be a bit more tolerant of this rounding error
instead?

> > Instead I've chosen a middle ground here (same as yours but done in
> > ira_tune_allocno_costs instead), which is to store and load only inside
> > the loop, but to do so only in the BB which contains the call.
> I don't think you were saying otherwise, but just FTR: I wasn't
> proposing a solution, I was just describing a hack.  It seemed
> to me like IRA was making the right decision for r104 in isolation,
> for the given SPILLED value and target costs.  My hack to force
> an allocation for r104 made things worse.

Ah ok, fair enough :)

> 
> > > which is cheaper than both the current approaches.  We don't do that
> > > optimisation yet though, so the current costing seems to reflect what we
> > > currently generate.
> > 
> > In many (if not most) Arches stores are significantly cheaper than the loads
> > though. So the store before the call doesn't end up making that much of a
> > difference, but yes it adds up if you have many of them.
> Yeah.  Could we fix the problem that way instead?  The only reason IRA is
> treating loads and stores as equal cost is because aarch64 asked it to :-)

I tried a quick check and it does fix the testcase but not the benchmark,
which is not entirely unexpected thinking about it, because x86 does correctly
model the store costs.

I can try fixing the costs properly and try reducing again.  It looks like it
still thinks spilling to memory is cheaper than caller-save reloads.
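For reference, a rough standalone sketch of the comparison I think is being
made (a simplification of what IRA / ira_tune_allocno_costs actually computes;
all frequencies and costs below are invented for illustration): keeping the
pseudo in memory pays a load per use plus a store per def, while keeping it in
a call-clobbered register pays a save/restore around each call, so pricing
stores below loads can flip which side wins.

#include <stdio.h>

/* Hedged sketch, not GCC code: compare "spill to memory" against
   "caller-save around each call" for one pseudo, under two cost
   models.  All numbers are made up.  */
static void
compare (int load_cost, int store_cost,
         int def_freq, int use_freq, int call_freq)
{
  /* Memory: a store at each def, a load at each use.  */
  int spill = store_cost * def_freq + load_cost * use_freq;
  /* Register: a store before and a load after each call.  */
  int caller_save = (store_cost + load_cost) * call_freq;

  printf ("load %d store %d: spill %5d vs caller-save %5d -> %s\n",
          load_cost, store_cost, spill, caller_save,
          spill <= caller_save ? "spill" : "caller-save");
}

int
main (void)
{
  int def_freq = 10, use_freq = 1000, call_freq = 700;

  /* Loads and stores priced equally: spilling looks cheaper.  */
  compare (6, 6, def_freq, use_freq, call_freq);

  /* Stores priced well below loads: caller-save reloads win.  */
  compare (6, 2, def_freq, use_freq, call_freq);
  return 0;
}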
