Re: [PATCH v2] Target-independent store forwarding avoidance.

Jeff Law Tue, 11 Jun 2024 06:38:08 -0700



On 6/11/24 1:22 AM, Richard Biener wrote:

Absolutely.   But forwarding from a smaller store to a wider load is painful
from a hardware standpoint and if we can avoid it from a codegen standpoint,
we should.
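
As a rough illustration of the access pattern being discussed (a sketch only; the function names are hypothetical and not taken from the patch), the first function below performs four byte stores followed by a 32-bit load of the same bytes, and the second builds the same value in registers without going through memory:

```c
#include <stdint.h>
#include <string.h>

/* Four narrow (byte) stores followed by one wider (32-bit) load from the
   same address.  On many cores the load cannot be forwarded from several
   smaller pending stores.  An optimizing compiler may fold this away
   entirely; the point is only the shape of the memory accesses.  */
uint32_t load_after_narrow_stores(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    unsigned char buf[4];
    uint32_t v;

    buf[0] = b0;              /* narrow store */
    buf[1] = b1;              /* narrow store */
    buf[2] = b2;              /* narrow store */
    buf[3] = b3;              /* narrow store */
    memcpy(&v, buf, sizeof v); /* wide load covering all four stores */
    return v;
}

/* The same value assembled with shifts and ORs, so no load has to wait on
   the stores.  This matches the memory version only on a little-endian
   target.  */
uint32_t combine_in_registers(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0
           | ((uint32_t)b1 << 8)
           | ((uint32_t)b2 << 16)
           | ((uint32_t)b3 << 24);
}
```
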


Note there's also the possibility to increase the distance between the
store and the load - in fact the time a store takes to a) retire and
b) get from the store buffers to where the load-store unit would pick it
up (L1-D) is another target specific tuning knob.  That said, if that
distance isn't too large (on x86 there might be only an upper bound
given by the OOO window size and the L1D store latency(?), possibly
also additionally by the store buffer size) attacking the issue in
sched1 or sched2 might be another possibility.  So I think pass placement
is another thing to look at - I'd definitely place it after sched1
but I guess without looking at the pass again it's way before that?

True, but I doubt there are enough instructions we could sink the load
past to make a measurable difference.  This is especially true on the
class of uarchs where this is going to be most important.

In the case where the store/load can't be interchanged and thus this new
pass rejects any transformation, we could try to do something in the
scheduler to defer the load as long as possible.  Essentially it's a
true dependency through a memory location using must-aliasing properties
and in that case we'd want to crank up the "latency" of the store so
that the load gets pushed away.
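
To make the "crank up the latency" idea concrete, here is a toy model only. It does not use GCC's scheduler data structures or hooks; the struct, the function names and the penalty value are all hypothetical. It shows how a larger latency on a must-alias store-to-load dependency moves the earliest cycle at which the load can issue:

```c
#include <stddef.h>

/* Hypothetical toy dependency record, not a GCC type.  */
struct toy_dep {
    int producer_cycle;  /* cycle in which the producing insn issues */
    int latency;         /* cycles before the consumer may issue */
};

/* Earliest cycle at which a consumer may issue: the maximum of
   producer_cycle + latency over all of its dependencies.  */
int earliest_issue_cycle(const struct toy_dep *deps, size_t n)
{
    int earliest = 0;
    for (size_t i = 0; i < n; i++) {
        int ready = deps[i].producer_cycle + deps[i].latency;
        if (ready > earliest)
            earliest = ready;
    }
    return earliest;
}

/* Latency to charge on a store->load true dependency through memory.
   When the pair is known to must-alias, add an assumed, target-specific
   penalty so the load is pushed away from the store.  */
int store_to_load_latency(int base_latency, int must_alias, int penalty)
{
    return must_alias ? base_latency + penalty : base_latency;
}
```

With a store issuing in cycle 2, a base latency of 0 and an assumed penalty of 6, the dependent load becomes ready in cycle 8 instead of cycle 2, which gives the scheduler a reason to place other work in between.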

I think one of the difficulties here is we often model stores as not
having any latency (which is probably OK in most cases).  Input data
dependencies and structural hazards dominate considerations for
stores.


jeff
