Vineet Gupta <vine...@rivosinc.com> writes: > On 10/30/24 10:25, Jeff Law wrote: >> On 10/30/24 9:31 AM, Richard Sandiford wrote: >>> That might need some finessing of the name. But I think the concept >>> is right. I'd rather base the hook (or param) on a general concept >>> like that rather than a specific "wide vs narrow" thing. >> Agreed. Naming was my real only concern about the first patch. > > We are leaning towards > - TARGET_SCHED_PRESSURE_SPILL_AGGRESSIVE > - targetm.sched.pressure_spill_aggressive > > Targets could wire them up however they like > >>>> I still see Vineet's data as compelling, even with GIGO concern. >>> Do you mean the reduction in dynamic instruction counts? If so, >>> that isn't what the algorithm is aiming to reduce. Like I mentioned >>> in the previous thread, trying to minimise dynamic instruction counts >>> was also harmful for the core & benchmarks I was looking at. >>> We just ended up with lots of pipeline bubbles that could be >>> alleviated by judicious spilling. >> Vineet showed significant cycle and icount improvements. I'm much more >> interested in the former :-) > > The initial premise indeed was icounts but with recent access to some > credible hardware I'm all for perf measurement now. > > Please look at patch 2/4 [1] for actual perf data both cycles and > instructions. > I kept 1/4 introducing hook seperate from 2/4 which implements the hook for > RISC-V. > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665945.html
Ah, sorry, I was indeed going only from the description in 1/4. I've not had time to look at the rest of the series yet. > As Jeff mentioned on a In-order RISC-V core, are we are seeing 6% cycle > improvements from the hook and another 6% cycles improvement from patch 3/4 Sounds good! > Also Wilco gave this a spin on high end OoO Neoverse and seems to be seeing > 20% improvement which I gather is cycles. Yeah, it's common ground that we should change this for OoO cores. >>> I'm not saying that the algorithm gets the decision right for cactu >>> when tuning for in-order CPU X and running on that same CPU X. >>> But it seems like that combination hasn't been tried, and that, >>> even on the combinations that the patch has been tried on, the cactu >>> justification is based on static properties of the binary rather than >>> a particular runtime improvement (Y% faster). > > I'd requested Wilco to possibly try this on some in-order arm cores. OK. FWIW, I think the original testing was on Cortex-A9 or Cortex-A15, It was also heavy on filters, such as yiq. But is this about making the argument in favour of an unconditional change? If so, I don't think it's necessary to front-load this testing. Like I said in my reply to Jeff, that can happen naturally if all major targets move to the new behaviour. And for a hook/param approach, we already have enough data to justify the patch. Thanks, Richard