Vineet Gupta <vine...@rivosinc.com> writes:
> On 10/30/24 10:25, Jeff Law wrote:
>> On 10/30/24 9:31 AM, Richard Sandiford wrote:
>>> That might need some finessing of the name.  But I think the concept
>>> is right.  I'd rather base the hook (or param) on a general concept
>>> like that rather than a specific "wide vs narrow" thing.
>> Agreed.  Naming was my real only concern about the first patch.
>
> We are leaning towards
>   - TARGET_SCHED_PRESSURE_SPILL_AGGRESSIVE
>   - targetm.sched.pressure_spill_aggressive
>
> Targets could wire them up however they like
>
>>>> I still see Vineet's data as compelling, even with GIGO concern.
>>> Do you mean the reduction in dynamic instruction counts?  If so,
>>> that isn't what the algorithm is aiming to reduce.  Like I mentioned
>>> in the previous thread, trying to minimise dynamic instruction counts
>>> was also harmful for the core & benchmarks I was looking at.
>>> We just ended up with lots of pipeline bubbles that could be
>>> alleviated by judicious spilling.
>> Vineet showed significant cycle and icount improvements.  I'm much more 
>> interested in the former :-)
>
> The initial premise indeed was icounts but with recent access to some 
> credible hardware I'm all for perf measurement now.
>
> Please look at patch 2/4 [1] for actual perf data both cycles and 
> instructions.
> I kept 1/4 introducing hook seperate from 2/4 which implements the hook for 
> RISC-V.
>
>     [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665945.html

Ah, sorry, I was indeed going only from the description in 1/4.
I've not had time to look at the rest of the series yet.

> As Jeff mentioned on a In-order RISC-V core, are we are seeing 6% cycle 
> improvements from the hook and another 6% cycles improvement from patch 3/4

Sounds good!

> Also Wilco gave this a spin on high end OoO Neoverse and seems to be seeing 
> 20% improvement which I gather is cycles.

Yeah, it's common ground that we should change this for OoO cores.

>>> I'm not saying that the algorithm gets the decision right for cactu
>>> when tuning for in-order CPU X and running on that same CPU X.
>>> But it seems like that combination hasn't been tried, and that,
>>> even on the combinations that the patch has been tried on, the cactu
>>> justification is based on static properties of the binary rather than
>>> a particular runtime improvement (Y% faster).
>
> I'd requested Wilco to possibly try this on some in-order arm cores.

OK.  FWIW, I think the original testing was on Cortex-A9 or Cortex-A15,
It was also heavy on filters, such as yiq.

But is this about making the argument in favour of an unconditional change?
If so, I don't think it's necessary to front-load this testing.  Like I said
in my reply to Jeff, that can happen naturally if all major targets move
to the new behaviour.  And for a hook/param approach, we already have
enough data to justify the patch.

Thanks,
Richard

Reply via email to