On 10/30/24 9:31 AM, Richard Sandiford wrote:
OK (and yeah, I can sympathise). But I think there's an argument that,
if you're scheduling for one in-order core using the pipeline of an
unrelated core, that's effectively scheduling for the core as though
it were out-of-order. In other words, the property we care about
isn't so much whether the processor itself is in-order (a statement
about the uarch), but whether we trying to schedule for a particular
in-order pipeline (a statement about what GCC is doing or knows about).
I'd argue that in the case you describe, we're not trying to schedule
for a particular in-order pipeline.
I can see that point.
That might need some finessing of the name. But I think the concept
is right. I'd rather base the hook (or param) on a general concept
like that rather than a specific "wide vs narrow" thing.
Agreed. Naming was my real only concern about the first patch.
I still see Vineet's data as compelling, even with GIGO concern.
Do you mean the reduction in dynamic instruction counts? If so,
that isn't what the algorithm is aiming to reduce. Like I mentioned
in the previous thread, trying to minimise dynamic instruction counts
was also harmful for the core & benchmarks I was looking at.
We just ended up with lots of pipeline bubbles that could be
alleviated by judicious spilling.
Vineet showed significant cycle and icount improvements. I'm much more
interested in the former :-)
I'm planning to run it on our internal design, but it's not the top of
the priority list and it's a scarce resource right now... I fully
expect it'll show a cycle improvement there too, though probably much
smaller than the improvement seen on that spacemit k1 design.
I'm not saying that the algorithm gets the decision right for cactu
when tuning for in-order CPU X and running on that same CPU X.
But it seems like that combination hasn't been tried, and that,
even on the combinations that the patch has been tried on, the cactu
justification is based on static properties of the binary rather than
a particular runtime improvement (Y% faster).
To be clear, the two paragraphs above are trying to explain why I think
this should be behind a hook or param rather than unconditional. The
changes themselves look fine, and incorporate the suggestions from the
previous thread (thanks!).
Thanks for that clarifying statement. I actually think we're broadly in
agreement here -- keep it as a hook/param rather than making it
unconditional.
Assuming we keep it as a hook/param, opt-in & come up with better
name/docs, any objections from your side?
jeff