Jeff Law <jeffreya...@gmail.com> writes: > On 10/30/24 9:31 AM, Richard Sandiford wrote: > >> >> OK (and yeah, I can sympathise). But I think there's an argument that, >> if you're scheduling for one in-order core using the pipeline of an >> unrelated core, that's effectively scheduling for the core as though >> it were out-of-order. In other words, the property we care about >> isn't so much whether the processor itself is in-order (a statement >> about the uarch), but whether we trying to schedule for a particular >> in-order pipeline (a statement about what GCC is doing or knows about). >> I'd argue that in the case you describe, we're not trying to schedule >> for a particular in-order pipeline. > I can see that point. > >> >> That might need some finessing of the name. But I think the concept >> is right. I'd rather base the hook (or param) on a general concept >> like that rather than a specific "wide vs narrow" thing. > Agreed. Naming was my real only concern about the first patch. > >> >>> I still see Vineet's data as compelling, even with GIGO concern. >> >> Do you mean the reduction in dynamic instruction counts? If so, >> that isn't what the algorithm is aiming to reduce. Like I mentioned >> in the previous thread, trying to minimise dynamic instruction counts >> was also harmful for the core & benchmarks I was looking at. >> We just ended up with lots of pipeline bubbles that could be >> alleviated by judicious spilling. > Vineet showed significant cycle and icount improvements. I'm much more > interested in the former :-) > > I'm planning to run it on our internal design, but it's not the top of > the priority list and it's a scarce resource right now... I fully > expect it'll show a cycle improvement there too, though probably much > smaller than the improvement seen on that spacemit k1 design. > >> >> I'm not saying that the algorithm gets the decision right for cactu >> when tuning for in-order CPU X and running on that same CPU X. >> But it seems like that combination hasn't been tried, and that, >> even on the combinations that the patch has been tried on, the cactu >> justification is based on static properties of the binary rather than >> a particular runtime improvement (Y% faster). >> >> To be clear, the two paragraphs above are trying to explain why I think >> this should be behind a hook or param rather than unconditional. The >> changes themselves look fine, and incorporate the suggestions from the >> previous thread (thanks!). > Thanks for that clarifying statement. I actually think we're broadly in > agreement here -- keep it as a hook/param rather than making it > unconditional.
Yeah, agreed. > Assuming we keep it as a hook/param, opt-in & come up with better > name/docs, any objections from your side? No, seems fine to me. I'm kind-of leaning towards a --param. The hook definition would logically be determined by -mtune (at least on targets like aarch32 that do have meaningful in-order scheduling descriptions -- I can imagine that for aarch64 we'd set it unconditionally). But that wouldn't capture the case above, where you're tuning for a different core from the one that will actually be used. How about: --param cycle-accurate-model but with the description: Whether GCC should assume that the scheduling description is mostly a cycle-accurate model of the target processor, in the absence of cache misses. Nonzero usually means that the selected scheduling model describes an in-order processor, that the scheduling model accurately predicts pipeline bubbles in the absence of cache misses, and that GCC should assume that the scheduling model matches the target that the code is intended to run on. (with better word-smithing)? I suppose it should initially default to 1, but we could flip that later if all major targets set it to 0. (Or we could take that as proof that the old approach isn't needed and just remove the --param.) A param would also be cheaper to test. Thanks, Richard