Hi Kyrill,

> I think the approach that I’d like to try is using the TARGET_SCHED_DISPATCH
> hooks like x86 does for bdver1-4.
> That would try to exploit the dispatch constraints information in the SWOGs
> rather than the instruction latency and throughput tables.
> That would still require some annotation of SVE patterns but it is
> conceptually different metadata that we’d specify in the MD files.

Yes, trying to schedule for dispatch is likely better than traditional
scheduling on wide OoO pipelines. Also reducing register pressure in complex
blocks may be useful as a separate pass (without all the complexities of
scheduling for a CPU model).

> Yeah, I’m okay with disabling the early scheduling (is 33% the worst-case
> scenario though?
> It feels that if it was really taking that much in most code it would have
> appeared in bugzilla as a compile-time hog)

No, it's the average when building SPECINT, so this includes linking and
file IO overheads...

>> What do you think about disabling late scheduling as well?
>
> I think this would definitely need separate consideration and evaluation
> given the above.
>
> Another thing to consider is the macro fusion machinery. IIRC it works during
> scheduling so if we don’t run any scheduling we don’t get an opportunity to
> bring those instructions together?
>
> That said, I’m not sure the scheduling actually tries to bring macro fused
> instructions together rather than simply avoiding moving them apart.

I will run the numbers, but if useful, late scheduling could be disabled
separately from fusion scheduling. However fusion really shouldn't be done
as a scheduling hack - we should use fused RTL patterns like we do for GOT
accesses and AES fusion.

Cheers,
Wilco