> On Tue, 3 Jan 2023, Jan Hubicka wrote:
>
> > > * gcc/common/config/i386/i386-common.cc (processor_alias_table):
> > > Use CPU_ZNVER4 for znver4.
> > > * config/i386/i386.md: Add znver4.md.
> > > * config/i386/znver4.md: New.
> > OK,
> > thanks!
>
> Honza, I'm curious what are your further plans for this, you mentioned
> merging znver4.md back in znver.md if I recall correctly?

I was looking into that over Christmas (and it was also the reason for
my first pass through, where I was asking about various differences).
There are a number of small divergences between znver.md and znver4.md
that seem to make the merged automaton bigger than having two automata.
So merging both meaningfully would mean modifying the znver1-3 model or
the znver4 model.

With Tejas I think we mostly verified that the areas where the znver4
model differs from znver1-3 are correct for znver4, and sometimes also
for znver3 (for example, the branching unit is already present there but
not modelled).

Splitting znver1-3 and 4 is definitely not optimal. However, given the
time constraints and the desire not to break znver1-3, I think going
with znver4.md is a good option, at least for GCC12/13.

Overall I am not sure how beneficial the model is: since we schedule on
a basic-block (BB) basis and model the CPU as in-order with no register
renaming, the scheduler rarely has a chance to fill most of the
execution units and de facto optimizes for a vastly different CPU than
the real one. We get a noticeable SPEC performance boost from
-fschedule-insns2, but it seems to come mostly from scheduling for
latencies. LLVM's model seems to do more than ours, but comparing both
compilers I was not really able to tell whether either of them gets a
noticeable benefit from the actual model of reservation units (and not
only latencies).

I would welcome thoughts/ideas/measurements on this.

Honza
>
> Alexander