[Public] Hi,
Thank you for reviewing the patch. > Hi. I'm still waiting for feedback on fixes for existing models: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Finbox. > sourceware.org%2Fgcc-patches%2F5ae6fc21-edc6-133-aee2- > a41e16eb5b7%40ispras.ru%2FT%2F%23t&data=05%7C01%7CTejasSanja > y.Joshi%40amd.com%7C5e440454f42948dd6b2e08dac6714448%7C3dd8961fe > 4884e608e11a82d994e183d%7C0%7C0%7C638040487038011623%7CUnknow > n%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1ha > WwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=iWNT2VRhEHxgpbq > Y4dNYjuzdvz%2BaV5XkLTuAegjj%2B5Q%3D&reserved=0 > did you have a chance to look at those? I am yet to evaluate that patch, I will soon revert back. > Why are you modeling 'fdiv' and 'ssediv' separately? When preparing the > above patches, I checked that x87 and SSE divisions use the same hardware > unit, and I don't see a strong reason to artificially clone it in the model. I thought of modelling them separately as they are different ISA groups. But yes, since they execute in the same unit, we can model them in the same automaton. > I have a question on AVX512 modeling in your patch: > > > +;; AVX instructions > > +(define_insn_reservation "znver4_sse_log" 1 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "sselog,sselog1") > > + (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF") > > + (eq_attr "memory" "none")))) > > + "znver4-direct,znver4-fpu") > > + > > +(define_insn_reservation "znver4_sse_log_evex" 1 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "sselog,sselog1") > > + (and (eq_attr "mode" "V16SF,V8DF") > > + (eq_attr "memory" "none")))) > > + > > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3") > > + > > This is an AVX512 instruction, and you're modeling that it occupies two ports > at once and thus has half throughput, but later in the AVX512 section: > > > +;; AVX512 instructions > > +(define_insn_reservation "znver4_sse_mul_evex" 3 > > + (and (eq_attr "cpu" "znver4") > > + (and (eq_attr "type" "ssemul") > > + (and (eq_attr "mode" "V16SF,V8DF") > > + (eq_attr "memory" "none")))) > > + "znver4-double,znver4-fpu0|znver4-fpu3") > > none of the instructions are modeled this way. If that's on purpose, can you > add a comment? It's surprising, since generally AVX512 has half throughput > compared to AVX256 on Zen 4, but the model doesn't seem to reflect that. > > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3") AVX512 instructions (512-bitwide) occupy 2 consecutive cycles in the pipes they execute. So, it should be modelled as shown below: (define_insn_reservation "znver4_sse_log_evex" 1 (and (eq_attr "cpu" "znver4") (and (eq_attr "type" "sselog") (and (eq_attr "mode" "V16SF,V8DF,XI") (eq_attr "memory" "none")))) "znver4-double,(znver4-fpu)*2") (define_insn_reservation "znver4_sse_mul_evex" 3 (and (eq_attr "cpu" "znver4") (and (eq_attr "type" "ssemul") (and (eq_attr "mode" "V16SF,V8DF") (eq_attr "memory" "none")))) "znver4-double,(znver4-fpu0|znver4-fpu1)*2") Doing this way increased the insn-automata.cc size from 201402 lines to 212189. Hope it is a tolerable increase or do you have any suggestions? I will revise all avx512 instructions and post it. Thanks and Regards, Tejas