[Public]

Hi,

Thank you for reviewing the patch.

> Hi. I'm still waiting for feedback on fixes for existing models:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Finbox.
> sourceware.org%2Fgcc-patches%2F5ae6fc21-edc6-133-aee2-
> a41e16eb5b7%40ispras.ru%2FT%2F%23t&data=05%7C01%7CTejasSanja
> y.Joshi%40amd.com%7C5e440454f42948dd6b2e08dac6714448%7C3dd8961fe
> 4884e608e11a82d994e183d%7C0%7C0%7C638040487038011623%7CUnknow
> n%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1ha
> WwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=iWNT2VRhEHxgpbq
> Y4dNYjuzdvz%2BaV5XkLTuAegjj%2B5Q%3D&reserved=0
> did you have a chance to look at those?

I am yet to evaluate that patch, I will soon revert back.

> Why are you modeling 'fdiv' and 'ssediv' separately? When preparing the
> above patches, I checked that x87 and SSE divisions use the same hardware
> unit, and I don't see a strong reason to artificially clone it in the model.

I thought of modelling them separately as they are different ISA groups.
But yes, since they execute in the same unit, we can model them in the same 
automaton.

> I have a question on AVX512 modeling in your patch:
> 
> > +;; AVX instructions
> > +(define_insn_reservation "znver4_sse_log" 1
> > +                      (and (eq_attr "cpu" "znver4")
> > +                           (and (eq_attr "type" "sselog,sselog1")
> > +                                (and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF")
> > +                                 (eq_attr "memory" "none"))))
> > +                      "znver4-direct,znver4-fpu")
> > +
> > +(define_insn_reservation "znver4_sse_log_evex" 1
> > +                      (and (eq_attr "cpu" "znver4")
> > +                           (and (eq_attr "type" "sselog,sselog1")
> > +                                (and (eq_attr "mode" "V16SF,V8DF")
> > +                                 (eq_attr "memory" "none"))))
> > +
> > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3")
> > +
> 
> This is an AVX512 instruction, and you're modeling that it occupies two ports
> at once and thus has half throughput, but later in the AVX512 section:
> 
> > +;; AVX512 instructions
> > +(define_insn_reservation "znver4_sse_mul_evex" 3
> > +                      (and (eq_attr "cpu" "znver4")
> > +                           (and (eq_attr "type" "ssemul")
> > +                                (and (eq_attr "mode" "V16SF,V8DF")
> > +                                 (eq_attr "memory" "none"))))
> > +                      "znver4-double,znver4-fpu0|znver4-fpu3")
> 
> none of the instructions are modeled this way. If that's on purpose, can you
> add a comment? It's surprising, since generally AVX512 has half throughput
> compared to AVX256 on Zen 4, but the model doesn't seem to reflect that.

> > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3")

AVX512 instructions (512-bitwide) occupy 2 consecutive cycles in the pipes they 
execute. So, it should be modelled as shown below:

(define_insn_reservation "znver4_sse_log_evex" 1
                         (and (eq_attr "cpu" "znver4")
                              (and (eq_attr "type" "sselog")
                                   (and (eq_attr "mode" "V16SF,V8DF,XI")
                                    (eq_attr "memory" "none"))))
                         "znver4-double,(znver4-fpu)*2")

(define_insn_reservation "znver4_sse_mul_evex" 3
                         (and (eq_attr "cpu" "znver4")
                              (and (eq_attr "type" "ssemul")
                                   (and (eq_attr "mode" "V16SF,V8DF")
                                    (eq_attr "memory" "none"))))
                         "znver4-double,(znver4-fpu0|znver4-fpu1)*2")
Doing this way increased the insn-automata.cc size from 201402 lines to 212189. 
Hope it is a tolerable increase or do you have any suggestions? I will revise 
all avx512 instructions and post it.

Thanks and Regards,
Tejas

Reply via email to