https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116582

--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Just for completeness, the codegen for the parest sparse matrix multiply is:

  0.31 │320:   kmovb         %k1,%k4
  0.25 │       kmovb         %k1,%k5
  0.28 │       vmovdqu32     (%rcx,%rax,1),%zmm0
  0.32 │       vpmovzxdq     %ymm0,%zmm4
  0.31 │       vextracti32x8 $0x1,%zmm0,%ymm0
  0.48 │       vpmovzxdq     %ymm0,%zmm0
 10.32 │       vgatherqpd    (%r14,%zmm4,8),%zmm2{%k4}
  1.90 │       vfmadd231pd   (%rdx,%rax,2),%zmm2,%zmm1
 14.86 │       vgatherqpd    (%r14,%zmm0,8),%zmm5{%k5}   
  0.27 │       vfmadd231pd   0x40(%rdx,%rax,2),%zmm5,%zmm1    
  0.26 │       add           $0x40,%rax
  0.23 │       cmp           %rax,%rdi                   
       │     ↑ jne           320                         

which looks OK to me.
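
For reference, a loop of roughly the following shape would match the listing above. This is only a sketch inferred from the assembly, not the actual parest source: the function name sparse_row_dot and the parameter names val, col, x and n are made up for illustration. The vpmovzxdq zero-extension suggests 32-bit unsigned column indices, and the two vgatherqpd plus two vfmadd231pd per iteration correspond to 16 elements per iteration (add $0x40 on a 4-byte index stream).

```c
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions, not taken from
 * parest): dot product of one sparse row with a dense vector.
 *
 *   val - nonzero values of the row
 *   col - column index of each nonzero (32-bit unsigned, hence the
 *         zero-extension to 64-bit before the gather)
 *   x   - dense source vector, read through an indexed (gather) load
 *   n   - number of nonzeros in the row
 */
double sparse_row_dot(const double *val, const unsigned int *col,
                      const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t j = 0; j < n; j++)
        sum += val[j] * x[col[j]];   /* x[col[j]] is the gathered load */
    return sum;
}
```

Getting the vector FMA reduction out of such a loop presumably needs floating-point reassociation to be allowed (e.g. -Ofast or -fassociative-math) together with an AVX-512 target; the exact flags used for the listing are not stated here.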
