https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116582

--- Comment #5 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> (In reply to Jan Hubicka from comment #3)
> > Just for completeness the codegen for parest sparse matrix multiply is:
> > 
> >   0.31 │320:   kmovb         %k1,%k4
> >   0.25 │       kmovb         %k1,%k5
> >   0.28 │       vmovdqu32     (%rcx,%rax,1),%zmm0
> >   0.32 │       vpmovzxdq     %ymm0,%zmm4
> >   0.31 │       vextracti32x8 $0x1,%zmm0,%ymm0
> >   0.48 │       vpmovzxdq     %ymm0,%zmm0
> >  10.32 │       vgatherqpd    (%r14,%zmm4,8),%zmm2{%k4}
> >   1.90 │       vfmadd231pd   (%rdx,%rax,2),%zmm2,%zmm1
> >  14.86 │       vgatherqpd    (%r14,%zmm0,8),%zmm5{%k5}   
> >   0.27 │       vfmadd231pd   0x40(%rdx,%rax,2),%zmm5,%zmm1    
> >   0.26 │       add           $0x40,%rax
> >   0.23 │       cmp           %rax,%rdi                   
> >        │     ↑ jne           320                         
> > 
> > which looks OK to me.
> 
> The in-loop mask moves are odd, but yes.
> 
>
It's because vgatherqpd clears %k4: each mask bit is zeroed as its element is
gathered, so the mask register is 0 after the instruction completes. It therefore
has to be reinitialized to all-ones from %k1 before the next gather, hence the
in-loop kmovb instructions.
