https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95859

--- Comment #11 from Tobias Schlüter <tobi at gcc dot gnu.org> ---
Works on trunk now but not 10.2.  Compiler explorer link:
https://godbolt.org/z/1zbh4YM4W

On trunk we get the following.  I'm guessing that one could improve the
read pattern by using more registers, but without benchmarking I don't believe
that this can be beaten:
func34(m34):
        pxor    xmm0, xmm0
        mov     rax, rdi
        cvtss2sd        xmm0, DWORD PTR [rsp+8]
        movsd   QWORD PTR [rdi], xmm0
        pxor    xmm0, xmm0
        cvtss2sd        xmm0, DWORD PTR [rsp+24]
        movsd   QWORD PTR [rdi+8], xmm0
        pxor    xmm0, xmm0
        cvtss2sd        xmm0, DWORD PTR [rsp+40]
        movsd   QWORD PTR [rdi+16], xmm0
        pxor    xmm0, xmm0
        cvtss2sd        xmm0, DWORD PTR [rsp+12]
        movsd   QWORD PTR [rdi+24], xmm0
        pxor    xmm0, xmm0
        cvtss2sd        xmm0, DWORD PTR [rsp+28]
        movsd   QWORD PTR [rdi+32], xmm0
        pxor    xmm0, xmm0
        cvtss2sd        xmm0, DWORD PTR [rsp+44]
        movsd   QWORD PTR [rdi+40], xmm0
        pxor    xmm0, xmm0
        cvtss2sd        xmm0, DWORD PTR [rsp+16]
        movsd   QWORD PTR [rdi+48], xmm0
        pxor    xmm0, xmm0
        cvtss2sd        xmm0, DWORD PTR [rsp+32]
        movsd   QWORD PTR [rdi+56], xmm0
        pxor    xmm0, xmm0
        cvtss2sd        xmm0, DWORD PTR [rsp+48]
        movsd   QWORD PTR [rdi+64], xmm0
        pxor    xmm0, xmm0
        cvtss2sd        xmm0, DWORD PTR [rsp+20]
        movsd   QWORD PTR [rdi+72], xmm0
        pxor    xmm0, xmm0
        cvtss2sd        xmm0, DWORD PTR [rsp+36]
        movsd   QWORD PTR [rdi+80], xmm0
        pxor    xmm0, xmm0
        cvtss2sd        xmm0, DWORD PTR [rsp+52]
        movsd   QWORD PTR [rdi+88], xmm0
        ret
Thanks to whoever did that.
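
For readers who don't want to decode the offsets: the loads step through the
stack argument in strides of 16 bytes while the stores are contiguous, so the
listing amounts to a float->double conversion combined with a change of storage
order.  Below is a minimal sketch of that access pattern.  The struct names and
convert34 are hypothetical stand-ins, not the Eigen types from the test case,
and the row-major/column-major reading is inferred from the offsets alone.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the fixed-size matrix types; not from the PR.
struct FloatRowMajor3x4 { float v[12]; };    // 3 rows of 4 floats, rows 16 bytes apart
struct DoubleColMajor3x4 { double v[12]; };  // 4 columns of 3 doubles, contiguous

// Assumed shape of func34: argument passed by value (on the stack),
// result returned through a hidden pointer (rdi in the listing).
DoubleColMajor3x4 convert34(FloatRowMajor3x4 m) {
    DoubleColMajor3x4 out{};
    std::size_t k = 0;
    for (std::size_t col = 0; col < 4; ++col) {
        for (std::size_t row = 0; row < 3; ++row) {
            // Source byte offset is 16*row + 4*col; destination offset is 8*k.
            out.v[k++] = static_cast<double>(m.v[row * 4 + col]);
        }
    }
    return out;
}
```

With the argument starting at rsp+8, the first three iterations read rsp+8,
rsp+24 and rsp+40 and write rdi, rdi+8 and rdi+16, which matches the first
three cvtss2sd/movsd pairs above.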

I see that a release candidate for 10.2.1 has been cut.  I would assume that
it's not fixed in 10.2.1, because otherwise a bugfix would have been mentioned
here.  My experience is clearly not representative, and I can appreciate that
there was no deluge of performance regression PRs, but I would think that Eigen
is an important enough library that one should consider whether breaking it
like this is really something that should survive several (sub-)releases.
