------- Comment #4 from ubizjak at gmail dot com  2010-08-27 09:26 -------
Created an attachment (id=21576)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21576&action=view)
Patch to remove special (vec_duplicate ...) insn RTXes

This patch removes the special (vec_duplicate ...) forms of the zero/sign
extension instructions. This is similar to the existing sse2_cvtps2pd pattern,
which accesses the full 128 bits of memory even though only the low 64 bits
are used.

Also, current gcc generates:

        movdqa  (%rdi), %xmm0   # 6     *movv16qi_internal/2    [length = 4]
        pmovzxbd        %xmm0, %xmm0    # 7     sse4_1_zero_extendv4qiv4si2     

which also accesses a full 128-bit, 16-byte-aligned value. This is no better than:

        pmovzxbd        (%rdi), %xmm0   # 7     sse4_1_zero_extendv4qiv4si2     

The patch is untested, since I don't have the required hardware.
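
For reference, the effect of pmovzxbd can be sketched in plain C. This is only
a scalar model under stated assumptions, not GCC's implementation or the insn
pattern from the patch; the name pmovzxbd_model is made up for illustration.
It shows that only the low four bytes of the 16-byte source contribute to the
result, so loading the whole 128-bit value into a register first gains nothing.

```c
#include <stdint.h>

/* Hypothetical scalar model of pmovzxbd (name is illustrative only):
   zero-extend the low four bytes of a 16-byte source into four
   32-bit lanes.  Bytes 4..15 of the source are never read. */
static void pmovzxbd_model(const uint8_t src[16], uint32_t dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint32_t)src[i];  /* unsigned widening, upper bits become 0 */
}
```

A byte such as 0xFF in one of the low four positions widens to 0x000000FF in
this model, whereas the sign-extending counterpart (pmovsxbd) would produce
0xFFFFFFFF.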


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484
