------- Comment #4 from ubizjak at gmail dot com  2010-08-27 09:26 -------
Created an attachment (id=21576)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21576&action=view)
Patch to remove special (vec_duplicate ...) insn RTXes

This patch removes the special (vec_duplicate ...) forms of the zero/sign
extension instructions. This is similar to the existing sse2_cvtps2pd pattern,
which accesses the full 128-bit memory operand even if only the low 64 bits
are used.

Also, current gcc generates:

        movdqa  (%rdi), %xmm0   # 6     *movv16qi_internal/2    [length = 4]
        pmovzxbd        %xmm0, %xmm0    # 7     sse4_1_zero_extendv4qiv4si2

which also accesses the full 128-bit, 16-byte-aligned value. This is no better
than:

        pmovzxbd        (%rdi), %xmm0   # 7     sse4_1_zero_extendv4qiv4si2

The patch is untested, since I don't have the required HW.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484
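
For readers unfamiliar with the instruction, here is a minimal scalar sketch in C of what pmovzxbd computes: it zero-extends the four low bytes of the source into four 32-bit lanes, so the remaining twelve bytes of a 128-bit source never influence the result. The helper name model_pmovzxbd is hypothetical and is not part of GCC or of the attached patch; it only models the arithmetic, not the alignment or memory-access width discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (assumption, not from GCC or the patch):
   scalar model of pmovzxbd.  Only the low 4 bytes of src are read;
   each one is zero-extended into a 32-bit lane of dst. */
static void model_pmovzxbd(uint32_t dst[4], const uint8_t *src)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < 4; i++)
        dst[i] = (uint32_t)src[i];  /* upper 24 bits of each lane are 0 */
}
```

Calling model_pmovzxbd(out, in) on a 16-byte buffer gives the same out[] regardless of what bytes 4..15 of the buffer contain, which is why a full 128-bit load in front of the instruction carries no extra information for it.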