------- Comment #5 from hjl dot tools at gmail dot com  2010-08-27 16:16 -------
(In reply to comment #4)
> Created an attachment (id=21576)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21576&action=view)
> Patch to remove special (vec_duplicate ...) insn RTXes
>
> This patch removes special (vec_duplicate ...) forms of zero/sign extension
> instructions. This is similar to existing sse2_cvtps2pd pattern that access
> full 128bit memory even if only low 64bits are used.
>
> Also, current gcc generates:
>
>         movdqa   (%rdi), %xmm0    # 6  *movv16qi_internal/2  [length = 4]
>         pmovzxbd %xmm0, %xmm0     # 7  sse4_1_zero_extendv4qiv4si2
>
> which also access full 128bit 16byte aligned value. This is no better than:
>
>         pmovzxbd (%rdi), %xmm0    # 7  sse4_1_zero_extendv4qiv4si2
>
> Patch is untested, since I don't have required HW.
>
I tested it on Linux/ia32 and Linux/Intel64 with SSE4.1. There are no
regressions. Thanks.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41484
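
For readers without SSE4.1 hardware, here is a rough scalar sketch in C of what the two sequences quoted above compute. It is only an illustration, not code from the patch or from GCC: the function names are made up, and it models the data result only, not alignment requirements, faulting behaviour or timing. Both paths zero-extend the low four bytes of a 16-byte value into four 32-bit lanes, so they should produce the same result.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the V4QI -> V4SI zero extension done by
   pmovzxbd: each of the low four source bytes becomes one 32-bit lane. */
static void zero_extend_v4qi_v4si(const uint8_t *src, uint32_t dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint32_t)src[i];   /* upper 24 bits of each lane are zero */
}

/* Models "movdqa (%rdi), %xmm0; pmovzxbd %xmm0, %xmm0":
   copy the full 16 bytes into a temporary, then extend its low 4 bytes. */
static void extend_via_register(const uint8_t mem[16], uint32_t dst[4])
{
    uint8_t xmm[16];
    memcpy(xmm, mem, sizeof xmm);
    zero_extend_v4qi_v4si(xmm, dst);
}

/* Models "pmovzxbd (%rdi), %xmm0": extend straight from memory. */
static void extend_from_memory(const uint8_t mem[16], uint32_t dst[4])
{
    zero_extend_v4qi_v4si(mem, dst);
}
```

Calling both functions on the same 16-byte buffer and comparing the four output lanes is a quick way to convince yourself that the results match.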