On 12/12/2011 06:05 PM, Sriraman Tallam wrote:
> On core2, an unaligned vector load/store using movdqu is a very slow
> operation. Experiments show it is six times slower than movdqa
> (aligned), and this is irrespective of whether the accessed data
> happens to be aligned or not. For Core i7, there is no performance
> difference between the two, and on AMDs, movdqu is only about 10%
> slower.
>
> This patch does not vectorize loops that need to generate the slow
> unaligned memory loads/stores on core2.
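
For anyone following along, the kind of loop affected is something
like the sketch below (illustrative only, not from the patch; the
flags and the vectorizer's alignment reasoning are the assumptions
here):

  /* With something like gcc -O2 -ftree-vectorize -march=core2, the
     vectorizer cannot prove dst/src are 16-byte aligned, so
     vectorizing this loop would require unaligned vector
     loads/stores (movdqu) -- the case the patch avoids on core2.  */
  void
  copy (int *dst, const int *src, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      dst[i] = src[i];
  }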
What happens if you temporarily disable

      /* ??? Similar to above, only less clear because of quote
	 typeless stores unquote.  */
      if (TARGET_SSE2 && !TARGET_SSE_TYPELESS_STORES
	  && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
	{
	  op0 = gen_lowpart (V16QImode, op0);
	  op1 = gen_lowpart (V16QImode, op1);
	  emit_insn (gen_sse2_movdqu (op0, op1));
	  return;
	}

so that the unaligned store happens via movlps + movhps?


r~
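
As a concrete illustration of the two expansions being compared, here
is an intrinsics sketch (not code from the patch or from i386.c; the
helper names are made up, but _mm_storel_pi/_mm_storeh_pi do compile
to movlps/movhps stores):

  #include <emmintrin.h>  /* SSE2; also pulls in xmmintrin.h */

  /* The single-instruction unaligned store emitted by the block
     quoted above.  */
  static void
  store_unaligned_movdqu (void *p, __m128i v)
  {
    _mm_storeu_si128 ((__m128i *) p, v);            /* movdqu */
  }

  /* The split store of the fall-through path: the 128-bit value
     goes out as two 8-byte halves.  */
  static void
  store_unaligned_split (void *p, __m128 v)
  {
    _mm_storel_pi ((__m64 *) p, v);                 /* movlps */
    _mm_storeh_pi ((__m64 *) ((char *) p + 8), v);  /* movhps */
  }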