https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62080
--- Comment #2 from Marc Glisse <glisse at gcc dot gnu.org> ---
(note that a minimal, self-contained testcase would be much better and
shouldn't be hard to produce)

We write to memory with:

(insn 10 8 11 2 (set (mem:V2DI (reg/v/f:DI 97 [ vec ]) [0 MEM[(__m128i * {ref-all})vec_4(D)]+0 S16 A128])
        (subreg:V2DI (reg:V4SI 98) 0)) /usr/lib/gcc-snapshot/lib/gcc/x86_64-linux-gnu/4.10.0/include/emmintrin.h:706 1147 {*movv2di_internal}
     (expr_list:REG_DEAD (reg:V4SI 98)
        (nil)))

and then read back with:

(insn 15 12 17 2 (set (reg:V2DF 100)
        (vec_concat:V2DF (mem:DF (reg/v/f:DI 97 [ vec ]) [5 MEM[(const double *)vec_4(D)]+0 S8 A64])
            (mem:DF (plus:DI (reg/v/f:DI 97 [ vec ])
                    (const_int 8 [0x8])) [0 S8 A8]))) /usr/lib/gcc-snapshot/lib/gcc/x86_64-linux-gnu/4.10.0/include/emmintrin.h:925 2016 {*vec_concatv2df}
     (nil))

The vec_concat of the 2 adjacent memory locations is not merged into a single
memory read, although from the previous insn it looks like it is suitably
aligned.
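
For illustration, a reduced testcase of the kind requested above might look
like the sketch below. It is not the reporter's code. The function name
store_then_reload is made up. The choice of _mm_store_si128 for the store and
_mm_load_sd plus _mm_loadh_pd for the reload is an assumption, since the dumps
only give emmintrin.h line numbers and not the intrinsics involved. Whether it
produces the same unmerged vec_concat has not been checked.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical reduction, not the reporter's code: write 16 bytes through
   an __m128i pointer, then read the same bytes back as two doubles. */
__m128d store_then_reload(__m128i *vec, __m128i val)
{
    /* _mm_store_si128 requires a 16-byte-aligned destination. */
    assert(((uintptr_t)vec & 15) == 0);
    _mm_store_si128(vec, val);

    /* Reload the low and high halves with two separate 8-byte loads;
       ideally these would be combined into one aligned 16-byte load. */
    const double *p = (const double *)vec;
    return _mm_loadh_pd(_mm_load_sd(p), p + 1);
}
```

Compiling something like this with optimization enabled (for example
gcc -O2 -S) and inspecting the generated loads should show whether the two
8-byte reads are combined into a single 16-byte read.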