http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55829
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-01-04
     Ever Confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2013-01-04 03:38:40 UTC ---
Confirmed; here is a more reduced testcase:

extern double p2[];
extern double ck[];
int chk_pd(void);
int sse3_test (void)
{
  int i = 0;
  int fail = 0;
  __m128d t1 = (__m128d){*p2, 0};
  __m128d t2 = __builtin_ia32_shufpd (t1, t1, 0);
  double p10 = p2[0];
  for (; i < 80; i += 1)
    {
      ck[0] = p10;
      __builtin_ia32_storeupd (p2, t2);
      fail += chk_pd ();
    }
}
--- CUT ---

Note that the first dump that differs from the -fno-expensive-optimizations
build is the ira dump.

Also note that if we replace the t1/t2 definitions with:
  __m128d t2 = (__m128d){*p2, *p2};
it works.

The difference between the two versions is:

(insn 17 13 7 2 (set (reg/v:V2DF 65 [ t2 ])
        (vec_concat:V2DF (reg:DF 80 [ D.1764 ])
            (reg:DF 80 [ D.1764 ]))) t6.c:11 1467 {*vec_concatv2df}
     (nil))

(insn 10 9 5 2 (set (reg/v:V2DF 63 [ t2 ])
        (vec_duplicate:V2DF (reg:DF 62 [ D.1756 ]))) t6.c:9 1466 {vec_dupv2df}
     (nil))

Note that these two RTL expressions compute exactly the same value. Maybe we
should convert a vec_concat of the same value into a vec_duplicate, but that
is a different issue altogether and would only make this ICE latent.
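
To illustrate why the two insns compute the same value, here is a minimal
scalar sketch in plain C. It is only a model and not GCC code: the type
v2df_model and the model_* helpers are hypothetical names invented for this
illustration, and the shufpd model assumes the usual SHUFPD lane selection
(immediate bit 0 picks the lane taken from the first operand, bit 1 the lane
taken from the second).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical two-lane stand-in for a V2DF value; not a GCC type. */
typedef struct { double lane[2]; } v2df_model;

/* Models (vec_concat:V2DF a b): lane 0 is a, lane 1 is b. */
static v2df_model model_vec_concat(double a, double b)
{
    v2df_model r = { { a, b } };
    return r;
}

/* Models (vec_duplicate:V2DF a): both lanes hold a. */
static v2df_model model_vec_duplicate(double a)
{
    v2df_model r = { { a, a } };
    return r;
}

/* Models the SHUFPD lane selection (assumed semantics, see lead-in):
   imm bit 0 selects the lane of x copied to result lane 0,
   imm bit 1 selects the lane of y copied to result lane 1. */
static v2df_model model_shufpd(v2df_model x, v2df_model y, int imm)
{
    v2df_model r;
    r.lane[0] = x.lane[imm & 1];
    r.lane[1] = y.lane[(imm >> 1) & 1];
    return r;
}

/* Compares the lanes bit for bit, so the check does not rely on
   floating-point == behaviour. */
static int model_same_bits(v2df_model a, v2df_model b)
{
    return memcmp(a.lane, b.lane, sizeof a.lane) == 0;
}
```

With t1 = model_vec_concat(x, 0), the call model_shufpd(t1, t1, 0) yields the
same lanes as model_vec_concat(x, x) and as model_vec_duplicate(x), which
mirrors the two forms of t2 in the testcase.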