------- Comment #4 from pinskia at gcc dot gnu dot org 2007-03-22 01:38 -------
Inline version:
  r.dst[0].i = MEM[base: d];
  D.6423 = r.dst[0].i;
  D.6449 = __builtin_ia32_paddusb128 (VIEW_CONVERT_EXPR<__v16qi>(D.6423), VIEW_CONVERT_EXPR<__v16qi>(D.6423));
  r.dst[0].i = VIEW_CONVERT_EXPR<__m128i>(D.6449);
  __builtin_ia32_movntdq ((__m128i *) d, r.dst[0].i);
  d = d + 16B;

macro:
  D.6414 = MEM[base: d];
  D.6435 = __builtin_ia32_paddusb128 (VIEW_CONVERT_EXPR<__v16qi>(D.6414), VIEW_CONVERT_EXPR<__v16qi>(D.6414));
  __builtin_ia32_movntdq ((__m128i *) d, VIEW_CONVERT_EXPR<__m128i>(D.6435));
  d = d + 16B;

So somehow r.dst[0].i is not being optimized correctly: the inline version
keeps the extra stores to and loads from r.dst[0].i, while the macro version
keeps the value in temporaries. I did not really look into why.

--

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  GCC build triplet|x86_64-redhat-linux         |
   GCC host triplet|x86_64-redhat-linux         |
           Keywords|                            |missed-optimization

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307
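
For readers without the original test case at hand, the following is a rough sketch of source code that could plausibly lead to the two dumps above. It is not the reporter's code: the type names `vec_u` and `regs_t`, the function `double_sat_inline` and the macro `DOUBLE_SAT_MACRO` are all hypothetical, chosen only so that the aggregate access reads `r.dst[0].i` as in the dump. The intrinsics are the standard SSE2 ones from `<emmintrin.h>`; `_mm_adds_epu8` corresponds to `__builtin_ia32_paddusb128` and `_mm_stream_si128` to `__builtin_ia32_movntdq`.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical types: only the member names dst and i mirror the dump. */
typedef union {
    __m128i i;
    uint8_t b[16];
} vec_u;

typedef struct {
    vec_u dst[1];
} regs_t;

/* "Inline version": the vector travels through an aggregate member,
   r.dst[0].i, which the optimizer would have to scalarize away. */
static inline void double_sat_inline(__m128i *d)
{
    regs_t r;

    assert(((uintptr_t)d & 15) == 0);   /* aligned load/store required */
    r.dst[0].i = _mm_load_si128(d);
    r.dst[0].i = _mm_adds_epu8(r.dst[0].i, r.dst[0].i);  /* paddusb */
    _mm_stream_si128(d, r.dst[0].i);                      /* movntdq */
}

/* "Macro version": the same operation using a plain local temporary. */
#define DOUBLE_SAT_MACRO(d) do {                 \
        __m128i t_ = _mm_load_si128((d));        \
        t_ = _mm_adds_epu8(t_, t_);              \
        _mm_stream_si128((d), t_);               \
    } while (0)
```

Both forms compute the same result (each byte doubled with unsigned saturation); the difference the comment points at is only whether the intermediate value stays in `r.dst[0].i` or in a compiler temporary. Comparing the optimized tree dumps of the two forms (for example with `-O2 -fdump-tree-optimized`) is one way to check whether a given GCC version still shows the gap.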