------- Comment #2 from rguenth at gcc dot gnu dot org 2009-01-13 15:15 -------
Note that your testcase has moved the load

  _mm_load_ps(in+4);

before the store

  _mm_store_ps(out, result);

which the compiler cannot do itself because they may alias.
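
To illustrate why the reordering is not safe in general, here is a minimal sketch. It is not the testcase from this report: load4, store4 and add4 are hypothetical scalar stand-ins for the SSE intrinsics, and the doubling computation is made up so that the example stays portable. It only shows that hoisting the load of in+4 above the store to out changes the result when out happens to point at in+4.

```c
#include <string.h>

/* Hypothetical scalar stand-ins for __m128, _mm_load_ps and _mm_store_ps. */
typedef struct { float v[4]; } vec4;

vec4 load4(const float *p)
{
    vec4 r;
    memcpy(r.v, p, sizeof r.v);
    return r;
}

void store4(float *p, vec4 x)
{
    memcpy(p, x.v, sizeof x.v);
}

vec4 add4(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

/* Source order: the store to out happens before the load of in+4. */
vec4 store_then_load(const float *in, float *out)
{
    vec4 result = add4(load4(in), load4(in));
    store4(out, result);
    return load4(in + 4);
}

/* Hand-reordered: the load of in+4 is hoisted above the store to out. */
vec4 load_then_store(const float *in, float *out)
{
    vec4 result = add4(load4(in), load4(in));
    vec4 next = load4(in + 4);
    store4(out, result);
    return next;
}

/* Returns 1 if the two orderings disagree when out aliases in+4. */
int reorder_changes_result(void)
{
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    vec4 x = store_then_load(a, a + 4);  /* reads back the stored values */
    vec4 y = load_then_store(b, b + 4);  /* reads the old values */
    return memcmp(x.v, y.v, sizeof x.v) != 0;
}
```

With out == in+4 the two orderings return different values, so the compiler has to keep the source order unless it can prove the pointers do not overlap. If the caller can guarantee that, qualifying the pointers with C99 restrict is one way to state it.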
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825
