http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954
Yuri Rumyantsev <ysrumyan at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ysrumyan at gmail dot com

--- Comment #9 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
Uros,

I assume that this fix is not good and must be reverted. I will prepare
another fix for your review.

There are at least 2 problems:

1. The new split for int --> fp conversions is done under TARGET_SSE2 and
TARGET_SSE_PARTIAL_REG_DEPENDENCY, which includes both Atom chips, SLT and
SLM. I checked that zeroing the xmm register before the conversion leads to a
performance slowdown on SLM (-5%) for the provided test-case. I assume that
TARGET_AVX must be used instead of TARGET_SSE2.

2. This zeroing can be redundant and should not be inserted, e.g. for the
following simple test-case:

void foo (float* p, int n)
{
  int i;
  for (i=0; i<n; i++)
    p[i] = (float) i;
}

With H.J.'s patch we got the following assembly (I compiled it for slm, but it
does not matter):

.L3:
        xorps   %xmm0, %xmm0
        cvtsi2ss        %eax, %xmm0
        movss   %xmm0, (%ecx,%eax,4)
        addl    $1, %eax
        cmpl    %edx, %eax
        jne     .L3

It is clear that the zeroing is redundant here and must be deleted.
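
For anyone who wants to reproduce the loop above, here is a minimal sketch of a self-contained harness around the test-case. Only foo comes from the comment; the helper count_mismatches is a hypothetical name added for illustration, and it only checks that the stored values are correct. It does not measure the slowdown reported for SLM.

```c
#include <assert.h>
#include <stdlib.h>

/* Test-case from the comment: store (float) i into p[i]. */
void foo(float *p, int n)
{
    int i;
    for (i = 0; i < n; i++)
        p[i] = (float) i;
}

/* Hypothetical helper (not part of the original report): runs foo on a
   freshly allocated buffer and returns how many elements differ from the
   expected int --> float conversion. */
int count_mismatches(int n)
{
    float *p;
    int i, bad = 0;

    if (n <= 0)
        return 0;               /* nothing to convert */

    p = malloc((size_t) n * sizeof *p);
    assert(p != NULL);          /* allocation must succeed for the check */

    foo(p, n);
    for (i = 0; i < n; i++)
        if (p[i] != (float) i)
            bad++;

    free(p);
    return bad;
}
```

To look at the generated loop, the file can be compiled to assembly with something like gcc -O2 -m32 -S. These options are an assumption: the comment only says the code was compiled for slm. Whether xorps appears before cvtsi2ss will depend on the compiler revision and tuning flags in use.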