http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Yuri Rumyantsev from comment #9)

> I assume that this fix is not good and must be reverted - I will prepare
> another fix for your reviewing. There are at least 2 problems:
> 
> 1. New split for int --> fp conversions is done under TARGET_SSE2 and
> TARGET_SSE_PARTIAL_REG_DEPENDENCY which include both Atom chips - SLT and
> SLM.
> I checked that zeroing of xmm register before conversion leads to
> performance slowdown on SLM (-5%) for provided test-case. I assume that
> TARGET_AVX must be used instead of TARGET_SSE2.

The patch is effective for my target (IvyBridge), but I see no problem with
fine-tuning the split condition for other targets. Perhaps Atom should be taken
out of TARGET_SSE_PARTIAL_REG_DEPENDENCY?

> 2. This zeroing is redundant and should not be inserted, e.g. for the
> following simple test-case:
> 
> void foo (float* p, int n)
> {
>   int i;
>   for (i=0; i<n; i++)
>     p[i] = (float) i;
> }
> 
> with H.J.'s patch we got the following assembly (I compiled it for slm but it
> does not matter):
> 
> .L3:
>       xorps   %xmm0, %xmm0
>       cvtsi2ss        %eax, %xmm0
>       movss   %xmm0, (%ecx,%eax,4)
>       addl    $1, %eax
>       cmpl    %edx, %eax
>       jne     .L3
> 
> It is clear that zeroing is redundant for it and must be deleted.

Hm, it is not that clear. If the stall is happening in cvtsi2ss, then the following
movss shouldn't matter, or at least it shouldn't make things any worse. Of
course, you have much more information at hand, so instead of reverting the patch
(the patch *is* effective for certain targets), I suggest submitting a follow-up
patch that fine-tunes the split condition.
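
For reference, cvtsi2ss writes only the low 32 bits of its destination and
leaves the upper bits unchanged, so the instruction depends on the previous
contents of the register; the xorps is the usual way to break that dependency.
A minimal sketch of the same zero-then-convert sequence written with SSE
intrinsics follows. The names cvt_zeroed and foo_zeroed are made up for
illustration and are not part of the patch, and the compiler remains free to
emit a different instruction sequence than the comments suggest.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_setzero_ps, _mm_cvtsi32_ss, _mm_cvtss_f32 */

/* Hypothetical helper: convert int to float with an explicitly zeroed
   destination, mirroring the xorps + cvtsi2ss pair in the assembly above. */
static float cvt_zeroed(int i)
{
    __m128 dst = _mm_setzero_ps();   /* corresponds to xorps %xmm0, %xmm0 */
    dst = _mm_cvtsi32_ss(dst, i);    /* corresponds to cvtsi2ss; only the low lane is written */
    return _mm_cvtss_f32(dst);       /* extract the low lane as a scalar float */
}

/* Hypothetical variant of the quoted test case using the helper above. */
void foo_zeroed(float *p, int n)
{
    int i;
    for (i = 0; i < n; i++)
        p[i] = cvt_zeroed(i);
}
```

The result is the same as the plain (float) cast; only the dependency on the
old register value differs, which is what the split condition has to weigh
per target.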
