This is a target bug, as it does not affect any reasonable processor. With -mfpmath=sse -msse2 I get:

.L2:
        decl    %eax
        addsd   %xmm1, %xmm0
        jne     .L2

My example was about version 3.4.4, which still has this problem with the SSE options:

.L5:
        movsd   -8(%ebp), %xmm1
        decl    %eax
        addsd   %xmm0, %xmm1
        movsd   %xmm1, -8(%ebp)
        jns     .L5

You're right about my example with 4.0, but the testcase by Benjamin still has this problem with 4.0 and SSE.

The inner loop:

.L126:
        incl    %eax
        movsd   -8(%edx), %xmm0
        movsd   (%edx), %xmm1
        addl    $8, %edx
        cmpl    $1000, %eax
        mulsd   %xmm0, %xmm1
        addsd   %xmm1, %xmm0
        addsd   -48(%ebp), %xmm0
        movsd   %xmm0, -48(%ebp)
        jne     .L126

The inner loop with one of the changes Benjamin suggested, which shouldn't have any effect:

.L124:
        incl    %eax
        movsd   -8(%edx), %xmm0
        movsd   (%edx), %xmm1
        addl    $8, %edx
        cmpl    $1000, %eax
        mulsd   %xmm0, %xmm1
        addsd   %xmm1, %xmm0
        addsd   %xmm0, %xmm2
        jne     .L124

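
For readers who want to reproduce the difference: the two listings compute the same sum, but .L126 reloads and stores the accumulator at -48(%ebp) on every iteration, while .L124 keeps it in %xmm2. The C sketch below is only a guess at the kind of source that shows this pattern. It is not Benjamin's testcase; the function names and the accumulate-through-a-pointer form are assumptions made for illustration, and the arithmetic is simply read back from the assembly above.

```c
#include <stddef.h>

/* Hypothetical variant 1: the accumulator lives in memory (reached through
   a pointer), so a compiler that cannot promote it to a register has to
   load and store it on every iteration. */
void sum_through_pointer(const double *a, size_t n, double *acc)
{
    for (size_t i = 1; i < n; ++i)
        *acc += a[i - 1] + a[i] * a[i - 1];
}

/* Hypothetical variant 2: same arithmetic with a local accumulator, which
   the compiler is free to keep in a register for the whole loop. */
double sum_in_local(const double *a, size_t n)
{
    double acc = 0.0;
    for (size_t i = 1; i < n; ++i)
        acc += a[i - 1] + a[i] * a[i - 1];
    return acc;
}
```

Compiling both functions with something like -O2 -mfpmath=sse -msse2 and comparing the inner loops should show whether a given GCC version keeps the accumulator in a register in each case. The results of the two functions should match; only the generated code may differ.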
-- Stefan Strasser