With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like:

    double d = atof(foo);
    int i = d;

The generated code is:

    call    atof
    fstpl   -8(%ebp)
    movsd   -8(%ebp), %xmm0
    cvttsd2si %xmm0, %eax

(This is Linux; Darwin is similar.) I think the difficulty is that for

    (set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger}

regclass decides SSE_REGS is a zero-cost choice for 58, which looks
wrong, as that requires a store to and a load from memory. In fact,
memory is the cheapest overall choice for 58 (taking its use into
account as well), and gcc will figure that out correctly if SSE_REGS is
given a more reasonable cost. The immediate cause is the #Y's in the
constraint:

    "=f#Y,m ,f#Y,*r ,o ,Y*x#f,Y*x#f,Y*x#f ,m"

and there's probably a simple fix, but it eludes me. Advice? Thanks.
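
For anyone who wants to reproduce this, a minimal self-contained test case might look like the sketch below. The function name convert is my own choice for illustration; the two statements in the body are the ones quoted above.

```c
#include <stdlib.h>

/* Hypothetical wrapper around the two statements from the report.
 * On 32-bit x86, atof returns its double in st(0); with -mfpmath=sse
 * the truncating conversion to int is done with cvttsd2si. */
int convert(const char *foo)
{
    double d = atof(foo);
    int i = d;   /* implicit double -> int conversion, truncates toward zero */
    return i;
}
```

Compiling this with the flags above plus -S (and -m32 on a 64-bit host, which is an assumption about your setup) should show whether the movsd reload appears between the fstpl and the cvttsd2si.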