On 11/05/13 11:30, Marc Glisse wrote:
On Sat, 11 May 2013, jacob navia wrote:
Hi
When calculating the cos/sin, gcc generates a call to a complicated
routine that takes several thousand instructions to execute.
Suppose the value is stored in some XMM register, say xmm0 and the
result should be in another xmm register, say xmm1.
Why doesn't it generate:
movsd %xmm0,(%rsp)
fldl (%rsp)
fsin
fstpl (%rsp)
movsd (%rsp),%xmm1
My compiler system (lcc-win) is generating that when optimizations
are ON. Maybe there are some flags in gcc that I am missing?
Or is there some other reason?
fsin is slower and less precise than the libc SSE2 implementation.
Excuse me but:
1) The fsin instruction is ONE instruction! The sin routine is at
least a thousand instructions!
Even if the fsin instruction itself is "slow", it should be a thousand
times faster than the complicated routine gcc calls.
2) With gcc the x87 FPU works with a 64-bit mantissa, i.e. fsin will
calculate with a 64-bit mantissa and NOT only 53 bits as SSE2 does.
The fsin instruction is more precise!
I think that gcc has a problem here. I am pointing you to this problem,
but please keep in mind I am no newbie...
jacob