On 9 February 2012 14:51, James Courtier-Dutton <james.dut...@gmail.com> wrote: > 2012/2/9 Andrew Haley <a...@redhat.com>: >> On 02/09/2012 01:38 PM, Tim Prince wrote: >>> x87 built-ins should be a fair compromise between speed, code size, and >>> accuracy, for long double, on most CPUs. As Richard says, it's >>> certainly possible to do better in the context of SSE, but gcc doesn't >>> know anything about the quality of math libraries present; it doesn't >>> even take into account whether it's glibc or something else. >> >> Yeah. On x86_64 glibc, we should really get the sincos bg fixed. And >> it really is a bug: cos() and sin() are correctly rounded but sincos() >> isn't. >> >> Andrew. > > Given: > > gcc -g -O0 -c -o sincos1.o sincos1.c > gcc -static -g -o sincos1 sincos1.o -lm > > ./sincos1 > sin = -8.52200849767188795e-01 (uses xmm register intructions) > sinl = 0.46261304076460176 (uses fprem and fsin) > sincos = 4.62613040764601746e-01 (uses fprem and fsin) > sincosl = 0.46261304076460176 (uses fprem and fsin) > > It looks to me that only sin() is evaluation correctly using xmm > register instructions. > sinl, sincos, and sincosl all use fprem and fsin and fsincos. > > From my analysis, fprem is causing a lot of the problems. > Using a combination of xmm register instructions instead of fprem > might cure our problems with fsincos. > Alternatively, make both sin() and sinl() use xmm register intructions.
Results for x86_64 gcc -g -O0 -c -o sincos1.o sincos1.c gcc -static -g -o sincos1 sincos1.o -lm ./sincos1 sin = -8.52200849767188795e-01 (uses xmm register intructions) sinl = 0.46261304076460176 (uses fprem and fsin) sincos = 4.62613040764601746e-01 (uses fprem and fsin) sincosl = 0.46261304076460176 (uses fprem and fsin) Only sin() gets an accurate answer. Results when compiled for 32bit x86. gcc -m32 -g -O0 -c -o sincos1.o sincos1.c gcc -m32 -static -g -o sincos1 sincos1.o -lm ./sincos1 sin = 4.62613040764601746e-01 sinl = 0.46261304076460176 sincos = 4.62613040764601746e-01 sincosl = 0.46261304076460176 Which are all inaccurate. So, we have a case of the same program compiled for 32bit might give different floating point results than the same source compiled for 64bit. From what I can tell, the xmm register instructions on x86_64 are using 128bit precision which probably explains why the result is more accurate.