On 2012-07-26 13:33:46 +0900, Miles Bader wrote:
> Vincent Lefevre <vinc...@vinc17.net> writes:
> > I think that there could be an optimization like that in
> > fesetround() too.
>
> Do you think it's worth proposing this to the glibc people?

Yes, I think it is, since this makes the code much faster on some
processors. The glibc people will then have to decide what to do
depending on the processor (in particular on non-x86).

I've attached a new test program. It is also available here:

  http://www.vinc17.net/software/rndmode.c

It shows that this change would be useful on all the processors I've
tested: AMD Opteron, Intel Xeon, Intel Core2, POWER7. It would be
particularly important on Intel Core2 and, to a lesser extent, on
AMD Opteron.

Summary of the results:

                          1st      2nd      3rd
AMD Opteron               4.62s    8.92s    4.72s
Intel Xeon X5650          3.22s    4.69s    3.51s
Intel Xeon E5520          3.37s    5.20s    3.66s
Intel Core2 Duo P8600     3.35s   11.77s    3.70s
POWER7                    7.29s   11.16s    7.86s

1st timing: no calls to fegetround/fesetround.
2nd timing: fegetround/fesetround/fesetround to set and restore the
            rounding mode (RM).
3rd timing: fegetround with tests before fesetround, so that fesetround
            doesn't need to be called.

This shows that the rounding mode test could be done in fesetround
itself. When the rounding mode really changes, this would just be a
little slower.

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

/* $Id: rndmode.c 53564 2012-07-26 11:10:38Z vinc17/ypig $

Compare the timings with OPT = 0, 1 and 2 to measure the
fegetround/fesetround performance when the rounding mode
doesn't change. See thread:
  http://lists.debian.org/debian-devel/2012/07/msg00466.html
and in particular:
  http://lists.debian.org/debian-devel/2012/07/msg00747.html

Possible test:

for opt in 0 1 2
do
  gcc -O2 rndmode.c -o rndmode -lm -DOPT=$opt
  echo "OPT=$opt"
  for i in 0 1 2; do time ./rndmode; done
done

On a Debian/squeeze x86_64 Quad-Core AMD Opteron 8378 @ 2.40GHz
machine with libc6 2.11.3-3 and GCC 4.4.5:
  4.62s / 8.92s / 4.72s

On a Debian/squeeze x86_64 Intel Xeon X5650 @ 2.67GHz
machine with libc6 2.11.3-3 and GCC 4.4.5:
  3.22s / 4.69s / 3.51s

On a Debian/unstable x86_64 Intel Xeon E5520 @ 2.27GHz
machine with libc6 2.13-35 and GCC 4.7.1:
  3.37s / 5.20s / 3.66s

On a Debian/unstable x86_64 Intel Core2 Duo P8600 @ 2.40GHz
machine with libc6 2.13-35 and GCC 4.7.1:
  3.35s / 11.77s / 3.70s

On a Red Hat Fedora release 16 (Verne) POWER7 @ 3.55GHz
machine with glibc 2.14.90-24 and GCC 4.6.3:
  7.29s / 11.16s / 7.86s
*/

#include <stdio.h>
#include <math.h>
#include <fenv.h>

#pragma STDC FENV_ACCESS ON

#ifndef N
#define N 100000000
#endif

int main (void)
{
  volatile double x = 1.0, y = 0.0;
  int i;

  for (i = 0; i < N; i++)
    {
#if OPT
      int r = fegetround();
#if OPT > 1
      if (r != FE_TONEAREST)
#endif
        fesetround (FE_TONEAREST);
#endif
      y += exp(x);
#if OPT
#if OPT > 1
      if (r != FE_TONEAREST)
#endif
        fesetround (r);
#endif
    }
  return y == 0.0;
}