On 2012-07-26 13:33:46 +0900, Miles Bader wrote:
> Vincent Lefevre <vinc...@vinc17.net> writes:
> > I think that there could be an optimization like that in
> > fesetround() too.
> 
> Do you think it's worth proposing this to the glibc people?

Yes, since this makes the code much faster on some processors,
I think it is. Then they'll have to decide what to do depending
on the processor (in particular on non-x86).

I've attached a new test program. It is also available here:

  http://www.vinc17.net/software/rndmode.c

It shows that this change would be useful on all the processors
I've tested: AMD Opteron, Intel Xeon, Intel Core2, POWER7. It
would be particularly important on Intel Core2 and, to a lesser
extent, AMD Opteron.

Summary of the results:

Processor                 1st      2nd     3rd
AMD Opteron             4.62s    8.92s   4.72s
Intel Xeon X5650        3.22s    4.69s   3.51s
Intel Xeon E5520        3.37s    5.20s   3.66s
Intel Core2 Duo P8600   3.35s   11.77s   3.70s
POWER7                  7.29s   11.16s   7.86s

1st timing: no calls to fegetround/fesetround.
2nd timing: fegetround, then two unconditional fesetround calls, to set
and then restore the rounding mode.
3rd timing: fegetround, then a test before each fesetround, so that
fesetround is not called when the rounding mode is already the wanted one.

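This shows that the rounding mode test could be done in fesetround
itself: when the requested mode is already the current one, the call
would cost little more than the test, and when the rounding mode really
changes, it would be just a little slower than now.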

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
/* $Id: rndmode.c 53564 2012-07-26 11:10:38Z vinc17/ypig $

Compare the timings with OPT = 0, 1 and 2 to measure the
fegetround/fesetround performance when the rounding mode
doesn't change. See thread:
  http://lists.debian.org/debian-devel/2012/07/msg00466.html
and in particular:
  http://lists.debian.org/debian-devel/2012/07/msg00747.html

Possible test:

for opt in 0 1 2
do
  gcc -O2 rndmode.c -o rndmode -lm -DOPT=$opt
  echo "OPT=$opt"
  for i in 0 1 2; do time ./rndmode; done
done

On a Debian/squeeze x86_64 Quad-Core AMD Opteron 8378 @ 2.40GHz machine
with libc6 2.11.3-3 and GCC 4.4.5: 4.62s / 8.92s / 4.72s

On a Debian/squeeze x86_64 Intel Xeon X5650 @ 2.67GHz machine
with libc6 2.11.3-3 and GCC 4.4.5: 3.22s / 4.69s / 3.51s

On a Debian/unstable x86_64 Intel Xeon E5520 @ 2.27GHz machine
with libc6 2.13-35 and GCC 4.7.1: 3.37s / 5.20s / 3.66s

On a Debian/unstable x86_64 Intel Core2 Duo P8600 @ 2.40GHz machine
with libc6 2.13-35 and GCC 4.7.1: 3.35s / 11.77s / 3.70s

On a Red Hat Fedora release 16 (Verne) POWER7 @ 3.55GHz machine
with glibc 2.14.90-24 and GCC 4.6.3: 7.29s / 11.16s / 7.86s
*/

#include <stdio.h>
#include <math.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON

#ifndef N
#define N 100000000
#endif

int main (void)
{
  volatile double x = 1.0, y = 0.0;
  int i;

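  for (i = 0; i < N; i++)
    {
#if OPT
      /* OPT >= 1: save the current rounding mode. */
      int r = fegetround();
#if OPT > 1
      /* OPT = 2: call fesetround only if the mode must change. */
      if (r != FE_TONEAREST)
#endif
        fesetround (FE_TONEAREST);
#endif
      y += exp(x);
#if OPT
#if OPT > 1
      if (r != FE_TONEAREST)
#endif
        fesetround (r);  /* restore the saved rounding mode */
#endif
    }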
  return y == 0.0;
}
