[sage-devel] Re: Fwd: Fwd: [sage-devel] Re: sloppy mult and div in quaddouble?

Francois Thu, 01 May 2008 21:06:52 -0700

On May 2, 12:51 am, "William Stein" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> These are some comments from Paul Zimmermann about QuadDouble.
>
> ---------- Forwarded message ----------
> From: Paul Zimmermann <[EMAIL PROTECTED]>
> Date: Thu, May 1, 2008 at 3:41 AM
> Subject: Re: Fwd: [sage-devel] Re: sloppy mult and div in quaddouble?
> To: William Stein <[EMAIL PROTECTED]>
>
>        Dear William,
>
>  > You may have some comments on this thread, especially the very last
>  > message.
>
>  yes with pleasure!
>
>  >  --enable-ieee-add       use addition that satisfies IEEE-style error
>  > bound
>  >                          instead of Cray-style error bound.
>  > [default=no]
>  >  --enable-sloppy-mul     use fast but slightly inaccurate
>  > multiplication.
>  >                          [default=yes]
>  >  --enable-sloppy-div     use fast but slightly inaccurate division.
>  >                          [default=yes]
>  > None of these are changed from the default in
>  > quaddouble spkg-install, which means that sage
>  > uses the sloppy options and Cray-style error bound.
>  > Is it really what we want in sage?
>
>  > I do not think that we want the current defaults, but I am not sure.
>  > Are there any performance implications? We are also not shipping the
>  > current quaddouble code, so checking out what the current release does
>  > might shed some light on this.
>
>  > Unless somebody objects and thinks that the current default behavior
>  > is the way to go we should switch to using "--enable-ieee-add"
>
>  I agree with Michael here (and similarly --disable-sloppy-{mul,div}).
>
>  > I'm not sure exactly what the speed differences are, but I think that
>  > they are quite significant. When writing the partition counting code,
>  > which uses quaddouble, I recall that things ran much slower if "sloppy"
>  > multiplication and division were not enabled. (However, I have no hard
>  > benchmarks to back this up right now, so this statement shouldn't be
>  > taken too seriously, and I could be wrong -- it would be nice if I had
>  > some real data.)
>
>  Real benchmarks would be interesting.
>
>  > Anyway, I don't think that this is necessarily an issue of
>  > correctness vs. noncorrectness. It's an issue of precision -- in one
>  > implementation multiplication/division is correct to within X bits, and
>  > in the other it is correct to within (X + a few more) bits, but it takes
>  > [a lot?] longer. In at least some applications, it is very desirable to
>  > accept a small loss of precision in exchange for a large speedup.
>
>  the main difference is that in the IEEE-754 compliant case, you should be
>  able to state explicitly what (X + a few more) is, but I doubt it in the other case.
>
>  > Also, regarding benchmarking, I wonder what the speed difference is
>  > between quaddouble with IEEE error bounds vs. mpfr at 212 bits. And
>  > while I'm wondering, I also wonder whether there is even any reason
>  > for RQDF to exist:
>
>  > sage: def f(x):
>  >   ...:     for i in xrange(1000000):
>  >   ...:         y = x * x
>  >   ...:
>  > sage: x = RQDF(1/1000)
>  > sage: timeit("f(x)")
>  > 5 loops, best of 3: 802 ms per loop
>  > sage: x = RealField(212)(1/1000)
>  > sage: timeit("f(x)")
>  > 5 loops, best of 3: 872 ms per loop
>
>  > quaddouble is certainly faster than mpfr, but it seems likely that in
>  > any application from within sage the python overhead will eat up most of
>  > the speed difference. (quaddouble is certainly a good thing to have
>  > "underneath the hood", but I just don't know that there is much use to
>  > having it externally visible.)
>
>  this is consistent with the benchmarks I did at SD6 (at that time I did not
>  know about the sloppy configure options for RQDF).
>
>  My opinion regarding this is the following (cf my talk at SD6):
>
>  (1) any floating-point software should give the user a guarantee about the
>     maximal rounding error for each atomic operation (add, sub, mul, div, ...)
>
>  (2) once (1) is satisfied, then we can try to optimize for speed. In particular
>     we cannot compare two software tools that provide different precisions,
>     or a tool that satisfies (1) like MPFR and another tool which doesn't
>     (I'm not even sure about RQDF with the non-sloppy options)
>
>  On the technical side, I would ask the quaddouble developers to specify
>  what they mean by "IEEE-style error bound" (--enable-ieee-add), and by
>  "accurate multiplication" (--enable-sloppy-mul=no). I'm not sure they
>  can guarantee 212=4*53 correct bits.
>
>  Paul
>
I commented on correctness in the other thread, where I also posted the
answer I got about the error bounds. I will repeat it here just in case:


------------------------------------------
For ieee-add, it all depends on what kind of error bound you need.
If enabled, the error satisfies

  |e| <= |a+b| * epsilon

If not enabled, the error satisfies

  |e| <= epsilon * max (|a|, |b|)
-------------------------------------------
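
To make the difference between the two bounds concrete, here is a minimal
Python sketch (not part of qd). It evaluates both bounds exactly with
rationals for two operands that nearly cancel. The value of epsilon is an
assumption: 2**-209 is inferred from the "6.75 eps" line in the test output
further down, and the function names are made up for illustration.

```python
from fractions import Fraction

# Assumed quad-double epsilon: 8.20417e-63 / 6.75 from the log is about 2**-209.
EPS = Fraction(1, 2**209)

def ieee_style_bound(a, b):
    """Bound with --enable-ieee-add: |e| <= |a+b| * epsilon."""
    return abs(a + b) * EPS

def cray_style_bound(a, b):
    """Bound without --enable-ieee-add: |e| <= epsilon * max(|a|, |b|)."""
    return max(abs(a), abs(b)) * EPS

# Operands of opposite sign that nearly cancel: a + b = 1/1000000.
a = Fraction(1)
b = Fraction(-999999, 1000000)

ratio = cray_style_bound(a, b) / ieee_style_bound(a, b)
print(ratio)  # 1000000
```

For operands of the same sign the two bounds are within a factor of two of
each other, so the choice only matters when the sum suffers cancellation.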

I ran the test suite that comes with qd, which gives a rough idea of
performance. All runs were done with --enable-ieee-add.

"sloppy" operations enabled:
Performing double-double precision PSLQ.
  testing pslq_test(2, 2) ...
    elapsed time = 0.0001737 seconds.
  test passed.

  testing pslq_test(2, 3) ...
    elapsed time = 0.001726 seconds.
  test passed.

  testing pslq_test(2, 4) ...
    elapsed time = 0.01282 seconds.
  test passed.

  testing pslq_test(3, 3) ...
    elapsed time = 0.01698 seconds.
  test passed.

  testing pslq_test(2, 5) ...
    elapsed time = 0.02072 seconds.
  test passed.

Performing quad-double precision PSLQ.
  testing pslq_test(3, 3) ...
    elapsed time = 0.1033 seconds.
  test passed.

  testing pslq_test(2, 5) ...
    elapsed time = 0.02072 seconds.
  test passed.

  testing pslq_test(4, 3) ...
    elapsed time = 0.427 seconds.
  test passed.

  testing pslq_test(2, 6) ...
    elapsed time = 0.3989 seconds.
  test passed.

  testing pslq_test(2, 7) ...
    elapsed time = 0.8612 seconds.
  test passed.

  testing pslq_test(3, 5) ...
    elapsed time = 1.332 seconds.
  test passed.

All tests passed.
PASS: pslq_test
Test 1.  (Salamin-Brent quadratically convergent formula for pi)
  iteration 0:
4.00000000000000000000000000000000000000000000000000000000000000e+00
  iteration 1:
3.18767264271210862720192997052536923265105357185936922648763399e+00
  iteration 2:
3.14168029329765329391807042456000938279571943881540283264418946e+00
  iteration 3:
3.14159265389544649600291475881804348610887923726131158965110136e+00
  iteration 4:
3.14159265358979323846636060270663132175770241134242935648684602e+00
  iteration 5:
3.14159265358979323846264338327950288419716994916472660583469612e+00
  iteration 6:
3.14159265358979323846264338327950288419716939937510582097494458e+00
          _pi:
3.14159265358979323846264338327950288419716939937510582097494459e+00
        error: 8.20417e-63 = 6.75 eps
===================================================

"sloppy" operations disabled:
Performing double-double precision PSLQ.
  testing pslq_test(2, 2) ...
    elapsed time = 0.0001991 seconds.
  test passed.

  testing pslq_test(2, 3) ...
    elapsed time = 0.001934 seconds.
  test passed.

  testing pslq_test(2, 4) ...
    elapsed time = 0.006379 seconds.
  test passed.

  testing pslq_test(3, 3) ...
    elapsed time = 0.01822 seconds.
  test passed.

  testing pslq_test(2, 5) ...
    elapsed time = 0.02224 seconds.
  test passed.

Performing quad-double precision PSLQ.
  testing pslq_test(3, 3) ...
    elapsed time = 0.1264 seconds.
  test passed.

  testing pslq_test(2, 5) ...
    elapsed time = 0.0223 seconds.
  test passed.

  testing pslq_test(4, 3) ...
    elapsed time = 0.5291 seconds.
  test passed.

  testing pslq_test(2, 6) ...
    elapsed time = 0.4898 seconds.
  test passed.

  testing pslq_test(2, 7) ...
    elapsed time = 1.069 seconds.
  test passed.

  testing pslq_test(3, 5) ...
    elapsed time = 1.648 seconds.
  test passed.

All tests passed.
PASS: pslq_test
Test 1.  (Salamin-Brent quadratically convergent formula for pi)
  iteration 0:
4.00000000000000000000000000000000000000000000000000000000000000e+00
  iteration 1:
3.18767264271210862720192997052536923265105357185936922648763399e+00
  iteration 2:
3.14168029329765329391807042456000938279571943881540283264418946e+00
  iteration 3:
3.14159265389544649600291475881804348610887923726131158965110136e+00
  iteration 4:
3.14159265358979323846636060270663132175770241134242935648684602e+00
  iteration 5:
3.14159265358979323846264338327950288419716994916472660583469613e+00
  iteration 6:
3.14159265358979323846264338327950288419716939937510582097494460e+00
          _pi:
3.14159265358979323846264338327950288419716939937510582097494459e+00
        error: 4.15906e-63 = 3.42188 eps
=============================

The pslq test is ~20% to ~25% faster with sloppy ops in the quad-double
runs, depending on the values (the double-double runs differ by less).
Of course that will vary with the task, but it is probably a useful
indicator.
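
The ratios can be recomputed from the logs with a few lines of Python. This
is only a sketch over the five larger quad-double cases, with the timings
copied by hand from the two runs above. The quad-double pslq_test(2, 5) line
is left out because it reports nearly the same figure as the double-double
run.

```python
# Quad-double PSLQ timings in seconds, copied from the two logs above.
sloppy = {"(3, 3)": 0.1033, "(4, 3)": 0.427, "(2, 6)": 0.3989,
          "(2, 7)": 0.8612, "(3, 5)": 1.332}
strict = {"(3, 3)": 0.1264, "(4, 3)": 0.5291, "(2, 6)": 0.4898,
          "(2, 7)": 1.069, "(3, 5)": 1.648}

# Relative extra time of the non-sloppy build over the sloppy build.
slowdown = {case: strict[case] / sloppy[case] - 1.0 for case in sloppy}

for case, extra in slowdown.items():
    print(f"pslq_test{case}: non-sloppy takes {extra:.1%} longer")
```

On these numbers the non-sloppy build takes between roughly 22% and 24%
longer in every case. These are single runs, so the figures are indicative
only.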
The test on convergence to pi is interesting: the error nearly doubles
between the two cases (3.42 eps without sloppy ops, 6.75 eps with them),
although both errors are very small.
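
For readers who want to reproduce the pi iterates independently of qd, here
is a minimal sketch of the Salamin-Brent (AGM) iteration using Python's
decimal module. It assumes the qd test follows the usual form of the
recurrence; the working precision of 80 digits and the function name are
arbitrary choices for illustration.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # working precision in decimal digits (arbitrary)

def salamin_brent(iterations):
    """Return the successive pi estimates p_0, p_1, ..., p_iterations."""
    a = Decimal(1)
    b = Decimal("0.5").sqrt()
    s = Decimal("0.5")
    m = Decimal(1)
    estimates = [2 * a * a / s]          # iteration 0 gives exactly 4
    for _ in range(iterations):
        m *= 2
        a_new = (a + b) / 2              # arithmetic mean
        b_new = a * b                    # square of the geometric mean
        s -= m * (a_new * a_new - b_new)
        a, b = a_new, b_new.sqrt()
        estimates.append(2 * a * a / s)
    return estimates

estimates = salamin_brent(6)
for i, p in enumerate(estimates):
    print(f"iteration {i}: {p}")
```

The number of correct digits roughly doubles at each step, which is why six
iterations are enough to reach the full quad-double precision in the logs
above.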
