On Tue, 2007-08-14 at 01:07 -0400, didier deshommes wrote:
> 2007/8/14, William Stein <[EMAIL PROTECTED]>:
> > Correct me if I'm wrong, but I don't think that makes sense.
> > If I do this:
> >
> > sage: a = RQDF(5)
> > sage: number_of_partitions(1000)
> > 24061467864032622473692149727991
> > sage: del a
> >
> > then during the number_of_partitions call the CPU is set to
> > the wrong mode.
> 
> You are right, I did not think about this.
> 
> 2007/8/13, Jonathan Bober <[EMAIL PROTECTED]>:
> > So -ffloat-store doesn't work. So on cpus without sse2, the quaddouble
> > wrapper needs to use the fpu fix, but probably should be rewritten so
> > that it doesn't affect other things.
> 
> > One possibility is that one wrap _add_c_impl, _sub_c_impl,
> > etc., by setting the CPU mode, doing the op, then unsetting
> > the cpu mode.   This is probably the robust solution (??),
> > but might have potential efficiency issues (which should be
> > investigated with careful benchmarking).
> 
> It's looking like the *only* solution. Looks fail-safe, and of course,
> very ugly :).
> 

This is exactly what NTL does in its quad float class. Just about every
function starts and ends with a macro to adjust the fpu, resulting in
around 7 extra assembly instructions. In the following code, the
overhead is quite significant - it takes around 21 seconds to execute on
my machine, but only about 4 seconds without the START_FIX and END_FIX.
Of course, this is not necessarily any sort of accurate test, but it
does indicate that this can be an expensive operation.

In any case, the fpu fix should be probably be implemented as a
macro/inline function in a header file (which would do nothing if quad
double was compiled with sse2 support) rather than as a full blown
function call, like it is now.

#define START_FIX \
  volatile unsigned short __old_cw, __new_cw; \
  asm volatile ("fnstcw %0":"=m" (__old_cw)); \
  __new_cw = (__old_cw & ~0x300) | 0x200; \
  asm volatile ("fldcw %0": :"m" (__new_cw));


#define END_FIX  asm volatile ("fldcw %0": :"m" (__old_cw));

int main() {
    int i,j;
    double x;
    double y = 1.2;
    double z = 1.3;
    for(i = 0; i < 10000000; i++) {
        for(j = 0; j < 100; j++) {
            START_FIX
            x = y * z;
            END_FIX
        }
    }
    return 0;
}

> On a related note, it looks like GSL has a "-gsl" compilation flag
> that "Use[s] GSL to control floating point rounding and precision".
> 
> didier
> 
> > 
> 
> 


--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://sage.scipy.org/sage/ and http://modular.math.washington.edu/sage/
-~----------~----~----~----~------~----~------~--~---

Reply via email to