On 8/14/07, cwitty <[EMAIL PROTECTED]> wrote:
> On Aug 14, 12:59 am, Jonathan Bober <[EMAIL PROTECTED]> wrote:
> > This is exactly what NTL does in its quad float class. Just about every
> > function starts and ends with a macro to adjust the fpu, resulting in
> > around 7 extra assembly instructions. In the following code, the
> > overhead is quite significant - it takes around 21 seconds to execute on
> > my machine, but only about 4 seconds without the START_FIX and END_FIX.
> > Of course, this is not necessarily any sort of accurate test, but it
> > does indicate that this can be an expensive operation.
>
> Yes, changing the floating-point modes is very slow on many (all?) x86
> processors.  I believe it flushes the floating-point pipeline, which
> takes many clock cycles.

OK, how about this plan:

(1) On systems with SSE2, we do option 3a, which is: "If a processor
supports sse2, then passing gcc -march=whatever -msse2 -mfpmath=sse
(maybe the -march isn't needed) will cause gcc to use sse registers and
instructions for doubles, and these have the proper precision."

(2) On systems without SSE2 (old, slow Pentium 3s), we do the START_FIX
and END_FIX.  These computers are very slow anyway, so let them suffer
(and the suffering is *only* for code that uses quaddouble, which is
very little code anyway).
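
A minimal sketch of how the two cases might be selected at compile time,
assuming GCC on x86. This is not NTL's actual macro code: the macro
bodies, the USES_FPU_FIX flag, and the two_sum and self_check helpers are
hypothetical and only illustrate the idea. When gcc is run with -msse2
-mfpmath=sse it predefines __SSE2_MATH__, so the macros expand to
nothing; otherwise they set the x87 precision-control bits to 53-bit
(double) precision and restore the old control word afterwards.

```c
#include <assert.h>

/* Sketch only: hypothetical stand-ins, not NTL's real definitions. */
#if defined(__GNUC__) && defined(__i386__) && !defined(__SSE2_MATH__)
/* Case (2): x87 arithmetic. Save the FPU control word, set the
   precision-control field (bits 8-9) to 10b = double precision. */
#define START_FIX \
    unsigned short fix_old_cw, fix_new_cw; \
    __asm__ volatile ("fnstcw %0" : "=m" (fix_old_cw)); \
    fix_new_cw = (unsigned short)((fix_old_cw & ~0x0300) | 0x0200); \
    __asm__ volatile ("fldcw %0" : : "m" (fix_new_cw));
/* Restore the control word saved by START_FIX. */
#define END_FIX \
    __asm__ volatile ("fldcw %0" : : "m" (fix_old_cw));
#define USES_FPU_FIX 1
#else
/* Case (1): SSE2 math (or a non-x87 target), nothing to adjust. */
#define START_FIX
#define END_FIX
#define USES_FPU_FIX 0
#endif

/* Error-free addition: *sum + *err equals a + b exactly, provided
   every operation is rounded to double precision. */
void two_sum(double a, double b, double *sum, double *err)
{
    START_FIX
    double s = a + b;
    double bb = s - a;
    *err = (a - (s - bb)) + (b - bb);
    *sum = s;
    END_FIX
}

/* Small sanity check of two_sum on one input pair. */
void self_check(void)
{
    double s, e;
    two_sum(1.0, 1e-20, &s, &e);
    assert(s == 1.0);    /* 1e-20 is lost in the rounded sum ...   */
    assert(e == 1e-20);  /* ... and recovered in the error term.   */
}
```

With this layout the per-call cost of saving and restoring the control
word is paid only in builds where the macros are non-empty, which
matches the plan above: SSE2 builds carry no overhead, and only non-SSE2
builds pay for START_FIX and END_FIX.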

William

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://sage.scipy.org/sage/ and http://modular.math.washington.edu/sage/
-~----------~----~----~----~------~----~------~--~---
