Wow, NTL has been implemented in a very clever way!

Its Karatsuba polynomial multiplication actually bothers to allocate all
the memory it is going to use up front, and this makes a huge
difference to the running time!

Just multiplying two GMP bignums of around 300 digits, i.e. polynomials
of degree 0, I'm 1.5 times faster than NTL, but my first Karatsuba
implementation is nearly twice as slow as NTL, even for polynomials of
degree 2. I was scratching my head for quite some time before I figured
out what was going on. Victor is a pretty clever guy.

Anyhow, I'm now wondering whether MAGMA has already switched from
Karatsuba to Toom-3 by the time you get to degree 250 or so. I'll
implement a Toom-3 algorithm once I get my Karatsuba implementation
sorted out, and we'll see.

----

By the way, does anyone know of any tricks for optimizing for Intel
Pentium-D machines? I found a highly optimized program yesterday which
runs 1.5 times slower on a 3.2GHz Pentium-D than on a 2GHz Athlon. The
problem seems to be that the L2 cache on these machines is twice as
slow for small chunks of data as on the Athlon.

Bill.


