Wow, NTL has been implemented in a very clever way! Its Karatsuba polynomial multiplication actually bothers to allocate all the memory it is going to use up front, and this makes a huge difference to the running time!
Just multiplying two GMP bignums of around 300 digits, i.e. polynomials of degree 0, is 1.5 times faster than NTL, but my first Karatsuba implementation is nearly twice as slow as NTL, even for polynomials of degree 2. I was scratching my head for quite some time before I figured out what was going on. Victor is a pretty clever guy.

Anyhow, I'm now wondering whether MAGMA just uses Toom-3 instead of Karatsuba by the time you get to degree 250 or so. I'll implement a Toom-3 algorithm once I get my Karatsuba implementation sorted out, and we'll see.

----

By the way, does anyone know of any tricks for optimizing for Intel Pentium-D machines? I found a highly optimized program yesterday which runs 1.5 times slower on a 3.2GHz Pentium-D than on a 2GHz Athlon. The problem seems to be that the L2 cache on these machines is twice as slow for small chunks of data as on the Athlon.

Bill.