Martin,

That's all excellent news!! So on the C2D we are caning Magma. But we should try to figure out whether your Magma version is optimised for the C2D or for generic amd64, since that will make a big difference. Is your machine some kind of 64-bit Intel OS X machine? I don't see a specific Core 2 version of Magma on their current list. Of course, if you just have a generic Linux x86 version of Magma, that would be much slower than optimal.
It's amazing how much difference the SSE makes on your machine. I believe the AMD essentially uses its MMX/SSE hardware to read in cache lines anyway, so unless you are doing something that requires lots of wide arithmetic or logic, you aren't going to get anything more out of that chip. (See the P.S. below for a sketch of the kind of combine routine this affects.)

I look forward to seeing the new code now that you've cleaned it up. I'm going to try to figure out what GAP does, in case there are any ideas we missed. It's surely old code, but there might be lots of interesting things in there.

Anyhow, who would have thought that one would see 1.22s for a 10000 x 10000 matrix multiply? That's pretty exciting.

Bill.

On 19 May, 21:39, Martin Albrecht <[EMAIL PROTECTED]> wrote:
> On Monday 19 May 2008, Bill Hart wrote:
> > You seemed to be getting up to 8% at points there. That's definitely
> > worth it. I'll be interested to see this evening how it comes out,
> > though I recommend optimising my combine3 function (which I suppose
> > should now be combine8), even including it inline rather than having
> > it in a separate file.
> >
> > Of course on the Opteron, SSE should be switched off, since it is
> > definitely slower by about 5%-10%, even with careful optimisation.
> >
> > Bill.
>
> Okay, I added SSE2 support again and the timings are pretty good on the C2D:
>
>   Dimension        Old (s)   New (s)
>   10000 x 10000     2.270     1.720
>   16384 x 16384     9.130     6.760
>   20000 x 20000    16.110    12.310
>   32000 x 32000    64.340    50.690
>
> Throwing parallelism into the mix (still a lame implementation):
>
>   Dimension        Old (s)   New (s)
>   10000 x 10000     1.470     1.220
>   16384 x 16384     5.540     4.390
>   20000 x 20000    11.800     8.580
>   32000 x 32000    40.040    32.810
>
> Btw., Mike Hansen pointed out on IRC that GAP has a pretty fast
> implementation of matrix multiplication too:
>
>   GAP4, Version: 4.4.10 of 02-Oct-2007, x86_64-unknown-linux-gnu-gcc
>   gap> A := RandomMat(10000,10000,GF(2));
>   <a 10000x10000 matrix over GF2>
>   gap> B := RandomMat(10000,10000,GF(2));
>   <a 10000x10000 matrix over GF2>
>   gap> C := A*B;
>   <a 10000x10000 matrix over GF2>
>   gap> time;
>   5951
>
> The unit here is ms, so this takes about 6 seconds. However, generating
> the random matrices takes forever. Mike also pointed out that for the
> example he tried, GAP is twice as fast as the current Sage code (i.e. the
> code before the improvements discussed in this thread).
>
> On sage.math things don't improve as expected:
>
>   sage: A = random_matrix(GF(2),32000,32000)
>   sage: B = random_matrix(GF(2),32000,32000)
>   sage: time C = A._multiply_strassen(B,cutoff=2^11)
>   CPU times: user 121.69 s, sys: 3.93 s, total: 125.62 s
>   Wall time: 125.62
>
> This was 114.620 before.
>
> Martin
>
> --
> name: Martin Albrecht
> _pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
> _www: http://www.informatik.uni-bremen.de/~malb
> _jab: [EMAIL PROTECTED]
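P.S. Since we keep talking about these combine routines, here is a rough
sketch of what I mean, for anyone following the thread: XOR several source
rows into a destination row 128 bits at a time with SSE2 intrinsics. The
function name, the signature, and the assumption that a row is an even
number of 64-bit words are just for illustration; this is not the actual
combine3/combine8 from Martin's code.

  #include <emmintrin.h>  /* SSE2 intrinsics */
  #include <stddef.h>
  #include <stdint.h>

  /* XOR three source rows into dst, 128 bits at a time. Assumes the row
     length `width` (in 64-bit words) is even; real code would peel off
     an odd tail word and could use aligned loads for rows that are
     16-byte aligned. */
  static void combine3_sse2(uint64_t *dst,
                            const uint64_t *s1,
                            const uint64_t *s2,
                            const uint64_t *s3,
                            size_t width)
  {
      size_t i;
      for (i = 0; i < width; i += 2) {
          __m128i a = _mm_loadu_si128((const __m128i *)(s1 + i));
          __m128i b = _mm_loadu_si128((const __m128i *)(s2 + i));
          __m128i c = _mm_loadu_si128((const __m128i *)(s3 + i));
          _mm_storeu_si128((__m128i *)(dst + i),
                           _mm_xor_si128(_mm_xor_si128(a, b), c));
      }
  }

On the C2D the XORs stay in the 128-bit SSE units, whereas the K8 Opteron
splits each 128-bit op into two 64-bit ops internally, which is presumably
why the plain 64-bit word loop wins there.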