On Monday 19 May 2008, Bill Hart wrote: > You seemed to be getting up to 8% at points there. That's definitely > worth it. I'll be interested to see this evening how it comes out, > though I recommend optimising my combine3 function (which I suppose > should now be combine8), even including it inline rather than have it > in a separate file.
Bill, some progress report for the C2D: I incorporated your changes with the following modifications: - if (x1==0 & x2==0 & x3==0 .... x8==8) ... I removed this one since it seems rather unlikely that this is true - I added #define called GRAY8 which defines whether 8 or 4 tables ought to be used. I figured this might be handy for machines with a smaller L1. - I added the correct a_nc%k values back in - We don't need to make sure that the allocated buffers are correctly aligned, since we try allocate with _mm_malloc. If that is not available we should probably just use posix_memalign. The speed-up on the C2D (similar to the Opteron) is considerable (the last column is a parallel toy implementation): 64-bit Debian/GNU Linux, 2.33Ghz Core2Duo, cutoff=2^12 (*) Matrix Dimension Magma M4RI M4RI (OpenMP), walltime 10,000 x 10,000 2.920 2.270 1.470 16,384 x 16,384 11.140 9.130 5.540 20,000 x 20,000 20.370 16.110 11.800 32,000 x 32,000 75.510 64.340 40.040 Amazingly M4RM alone (w.o. Strassen-Winograd) now beats Magma up to 2*10^4 x 2*10^4 in this configuration: sage: A = random_matrix(GF(2),10^3, 10^3); sage: B = random_matrix(GF(2),10^3, 10^3); sage: magma(A._multiply_m4rm(B)) == magma(A)*magma(B) True sage: A = random_matrix(GF(2),2*10^4, 2*10^4); sage: B = random_matrix(GF(2),2*10^4, 2*10^4); sage: time A._multiply_m4rm(B) CPU times: user 18.32 s, sys: 0.10 s, total: 18.42 s Wall time: 18.65 64-bit Debian/GNU Linux, 1.8Ghz Opteron (sage.math), cutoff=2^11 Matrix Dimension Magma 2.13-5 (64-bit) M4RI-20080518 (64-bit) 10,000 x 10,000 4.190 4.290 16,384 x 16,384 15.360 15.230 20,000 x 20,000 29.530 28.640 32,000 x 32,000 103.970 114.620 That does seem to roughly match what you reported. I'll now look into SSE2 support. Cheers, Martin (*) Note: Magma is 64-bit but not optimised for C2D. -- name: Martin Albrecht _pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99 _www: http://www.informatik.uni-bremen.de/~malb _jab: [EMAIL PROTECTED] --~--~---------~--~----~------------~-------~--~----~ To post to this group, send email to sage-devel@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sage-devel URLs: http://www.sagemath.org -~----------~----~----~----~------~----~------~--~---