On Monday 19 May 2008, Bill Hart wrote:
> You seemed to be getting up to 8% at points there. That's definitely
> worth it. I'll be interested to see this evening how it comes out,
> though I recommend optimising my combine3 function (which I suppose
> should now be combine8), even including it inline rather than have it
> in a separate file.

Bill,

Here is a progress report for the C2D:

I incorporated your changes with the following modifications:
- if (x1==0 & x2==0 & x3==0 ... x8==0) ...: I removed this check, since it
  seems rather unlikely that all eight values are zero at once.
- I added a #define called GRAY8 which selects whether 8 or 4 tables are
  used. I figured this might be handy for machines with a smaller L1 cache.
- I added the correct a_nc%k values back in.
- We don't need to check that the allocated buffers are correctly aligned,
  since we try to allocate with _mm_malloc. If that is not available we should
  probably just use posix_memalign.
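To make the last three points concrete, here is a minimal sketch of how a GRAY8 switch, an aligned allocator with a posix_memalign fallback, and a multi-table combine step could fit together. Only GRAY8, _mm_malloc and posix_memalign come from the discussion above; HAVE_MM_MALLOC, N_TABLES, ROW_ALIGNMENT, row_alloc, row_free and combine are hypothetical names chosen for illustration, and this is not the actual M4RI code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* makes posix_memalign visible */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* GRAY8 is the switch named in the mail: 8 tables if set, 4 otherwise. */
#define GRAY8 1

#if GRAY8
#define N_TABLES 8
#else
#define N_TABLES 4
#endif

/* Assumption: 16 bytes, enough for 128-bit SSE2 loads and stores. */
#define ROW_ALIGNMENT 16

/* HAVE_MM_MALLOC is a hypothetical configure-time macro. */
#ifdef HAVE_MM_MALLOC
#include <mm_malloc.h>
#endif

/* Allocate an aligned buffer; returns NULL on failure. */
static void *row_alloc(size_t bytes) {
#ifdef HAVE_MM_MALLOC
    return _mm_malloc(bytes, ROW_ALIGNMENT);
#else
    void *p = NULL;
    return posix_memalign(&p, ROW_ALIGNMENT, bytes) == 0 ? p : NULL;
#endif
}

/* Release a buffer obtained from row_alloc. */
static void row_free(void *p) {
#ifdef HAVE_MM_MALLOC
    _mm_free(p);
#else
    free(p);
#endif
}

/* XOR one row from each of the N_TABLES tables into dst in a single pass,
 * so dst is read and written once instead of N_TABLES times. */
static void combine(uint64_t *dst, const uint64_t *const *rows, size_t width) {
    assert(dst != NULL && rows != NULL);
    for (size_t w = 0; w < width; w++) {
        uint64_t acc = dst[w];
        for (int t = 0; t < N_TABLES; t++)
            acc ^= rows[t][w];
        dst[w] = acc;
    }
}
```

Setting GRAY8 to 0 halves the number of tables touched per destination row, which is the trade-off for a smaller L1 mentioned above.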

The speed-up on the C2D (similar to the Opteron) is considerable (the last 
column is a parallel toy implementation):

64-bit Debian GNU/Linux, 2.33 GHz Core2Duo, cutoff=2^12 (*)
Matrix Dimension    Magma     M4RI      M4RI (OpenMP), walltime
10,000 x 10,000      2.920     2.270     1.470
16,384 x 16,384     11.140     9.130     5.540
20,000 x 20,000     20.370    16.110    11.800
32,000 x 32,000     75.510    64.340    40.040

Amazingly, M4RM alone (without Strassen-Winograd) now beats Magma up to
2*10^4 x 2*10^4 in this configuration:

sage: A = random_matrix(GF(2),10^3, 10^3);
sage: B = random_matrix(GF(2),10^3, 10^3);
sage: magma(A._multiply_m4rm(B)) == magma(A)*magma(B) 
True
sage: A = random_matrix(GF(2),2*10^4, 2*10^4);
sage: B = random_matrix(GF(2),2*10^4, 2*10^4);
sage: time A._multiply_m4rm(B)
CPU times: user 18.32 s, sys: 0.10 s, total: 18.42 s
Wall time: 18.65
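For readers who have not seen the table-lookup multiplication timed above, the following is a small sketch of the idea for 64 x 64 matrices over GF(2), with one matrix row per 64-bit word. It assumes a fixed table width K = 4 and fills each table by doubling rather than in Gray-code order, purely for brevity; m4rm_mul and naive_mul are hypothetical names, not functions from M4RI or Sage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define K 4    /* bits per table lookup (hypothetical choice) */
#define N 64   /* matrix dimension; one row fits in one 64-bit word */

/* C = A * B over GF(2). Bit j of X[i] is the entry in row i, column j. */
static void m4rm_mul(uint64_t C[N], const uint64_t A[N], const uint64_t B[N]) {
    uint64_t table[1u << K];
    assert(N % K == 0);

    for (size_t r = 0; r < N; r++)
        C[r] = 0;

    for (unsigned block = 0; block < N / K; block++) {
        /* table[i] = XOR of the rows B[block*K + t] for every bit t set in i */
        table[0] = 0;
        for (unsigned t = 0; t < K; t++)
            for (unsigned i = 0; i < (1u << t); i++)
                table[(1u << t) | i] = table[i] ^ B[block * K + t];

        /* One lookup per row of A replaces up to K separate row additions. */
        for (size_t r = 0; r < N; r++) {
            unsigned idx = (unsigned)(A[r] >> (block * K)) & ((1u << K) - 1);
            C[r] ^= table[idx];
        }
    }
}

/* Schoolbook reference: add row j of B whenever A[r][j] is set. */
static void naive_mul(uint64_t C[N], const uint64_t A[N], const uint64_t B[N]) {
    for (size_t r = 0; r < N; r++) {
        C[r] = 0;
        for (unsigned j = 0; j < N; j++)
            if ((A[r] >> j) & 1)
                C[r] ^= B[j];
    }
}
```

Using several such tables at once, as with GRAY8, lets each destination row be updated for several blocks in one pass over memory.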

64-bit Debian GNU/Linux, 1.8 GHz Opteron (sage.math), cutoff=2^11
Matrix Dimension    Magma 2.13-5 (64-bit)    M4RI-20080518 (64-bit)
10,000 x 10,000       4.190                    4.290
16,384 x 16,384      15.360                   15.230
20,000 x 20,000      29.530                   28.640
32,000 x 32,000     103.970                  114.620

That does seem to roughly match what you reported. I'll now look into SSE2 
support.

Cheers,
Martin

(*) Note: Magma is 64-bit but not optimised for C2D.

-- 
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: [EMAIL PROTECTED]

