> Old: 64-bit Debian/GNU Linux, 2.33Ghz Core2Duo
> Matrix Dimension      Magma 2.14-13 (64-bit)  M4RI-20080517 (64-bit)
> 10,000 x 10,000       2.920                           3.610
> 16,384 x 16,384       11.140                          12.120
> 20,000 x 20,000       20.370                          24.390
> 32,000 x 32,000       74.290                          94.910
>
> New: 64-bit Debian/GNU Linux, 2.33Ghz Core2Duo
> Matrix Dimension      Magma 2.14-13 (64-bit)  M4RI-20080517 (64-bit)
> 10,000 x 10,000       2.920                           2.990
> 16,384 x 16,384       11.140                          11.750
> 20,000 x 20,000       20.370                          21.180
> 32,000 x 32,000       74.290                          86.570

If you take this + Bill's idea:

> For 10000x10000 we currently use k = 6. Instead of this, we could use
> k = 5 and make two Gray tables simultaneously. This will still fit in
> cache.
>
> Instead of doing 6 bits at a time, we can then do 10 bits at a time.
> We'd load the appropriate line from the first Gray table, then the
> appropriate one from the second and xor them, then xor with the output
> matrix. This should decrease the number of loads and stores
> considerably. Moreover, the SSE instructions will then be much more
> efficient as the ratio of arithmetic instructions to loads and stores
> is higher.
>
> Of course one could also do 16 bits at a time, by doing 4 tables, but
> I think this might actually get slower again since you've only
> increased the amount of work done by 60%, but you've had a 30 %
> increase in instructions.

You get (on the C2D):

sage: B = random_matrix(GF(2), 3.2*10^4, 3.2*10^4)
sage: A = random_matrix(GF(2), 3.2*10^4, 3.2*10^4)
sage: time C= A._multiply_strassen(B,cutoff=2^11)
CPU times: user 75.82 s, sys: 0.22 s, total: 76.04 s
Wall time: 76.31

sage: A = random_matrix(GF(2), 2*10^4, 2*10^4)
sage: B = random_matrix(GF(2), 2*10^4, 2*10^4)
sage: time C= A._multiply_strassen(B,cutoff=2^11)
CPU times: user 19.14 s, sys: 0.09 s, total: 19.24 s
Wall time: 19.29

sage: B = random_matrix(GF(2), 2^14, 2^14)
sage: A = random_matrix(GF(2), 2^14, 2^14)
sage: time C= A._multiply_strassen(B,cutoff=2^11)
CPU times: user 10.62 s, sys: 0.05 s, total: 10.67 s
Wall time: 10.70

sage: B = random_matrix(GF(2), 10^4, 10^4)
sage: A = random_matrix(GF(2), 10^4, 10^4)
sage: time C= A._multiply_strassen(B,cutoff=2^11)
CPU times: user 2.73 s, sys: 0.02 s, total: 2.75 s
Wall time: 2.76

i.e  the speed of my current Magma install on the same computer (mind you, 
this one might not be optimised for the C2D but for the Opteron, I don't 
know). The times above don't have SSE2 yet. I guess documenting Bill's tricks 
 
 - process the rows of A in blocks
 - use two rather than one Gray code table

well is in order since now M4RM looks quite different from the original 
algorithm. I'll do that tomorrow.

Again, thanks Bill!
Martin

-- 
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: [EMAIL PROTECTED]


--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---

Reply via email to