Woot!!
On 17 May, 23:46, Martin Albrecht <[EMAIL PROTECTED]>
wrote:
> > Old: 64-bit Debian/GNU Linux, 2.33Ghz Core2Duo
> > Matrix Dimension Magma 2.14-13 (64-bit) M4RI-20080517 (64-bit)
> > 10,000 x 10,000 2.920 3.610
> > 16,384 x 16,384 11.140 12.120
> > 20,000 x 20,000 20.370 24.390
> > 32,000 x 32,000 74.290 94.910
>
> > New: 64-bit Debian/GNU Linux, 2.33Ghz Core2Duo
> > Matrix Dimension Magma 2.14-13 (64-bit) M4RI-20080517 (64-bit)
> > 10,000 x 10,000 2.920 2.990
> > 16,384 x 16,384 11.140 11.750
> > 20,000 x 20,000 20.370 21.180
> > 32,000 x 32,000 74.290 86.570
>
> If you take this + Bill's idea:
>
>
>
> > For 10000x10000 we currently use k = 6. Instead of this, we could use
> > k = 5 and make two Gray tables simultaneously. This will still fit in
> > cache.
>
> > Instead of doing 6 bits at a time, we can then do 10 bits at a time.
> > We'd load the appropriate line from the first Gray table, then the
> > appropriate one from the second and xor them, then xor with the output
> > matrix. This should decrease the number of loads and stores
> > considerably. Moreover, the SSE instructions will then be much more
> > efficient as the ratio of arithmetic instructions to loads and stores
> > is higher.
>
> > Of course one could also do 16 bits at a time, by doing 4 tables, but
> > I think this might actually get slower again since you've only
> > increased the amount of work done by 60%, but you've had a 30 %
> > increase in instructions.
>
> You get (on the C2D):
>
> sage: B = random_matrix(GF(2), 3.2*10^4, 3.2*10^4)
> sage: A = random_matrix(GF(2), 3.2*10^4, 3.2*10^4)
> sage: time C= A._multiply_strassen(B,cutoff=2^11)
> CPU times: user 75.82 s, sys: 0.22 s, total: 76.04 s
> Wall time: 76.31
>
> sage: A = random_matrix(GF(2), 2*10^4, 2*10^4)
> sage: B = random_matrix(GF(2), 2*10^4, 2*10^4)
> sage: time C= A._multiply_strassen(B,cutoff=2^11)
> CPU times: user 19.14 s, sys: 0.09 s, total: 19.24 s
> Wall time: 19.29
>
> sage: B = random_matrix(GF(2), 2^14, 2^14)
> sage: A = random_matrix(GF(2), 2^14, 2^14)
> sage: time C= A._multiply_strassen(B,cutoff=2^11)
> CPU times: user 10.62 s, sys: 0.05 s, total: 10.67 s
> Wall time: 10.70
>
> sage: B = random_matrix(GF(2), 10^4, 10^4)
> sage: A = random_matrix(GF(2), 10^4, 10^4)
> sage: time C= A._multiply_strassen(B,cutoff=2^11)
> CPU times: user 2.73 s, sys: 0.02 s, total: 2.75 s
> Wall time: 2.76
>
> i.e the speed of my current Magma install on the same computer (mind you,
> this one might not be optimised for the C2D but for the Opteron, I don't
> know). The times above don't have SSE2 yet. I guess documenting Bill's tricks
>
> - process the rows of A in blocks
> - use two rather than one Gray code table
>
> well is in order since now M4RM looks quite different from the original
> algorithm. I'll do that tomorrow.
>
> Again, thanks Bill!
> Martin
>
> --
> name: Martin Albrecht
> _pgp:http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
> _www:http://www.informatik.uni-bremen.de/~malb
> _jab: [EMAIL PROTECTED]
--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---