Hi,
first, I've put the different speed-ups in a small table in the attachment for
an overview (I think we've come a long way :-)). To disable the copying out,
one needs to change
    /* we copy the matrix first since it is only constant memory
       overhead and improves data locality, if you remove it make sure
       there are no speed regressions */
    /* C = _mzd_mul_m4rm_impl(C, A, B, 0, TRUE); */
    packedmatrix *Cbar = mzd_init(C->nrows, C->ncols);
    Cbar = _mzd_mul_m4rm_impl(Cbar, A, B, 0, FALSE);
    mzd_copy(C, Cbar);
    mzd_free(Cbar);
    return C;

in strassen.c to

    /* we copy the matrix first since it is only constant memory
       overhead and improves data locality, if you remove it make sure
       there are no speed regressions */
    C = _mzd_mul_m4rm_impl(C, A, B, 0, TRUE);
    return C;

This disables the copying out.
Martin
PS: If I find some time later today I'll make some changes so that SSE2 can
be used more often, i.e. align each row on a 16-byte boundary if HAVE_SSE2 is
used.
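
For illustration, here is a minimal sketch of what aligning each row on a
16-byte boundary could look like. It is not the M4RI code: the helpers
padded_row_bytes and alloc_aligned_rows are hypothetical names, and it assumes
that all rows live in one block obtained from C11 aligned_alloc. The idea is
just to pad the row length to a multiple of 16 bytes, so that once the first
row is aligned every later row is too.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ROW_ALIGNMENT 16u /* bytes; the width of one SSE2 register */

/* Hypothetical helper: round a row length in bytes up to a multiple of 16. */
static size_t padded_row_bytes(size_t row_bytes) {
    return (row_bytes + ROW_ALIGNMENT - 1) / ROW_ALIGNMENT * ROW_ALIGNMENT;
}

/* Hypothetical helper: allocate nrows zeroed rows in one 16-byte aligned
 * block. Row i starts at block + i * (*stride). Because the stride is a
 * multiple of 16, every row start is 16-byte aligned.
 * Returns NULL on allocation failure; release the block with free(). */
static unsigned char *alloc_aligned_rows(size_t nrows, size_t row_bytes,
                                         size_t *stride) {
    assert(nrows > 0 && row_bytes > 0 && stride != NULL);
    *stride = padded_row_bytes(row_bytes);
    /* The total size is a multiple of the alignment, as C11 requires. */
    size_t total = nrows * *stride;
    unsigned char *block = aligned_alloc(ROW_ALIGNMENT, total);
    if (block == NULL) {
        return NULL;
    }
    memset(block, 0, total);
    return block;
}
```

The price is up to 15 bytes of padding per row; for example, a row of 20
bytes would get a stride of 32 bytes.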
--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: [EMAIL PROTECTED]

Attachment: Sheet1

Matrix size | Magma 2.14-10 | M4RI-Orig | Cache Friendly M4RM | Two Tables | Cache Friendly, k=6, cutoff=7200
------------+---------------+-----------+---------------------+------------+---------------------------------
10000x10000 |          2.94 |         8 |               4.529 |       3.13 |                              4.4
16000x16000 |          9.25 |        43 |                  20 |      12.96 |                             18.3
20000x20000 |         16.57 |        59 |               32.34 |      22.43 |                             31.2
32000x32000 |          59.1 |        -- |                  -- |       90.2 |                              134