Hi,

first, I recorded the different speed-ups in a small table, attached for an
overview (I think we've come a long way :-)). To disable the copying out,
one needs to edit

    /* we copy the matrix first since it is only constant memory
       overhead and improves data locality, if you remove it make sure
       there are no speed regressions */
    /* C = _mzd_mul_m4rm_impl(C, A, B, 0, TRUE); */
    packedmatrix *Cbar = mzd_init(C->nrows, C->ncols);
    Cbar = _mzd_mul_m4rm_impl(Cbar, A, B, 0, FALSE);
    mzd_copy(C, Cbar);
    mzd_free(Cbar);
    return C;

in strassen.c to

    /* we copy the matrix first since it is only constant memory
       overhead and improves data locality, if you remove it make sure
       there are no speed regressions */
    C = _mzd_mul_m4rm_impl(C, A, B, 0, TRUE);
    return C;

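With this change the product is written directly into C instead of into the
temporary Cbar, which was then copied over to C and freed.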

Martin

PS: If I find some time later today, I'll make some changes so that SSE2 can
be used more often, i.e. align each row on a 16-byte boundary if HAVE_SSE2 is
set.
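
A minimal sketch of what such row alignment could look like, assuming rows
are stored as 64-bit words in one contiguous block. The helper names
padded_row_words and alloc_aligned_rows are hypothetical and not part of
M4RI; the idea is only that padding each row to an even number of words
keeps every row start on a 16-byte boundary when the block itself is aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of M4RI: number of 64-bit words per row,
   rounded up to an even count so each row spans a multiple of 16 bytes. */
static size_t padded_row_words(size_t ncols) {
    size_t words = (ncols + 63) / 64;   /* words needed to hold ncols bits */
    return (words + 1) & ~(size_t)1;    /* round up to a multiple of 2 words */
}

/* Hypothetical helper: allocate nrows rows in one block whose start is
   16-byte aligned. Since the row stride is a multiple of 16 bytes, every
   row start is 16-byte aligned as well. Release the block with free(). */
static uint64_t *alloc_aligned_rows(size_t nrows, size_t ncols) {
    size_t stride = padded_row_words(ncols);
    size_t bytes = nrows * stride * sizeof(uint64_t);
    assert(stride % 2 == 0);            /* stride is a multiple of 16 bytes */
    if (bytes == 0) {
        return NULL;                    /* nothing to allocate */
    }
    return aligned_alloc(16, bytes);    /* C11; bytes is a multiple of 16 */
}
```

Row i then starts at base + i * padded_row_words(ncols), so aligned SSE2
loads and stores could be used on whole rows, at the cost of at most one
extra word of padding per row.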

-- 
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: [EMAIL PROTECTED]



Attachment: Sheet1

Size         Magma 2.14-10  M4RI-Orig  Cache Friendly M4RM  Two Tables  Cache Friendly, k=6, cutoff=7200
10000x10000  2.94           8          4.529                3.13        4.4
16000x16000  9.25           43         20                   12.96       18.3
20000x20000  16.57          59         32.34                22.43       31.2
32000x32000  59.1           --         --                   90.2        134
