If two Gray tables are better than one, then you can't have too much of a good thing, right? So now I've made three Gray tables.
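For readers who haven't looked inside brilliantrussian.c, here is a rough sketch of what a "Gray table" is in Method of Four Russians multiplication. Each table holds all 2^k XOR combinations of k consecutive rows of B. It is filled in Gray-code order, so every entry costs a single row XOR. With three tables, each row of C gets the update c ^= t0 ^ t1 ^ t2, which is the a[i] ^= b[i] ^ c[i] ^ d[i] pattern that combine3 implements. This is a simplified, unoptimized illustration under my own assumptions: gf2mat, build_gray_table, mul_m4rm_3tables and the other helpers are hypothetical and are not M4RI's API (M4RI uses packedmatrix and _mzd_mul_m4rm_impl). There is no SSE, alignment or cache blocking.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dense GF(2) matrix (not M4RI's packedmatrix): row-major,
   column j of row i is bit (j % 64) of word (j / 64). */
typedef struct {
    int nrows, ncols, width;   /* width = 64-bit words per row */
    uint64_t *data;
} gf2mat;

static gf2mat gf2_init(int nrows, int ncols) {
    gf2mat m = {nrows, ncols, (ncols + 63) / 64, NULL};
    m.data = calloc((size_t)nrows * m.width + 1, sizeof(uint64_t));
    return m;
}

static uint64_t *row(const gf2mat *m, int i) { return m->data + (size_t)i * m->width; }

static int get_bit(const gf2mat *m, int i, int j) {
    return (int)((row(m, i)[j / 64] >> (j % 64)) & 1);
}

static void set_bit(gf2mat *m, int i, int j) {
    row(m, i)[j / 64] |= (uint64_t)1 << (j % 64);
}

/* Read k bits of row i of A starting at column c; bits past ncols read as 0. */
static int read_bits(const gf2mat *a, int i, int c, int k) {
    int x = 0;
    for (int b = 0; b < k && c + b < a->ncols; b++)
        x |= get_bit(a, i, c + b) << b;
    return x;
}

/* Gray table: T[x] = XOR of rows r+b of B for every bit b set in x.
   Visiting x in Gray-code order, consecutive entries differ in one row,
   so each of the 2^k - 1 nonzero entries costs a single row XOR.
   Rows past B->nrows are treated as zero rows. */
static void build_gray_table(uint64_t *T, const gf2mat *B, int r, int k) {
    int w = B->width;
    memset(T, 0, sizeof(uint64_t) * w);            /* T[0] is the zero row */
    for (int i = 1; i < (1 << k); i++) {
        int g = i ^ (i >> 1), prev = (i - 1) ^ ((i - 1) >> 1);
        int bit = 0;                               /* gray(i) ^ gray(i-1) == 1 << ctz(i) */
        while (!((i >> bit) & 1)) bit++;
        uint64_t *dst = T + (size_t)g * w;
        const uint64_t *src = T + (size_t)prev * w;
        for (int j = 0; j < w; j++)
            dst[j] = src[j] ^ (r + bit < B->nrows ? row(B, r + bit)[j] : 0);
    }
}

/* C = A * B over GF(2), assuming A->ncols == B->nrows. Each pass consumes
   3k columns of A using three tables, and each row of C gets the
   combine3-style update c ^= t0 ^ t1 ^ t2. */
static void mul_m4rm_3tables(gf2mat *C, const gf2mat *A, const gf2mat *B, int k) {
    int w = B->width;
    size_t words = ((size_t)1 << k) * w;
    uint64_t *T0 = malloc(words * sizeof *T0);
    uint64_t *T1 = malloc(words * sizeof *T1);
    uint64_t *T2 = malloc(words * sizeof *T2);
    memset(C->data, 0, sizeof(uint64_t) * (size_t)C->nrows * C->width);
    for (int c = 0; c < A->ncols; c += 3 * k) {
        build_gray_table(T0, B, c, k);
        build_gray_table(T1, B, c + k, k);
        build_gray_table(T2, B, c + 2 * k, k);
        for (int i = 0; i < A->nrows; i++) {
            const uint64_t *t0 = T0 + (size_t)read_bits(A, i, c, k) * w;
            const uint64_t *t1 = T1 + (size_t)read_bits(A, i, c + k, k) * w;
            const uint64_t *t2 = T2 + (size_t)read_bits(A, i, c + 2 * k, k) * w;
            uint64_t *ci = row(C, i);
            for (int j = 0; j < w; j++)
                ci[j] ^= t0[j] ^ t1[j] ^ t2[j];
        }
    }
    free(T0); free(T1); free(T2);
}
```

The sketch leaves out everything that makes the real code fast, such as word-level bit extraction and blocking. It only shows why a third table lets one row update consume 3k bits of A instead of 2k, at the price of building one more table per pass.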
The files I modified are in:

http://sage.math.washington.edu/home/wbhart/m4ri3gray/

There are three main modifications:

1) Make all matrices SSE-aligned if we have SSE2, and make all rows of all matrices aligned. This required a fix in brilliantrussian.c, since it makes some assumptions about how the data is stored in matrices.

2) Introduce functions combine2 and combine3, which use SSE to do a[i] ^= b[i] ^ c[i] and a[i] ^= b[i] ^ c[i] ^ d[i] respectively.

3) Code to use three Gray tables. I had to remove the a_nc%k's, since these were causing segfaults for reasons I don't understand.

Here are the Magma times, your old times and the new times on my Opteron:

  Size           Magma     Old       New
  10000x10000    2.940s    3.13s     2.82s
  16384x16384    9.250s    12.96s    11.25s
  20000x20000    16.57s    22.43s    19.39s
  32000x32000    59.1s     90.2s     81.56s

Note that I do not use the new combine2 and combine3 functions, as they slow things down on my machine. I can't believe how useless SSE seems to be! I've commented out the three lines in brilliantrussian.c that use combine3, in case you want to try them with SSE on your CTD.

Bill.

On 18 May, 17:36, Martin Albrecht <[EMAIL PROTECTED]> wrote:
> Hi,
>
> first, I recorded the different speed-ups in a small table in the
> attachment, for an overview (I think we've come a long way :-)).
>
> To disable the copying out, one needs to edit
>
>   /* we copy the matrix first since it is only constant memory
>   overhead and improves data locality, if you remove it make sure
>   there are no speed regressions */
>   /* C = _mzd_mul_m4rm_impl(C, A, B, 0, TRUE); */
>   packedmatrix *Cbar = mzd_init(C->nrows, C->ncols);
>   Cbar = _mzd_mul_m4rm_impl(Cbar, A, B, 0, FALSE);
>   mzd_copy(C, Cbar);
>   mzd_free(Cbar);
>   return C;
>
> in strassen.c to
>
>   /* we copy the matrix first since it is only constant memory
>   overhead and improves data locality, if you remove it make sure
>   there are no speed regressions */
>   C = _mzd_mul_m4rm_impl(C, A, B, 0, TRUE);
>   return C;
>
> This disables the copying out.
>
> Martin
>
> PS: If I find some time later today I'll make some changes so that SSE2 can
> be used more often, i.e. align each row at 16-byte boundaries if HAVE_SSE2
> is used.
>
> --
> name: Martin Albrecht
> _pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
> _www: http://www.informatik.uni-bremen.de/~malb
> _jab: [EMAIL PROTECTED]
>
> timings.html

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---
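Martin's PS and Bill's first modification both concern aligning every row at a 16-byte boundary, so that SSE2 can use aligned 128-bit loads and stores on whole rows. Below is a minimal sketch of one way to get that layout, under my own assumptions. The aligned_rows type and the ar_init, ar_row and xor3_into helpers are hypothetical, not M4RI code. The code uses C11 aligned_alloc, and it keys off the compiler-defined __SSE2__ macro rather than M4RI's HAVE_SSE2. Each row is padded to an even number of 64-bit words, so if the block starts on a 16-byte boundary, every row does too.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical layout: rows padded to a multiple of 128 bits (two words),
   block allocated on a 16-byte boundary, so every row is 16-byte aligned. */
typedef struct {
    int nrows, ncols;
    int width;          /* 64-bit words per row, always even */
    uint64_t *data;
} aligned_rows;

static aligned_rows ar_init(int nrows, int ncols) {
    aligned_rows m;
    m.nrows = nrows;
    m.ncols = ncols;
    m.width = (ncols + 63) / 64;
    m.width += m.width & 1;                        /* round up to an even count */
    size_t bytes = (size_t)nrows * m.width * sizeof(uint64_t);  /* multiple of 16 */
    m.data = aligned_alloc(16, bytes ? bytes : 16);
    if (m.data) memset(m.data, 0, bytes);
    return m;
}

static uint64_t *ar_row(const aligned_rows *m, int i) {
    return m->data + (size_t)i * m->width;
}

/* combine3-style update a ^= b ^ c ^ d over n words.
   Assumes n is even and all four pointers are 16-byte aligned. */
static void xor3_into(uint64_t *a, const uint64_t *b, const uint64_t *c,
                      const uint64_t *d, int n) {
#ifdef __SSE2__
    for (int j = 0; j < n; j += 2) {
        __m128i x = _mm_load_si128((const __m128i *)(b + j));
        x = _mm_xor_si128(x, _mm_load_si128((const __m128i *)(c + j)));
        x = _mm_xor_si128(x, _mm_load_si128((const __m128i *)(d + j)));
        x = _mm_xor_si128(x, _mm_load_si128((const __m128i *)(a + j)));
        _mm_store_si128((__m128i *)(a + j), x);
    }
#else
    for (int j = 0; j < n; j++)                   /* portable scalar fallback */
        a[j] ^= b[j] ^ c[j] ^ d[j];
#endif
}
```

The padding costs at most one extra 64-bit word per row. In exchange, the SSE2 loop needs no scalar tail and no unaligned loads. Whether that actually pays off is a separate question: in the timings above, combine2 and combine3 were slower on the Opteron.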