Well, apparently there are speed gains right up to 8 Gray tables,
though I really have no idea why that is.
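
For readers unfamiliar with the term, a "Gray table" here is a lookup table holding every XOR combination of k rows, filled in Gray code order so that each new entry costs a single row addition. The following is a minimal sketch of that idea only, not the code in the directory below; the name build_gray_table and the flat table layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not M4RI's API): fill table[x], for x in 0..2^k-1,
   with the XOR of those rows[j] whose bit j is set in x.
   Rows and table entries are `width` 64-bit words each, stored back to back. */
static void build_gray_table(uint64_t *table, const uint64_t *rows,
                             int k, size_t width)
{
    assert(k >= 0 && k < 16);              /* keep the table size sane */

    for (size_t w = 0; w < width; w++)
        table[w] = 0;                      /* entry 0: the empty combination */

    unsigned prev = 0;
    for (unsigned i = 1; i < (1u << k); i++) {
        unsigned gray = i ^ (i >> 1);      /* i-th Gray code word */
        unsigned diff = gray ^ prev;       /* exactly one bit differs */
        int j = 0;
        while (!((diff >> j) & 1u))
            j++;                           /* index of the flipped bit */

        /* new entry = previous entry XOR the one row that changed */
        for (size_t w = 0; w < width; w++)
            table[gray * width + w] =
                table[prev * width + w] ^ rows[j * width + w];
        prev = gray;
    }
}
```

With several such tables, each built from a different block of k rows, one pass over a row of the result can add one entry from every table at once.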

Here are the new times:

               Magma    Old      New

10000x10000:   2.940s   3.13s    2.32s

16384x16384:   9.250s   12.96s   9.17s

20000x20000:   16.57s   22.43s   16.49s

32000x32000:   59.1s    90.2s    81.6s   65.7s

So now we beat Magma all the way up to 20000x20000.

I'll put the adjusted code in the directory:

http://sage.math.washington.edu/home/wbhart/m4ri3gray/

in just a moment. I also found a memory leak in my code which I fixed.

Bill.
On 18 May, 19:19, Bill Hart <[EMAIL PROTECTED]> wrote:
> If two Gray tables is better than one, then you can't have enough of a
> good thing right? So I made 3 Gray tables now.
>
> The files I modified are in:
>
> http://sage.math.washington.edu/home/wbhart/m4ri3gray/
>
> There are three main modifications:
>
> 1) Make all matrices SSE aligned if we have SSE2, and make all rows of
> all matrices aligned. This required a fix in brilliantrussian.c, since
> it makes some assumptions about how the data is stored in matrices.
>
> 2) Introduce function combine2 and combine3 which use SSE to do a[i]
> ^= b[i] ^ c[i] and  a[i] ^= b[i] ^ c[i] ^ d[i].
>
> 3) Code to use three Gray tables. I had to remove the a_nc%k's since
> these were causing segfaults for reasons I don't understand.
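
As an illustration of point 2, here is a minimal sketch of what functions like combine2 and combine3 might look like. The signatures are assumptions, not the ones in brilliantrussian.c, and the SSE2 path uses unaligned loads so the sketch is safe for any pointers; with the row alignment from point 1, aligned loads could presumably be used instead.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* a[i] ^= b[i] ^ c[i] for i in [0, n). Hypothetical signature. */
static void combine2(uint64_t *a, const uint64_t *b, const uint64_t *c,
                     size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    /* process two 64-bit words (128 bits) per iteration */
    for (; i + 2 <= n; i += 2) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vc = _mm_loadu_si128((const __m128i *)(c + i));
        va = _mm_xor_si128(va, _mm_xor_si128(vb, vc));
        _mm_storeu_si128((__m128i *)(a + i), va);
    }
#endif
    for (; i < n; i++)                     /* scalar tail or fallback */
        a[i] ^= b[i] ^ c[i];
}

/* a[i] ^= b[i] ^ c[i] ^ d[i], plain C version for comparison. */
static void combine3(uint64_t *a, const uint64_t *b, const uint64_t *c,
                     const uint64_t *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] ^= b[i] ^ c[i] ^ d[i];
}
```

Timing the scalar loop against the SSE2 loop on the target machine is the only reliable way to tell which one wins; the operation is simple enough that memory bandwidth may dominate either way.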
>
> Here are the Magma times, your old times and the new times on my
> Opteron:
>
> 10000x10000: 2.940s 3.13s 2.82s
>
> 16384x16384: 9.250s 12.96s 11.25s
>
> 20000x20000: 16.57s 22.43s 19.39s
>
> 32000x32000: 59.1s 90.2s 81.56s
>
> Note I do not use the new combine2 and combine3 functions as they slow
> it down on my machine. I cannot believe how useless SSE seems to be!
>
> I've commented the three lines out that use combine3 in
> brilliantrussian.c in case you want to try it with SSE on your CTD.
>
> Bill.
>
> On 18 May, 17:36, Martin Albrecht <[EMAIL PROTECTED]>
> wrote:
>
> > Hi,
>
> > first, I recorded the different speed-ups in a small table for an overview in
> > the attachment (I think we've come a long way :-)). To disable the copying out
> > one needs to edit
>
> >     /* we copy the matrix first since it is only constant memory
> >        overhead and improves data locality, if you remove it make sure
> >        there are no speed regressions */
> >     /* C = _mzd_mul_m4rm_impl(C, A, B, 0, TRUE); */
> >     packedmatrix *Cbar = mzd_init(C->nrows, C->ncols);
> >     Cbar = _mzd_mul_m4rm_impl(Cbar, A, B, 0, FALSE);
> >     mzd_copy(C, Cbar);
> >     mzd_free(Cbar);
> >     return C;
>
> > in strassen.c to
>
> >     /* we copy the matrix first since it is only constant memory
> >        overhead and improves data locality, if you remove it make sure
> >        there are no speed regressions */
> >     C = _mzd_mul_m4rm_impl(C, A, B, 0, TRUE);
> >     return C;
>
> > This disables the copying out.
>
> > Martin
>
> > PS: If I find some time later today I'll make some changes such that SSE2 can
> > be used more often, i.e. align each row at 16-byte borders if HAVE_SSE2 is
> > used.
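
One way to get every row onto a 16-byte boundary is to pad the row width to an even number of 64-bit words and allocate the whole block with 16-byte alignment. The sketch below assumes C11 aligned_alloc and a flat row-major layout; padded_width and alloc_rows are hypothetical names, not functions from the library.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Words per row for `ncols` bits, rounded up to an even count so that
   every row occupies a multiple of 16 bytes. */
static size_t padded_width(size_t ncols)
{
    size_t words = (ncols + 63) / 64;
    return (words + 1) & ~(size_t)1;
}

/* Allocate a zeroed nrows x ncols bit matrix whose rows all start on
   16-byte boundaries. Returns NULL on failure or for an empty matrix. */
static uint64_t *alloc_rows(size_t nrows, size_t ncols)
{
    size_t bytes = nrows * padded_width(ncols) * sizeof(uint64_t);
    if (bytes == 0)
        return NULL;
    /* bytes is a multiple of 16, as C11 aligned_alloc requires */
    uint64_t *p = aligned_alloc(16, bytes);
    if (p != NULL)
        memset(p, 0, bytes);
    return p;
}
```

Row r then starts at p + r * padded_width(ncols), and the block is released with free(p).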
>
> > --
> > name: Martin Albrecht
> > _pgp:http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
> > _www:http://www.informatik.uni-bremen.de/~malb
> > _jab: [EMAIL PROTECTED]
>
> >  timings.html
--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---
