I worked out what was wrong. My combine code was combining operands
in the wrong order, so a whole lot of zeroes ended up where they
shouldn't be, which is why the if (x) thing appeared to work. My
earlier times were therefore unfortunately completely screwed up. I
went back to a fresh tarball with the original code and applied my
changes one by one. Only two of the changes actually improved the time.

First, Magma times vs your (Martin's) code:

                Magma    Martin's
10000x10000:    2.940s   3.13s
16384x16384:    9.250s   12.96s
20000x20000:    16.57s   22.43s
32000x32000:    59.1s    90.2s

Now the times for my code with a single Gray table. The legend below
indicates what the three columns mean.

                (0)      (1)      (2)
10000x10000:    7.76s    6.70s    4.40s
16384x16384:    44.6s    37.3s    18.3s
20000x20000:    53.3s    45.9s    31.2s
32000x32000:    -------  194s     134s

(0) cutoff = 2048, original code, no SSE
(1) cutoff = 3600, CACHE BLOCK A (256, 768)
(2) cutoff = 7200, fix k = 6

So it is clear that two Gray tables are much better than one. The only
thing still puzzling me is how you get away with k = 7 with two Gray
tables, whereas I was using k = 6 with one table. Also, you stick
with BLOCK = 768, whereas I found it optimal to switch to 256 for
16384x16384. However, any additional change I make to your code just
slows it down. It must have to do with the copying out that you do,
which must significantly affect cache performance.
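
To make the one-table vs two-table comparison concrete, here is a
minimal sketch of the idea in a toy model. It is not the M4RI code:
each matrix row is assumed to fit in a single 64-bit word, and the
names build_gray_table, lookup_one and lookup_two are made up for
illustration. A table for parameter k holds all 2^k XOR combinations
of k rows, built in Gray code order so that each entry costs one XOR.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (assumption): one matrix row = one 64-bit word, so XOR of
   two words is row addition over GF(2). */

/* Index of the lowest set bit of i; requires i > 0. */
static int lowest_set_bit(unsigned i) {
    int b = 0;
    while ((i & 1u) == 0) { i >>= 1; b++; }
    return b;
}

/* Fill table[0 .. 2^k - 1] so that table[m] is the XOR of rows[j] for
   every bit j set in m. Entries are visited in Gray code order, so each
   new entry is the previous one XORed with a single row. */
void build_gray_table(const uint64_t *rows, int k, uint64_t *table) {
    assert(k >= 1 && k <= 16);
    unsigned prev = 0;
    table[0] = 0;
    for (unsigned i = 1; i < (1u << k); i++) {
        int j = lowest_set_bit(i);        /* bit flipped at Gray step i */
        unsigned cur = prev ^ (1u << j);  /* next Gray code word */
        table[cur] = table[prev] ^ rows[j];
        prev = cur;
    }
}

/* One table: the low k bits of a row of A select one combination. */
uint64_t lookup_one(const uint64_t *t, int k, uint64_t a_bits) {
    uint64_t mask = (1u << k) - 1;
    return t[a_bits & mask];
}

/* Two tables: the low 2k bits of a row of A select one entry from each
   table, and the two entries are combined in a single pass. */
uint64_t lookup_two(const uint64_t *t0, const uint64_t *t1, int k,
                    uint64_t a_bits) {
    uint64_t mask = (1u << k) - 1;
    return t0[a_bits & mask] ^ t1[(a_bits >> k) & mask];
}
```

In this model two tables at k = 7 hold 2 * 128 = 256 rows, against 64
rows for one table at k = 6, while each pass over the destination row
consumes 14 bits of A instead of 6. Whether that trade-off explains the
timings above is a guess on my part.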

Anyhow, I'm now going to wipe all versions I have (except my working
one with an optimal single Gray table implementation) and just start
from your code as a base for further improvements. We are still up to
50% slower than Magma on the Opteron!!
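
The gap to Magma can be read off the first table with a one-line
calculation. The helper below is only an illustration (the name
percent_slower is made up); it takes the Magma time as the reference.

```c
#include <assert.h>

/* Percentage by which time t exceeds the reference time ref. */
double percent_slower(double ref, double t) {
    assert(ref > 0.0);
    return (t / ref - 1.0) * 100.0;
}
```

With the times above, percent_slower(2.940, 3.13) is about 6.5 and
percent_slower(59.1, 90.2) is about 52.6, so the gap grows with the
matrix size.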

Bill.

On 18 May, 02:21, Bill Hart <[EMAIL PROTECTED]> wrote:
> I managed to get the modified version from the spkg. Nice code!!
>
> Unfortunately it is not as fast on my opteron. So more work tomorrow
> for me to try and get it down to the same times as I have with my
> version.
>
> Here are the times all on my opteron. Note your CTD version was
> optimal at a cutoff of 2048, not 7200 as for my code. Now I am worried
> that maybe my code is actually broken somehow and still passing the
> test code. I'll carefully make the changes to your code tomorrow to
> see if that is the case.
>
>                 Magma    CTD-M4RI:2048  AMD-M4RI:7200  AMD-M4RI:2048
> 10000x10000:    2.940s   3.13s          3.442s         4.132s
> 16384x16384:    9.250s   12.96s         11.47s         11.80s
> 20000x20000:    16.57s   22.43s         19.3s          26.05s
> 32000x32000:    59.05s   90.20s         71.9s          71.8s
>
> Bill.
>
> On 18 May, 01:58, Bill Hart <[EMAIL PROTECTED]> wrote:
>
> > On 18 May, 00:40, Martin Albrecht <[EMAIL PROTECTED]>
> > wrote:
>
> > > My version is here:
>
> > >    http://sage.math.washington.edu/home/malb/spkgs/libm4ri-20080516.p1.spkg
>
> > > (this needs an updated patch for Sage)
>
> > > and here:
>
> > >    http://sage.math.washington.edu/home/malb/m4ri-20080516.tar.gz
>
> > > (which is the raw source).
>
> > This pure C version seems to be the old version, before you made
> > either of the two big changes.
>
> > Bill.
