I sorted out what was wrong. In my combine code I was combining operands in the wrong order. This meant that a whole lot of zeroes ended up where they shouldn't be, which is why the if (x) check appeared to help. So my earlier times were unfortunately completely screwed up. I went back to a fresh tarball with the original code and applied my changes one by one. Only two of them actually improved the time.
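For anyone following along, here is roughly what the combine step and the if (x) test look like with a single Gray table. This is only a minimal sketch and not the M4RI code: the names (K, build_gray_table, gf2_mul_m4rm, gf2_mul_naive) are made up for illustration, and it assumes matrices with at most 64 columns so that one row fits in a single uint64_t (bit j of a row is column j).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define K 6u                     /* table width in bits (hypothetical name) */
#define TABLE_SIZE (1u << K)

/* Fill table[x] with the XOR of those rows whose index is a set bit of x.
   Walking x in Gray-code order costs one XOR per table entry. */
void build_gray_table(uint64_t *table, const uint64_t *rows, unsigned nrows)
{
    assert(nrows >= 1 && nrows <= K);
    unsigned prev = 0;
    table[0] = 0;
    for (unsigned i = 1; i < (1u << nrows); i++) {
        unsigned gray = i ^ (i >> 1);
        unsigned diff = gray ^ prev;     /* exactly one bit differs */
        unsigned bit = 0;
        while (((diff >> bit) & 1u) == 0)
            bit++;
        table[gray] = table[prev] ^ rows[bit];
        prev = gray;
    }
}

/* C = A * B over GF(2): A has n rows of 64 bits, B has 64 rows of 64 bits. */
void gf2_mul_m4rm(uint64_t *C, const uint64_t *A, const uint64_t *B, size_t n)
{
    uint64_t table[TABLE_SIZE];
    for (size_t i = 0; i < n; i++)
        C[i] = 0;
    for (unsigned base = 0; base < 64; base += K) {
        unsigned nrows = (64 - base < K) ? (64 - base) : K;
        build_gray_table(table, B + base, nrows);
        uint64_t mask = (UINT64_C(1) << nrows) - 1;
        for (size_t i = 0; i < n; i++) {
            /* the k bits of A's row select one precomputed combination */
            unsigned x = (unsigned)((A[i] >> base) & mask);
            if (x)                       /* table[0] is zero, so skipping is safe */
                C[i] ^= table[x];
        }
    }
}

/* Bit-by-bit reference product, used only to check the table version. */
void gf2_mul_naive(uint64_t *C, const uint64_t *A, const uint64_t *B, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t acc = 0;
        for (unsigned j = 0; j < 64; j++)
            if ((A[i] >> j) & 1u)
                acc ^= B[j];
        C[i] = acc;
    }
}
```

The if (x) test never changes the result, since table[0] is always zero. It only pays off when many of the k-bit indices are zero, which is what my bug was producing artificially.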

First I give Magma times vs your (Martin's) code:

             Magma    Martin
10000x10000: 2.940s   3.13s
16384x16384: 9.250s   12.96s
20000x20000: 16.57s   22.43s
32000x32000: 59.1s    90.2s

Now I give times for my code with a single Gray table. The legend below indicates what the three times mean.

             0)       1)       2)
10000x10000: 7.76s    6.70s    4.40s
16384x16384: 44.6s    37.3s    18.3s
20000x20000: 53.3s    45.9s    31.2s
32000x32000: -------  194s     134s

0) cutoff = 2048, original code no SSE
1) cutoff = 3600, CACHE BLOCK A (256, 768)
2) cutoff = 7200, fix k = 6

So it is clear that two Gray tables is much better than one.

The only thing that is puzzling me now is how you get away with k = 7 with two Gray tables, whereas I was using k = 6 with one table. Also, you stick with BLOCK = 768, whereas I found it optimal to switch to 256 for 16384x16384.

However, if I make any additional changes to your code it just slows down. It must have to do with this copying out that you did. That must significantly affect cache performance.

Anyhow, I'm now going to wipe all versions I have (except my working one with an optimal single Gray table implementation) and just start from your code as a base for further improvements.

We are still up to 50% slower than Magma on the Opteron!!

Bill.

On 18 May, 02:21, Bill Hart <[EMAIL PROTECTED]> wrote:
> I managed to get the modified version from the spkg. Nice code!!
>
> Unfortunately it is not as fast on my opteron. So more work tomorrow
> for me to try and get it down to the same times as I have with my
> version.
>
> Here are the times all on my opteron. Note your CTD version was
> optimal at a cutoff of 2048, not 7200 as for my code. Now I am worried
> that maybe my code is actually broken somehow and still passing the
> test code. I'll carefully make the changes to your code tomorrow to
> see if that is the case.
>
>              Magma    CTD-M4RI:2048  AMD-M4RI:7200  AMD-M4RI:2048
> 10000x10000: 2.940s   3.13s          3.442s         4.132s
> 16384x16384: 9.250s   12.96s         11.47s         11.80s
> 20000x20000: 16.57s   22.43s         19.3s          26.05s
> 32000x32000: 59.05s   90.20s         71.9s          71.8s
>
> Bill.
>
> On 18 May, 01:58, Bill Hart <[EMAIL PROTECTED]> wrote:
>
> > On 18 May, 00:40, Martin Albrecht <[EMAIL PROTECTED]>
> > wrote:
> >
> > > My version is here:
> > >
> > > http://sage.math.washington.edu/home/malb/spkgs/libm4ri-20080516.p1.spkg
> > >
> > > (this needs an updated patch for Sage)
> > >
> > > and here:
> > >
> > > http://sage.math.washington.edu/home/malb/m4ri-20080516.tar.gz
> > >
> > > (which is the raw source).
> >
> > This pure C version seems to be the old version, before you made
> > either of the two big changes.
> >
> > Bill.

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---