Hi Martin,

That spike is weird. I got closer to 2x the time of Magma for 16384x16384, but you need different parameters than for 10000x10000 or 20000x20000, since the size of the M4R matrices will be different from either of those cases.
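A small sketch shows why the M4R base-case size differs between these dimensions. It assumes the Strassen-Winograd recursion splits an n x n matrix into exact n/2 x n/2 quadrants at every level and hands off to M4R once n is at or below the cutoff. It uses a cutoff of 3600 for illustration. The function name `base_case_dim` is hypothetical, and the real M4RI code also aligns splits to machine words, which this sketch ignores.

```python
def base_case_dim(n: int, cutoff: int) -> int:
    """Dimension at which a halving Strassen-Winograd recursion hands off to M4R.

    Simplifying assumption (not M4RI's exact rule): each level halves n
    exactly, and recursion stops as soon as n <= cutoff.
    """
    while n > cutoff:
        n //= 2
    return n


for n in (10000, 16384, 20000, 40000):
    print(n, base_case_dim(n, 3600))
# prints:
# 10000 2500
# 16384 2048
# 20000 2500
# 40000 2500
```

Under these assumptions 10000, 20000 and 40000 all bottom out in 2500x2500 M4R products, while 16384 bottoms out at 2048x2048. Parameters tuned for the former therefore need not suit the latter.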

For 16384x16384 my notes say that I used k = 6 and BLOCK = 256. One might also need to fiddle with the cutoff. I think I used 3600, but at various times I had the cutoff set lower. It would be awfully surprising if there weren't a set of parameters that brought this time right down. It is also possible that the L1 cache on sage.math is smaller than on the Opteron I was using, in which case my parameters may not apply on that machine.

Bill.

On 16 May, 23:05, Martin Albrecht <[EMAIL PROTECTED]> wrote:
> On Friday 16 May 2008, Bill Hart wrote:
>
> > Here are the times I get for Magma vs M4RI now. Note the crossover
> > between the two programs is now above about 5000, and M4RI beats Magma
> > below that point. This suggests the remaining factor of 2 is in the
> > Strassen-Winograd function. Probably Strassen-Winograd is falling out
> > of L2 cache (the previous adjustments I made were to prevent the M4R
> > algorithm from falling out of L1 cache).
> >
> > The other possibility is that Magma combines the two algorithms so
> > that there is even greater usage of the Gray code tables. This would
> > be an ugly hack, but could work.
>
> Are you suggesting Magma uses M4RM too? I'd doubt that, since they don't
> state it anywhere. Probably I'm just misunderstanding you.
>
> > Dimension     Magma      M4RI
> > 40000x40000   112.6s     232.4s
> > 20000x20000   16.40s     32.34s
> > 10000x10000   2.750s     4.529s
> > 5000x5000     0.700s     0.672s
> > 2500x2500     0.13s      0.079s
> > 1250x1250     0.015s     0.012s
> > 625x625       0.0030s    0.0023s
> > 312x312       0.0014s    0.00032s
> > 156x156       0.001s     0.0001s
> >
> > If I get some time I'll look into this.
> >
> > Did those changes work for you, Martin?
> >
> > Bill.
>
> Yes, the change worked like a charm. I made some changes (the fixed k is
> replaced by a k that depends on the new block dimensions, etc.)
> and it is much, much faster now. I'm working on re-introducing SSE2 to see
> whether it at least makes the world a better place on the Core2Duo. Btw, all
> the stuff I wrote about the L2 cache size of the C2D vs. the Opteron was
> bollocks. The reason I beat Magma so badly earlier was that I used a 32-bit
> Magma and compared it with a 64-bit version of M4RI. To state it clearly: we
> don't beat Magma anywhere. Anyhow, here are the times:
>
> 64-bit Debian/GNU Linux, 2.33GHz Core2Duo
> Matrix Dimension    Magma 2.14-13 (64-bit)    M4RI-20080517 (64-bit)
> 10,000 x 10,000     2.920                     4.130
> 16,384 x 16,384     11.140                    15.740
> 20,000 x 20,000     20.370                    28.950
>
> 64-bit Debian/GNU Linux, 1.8GHz Opteron (sage.math)
> Matrix Dimension    Magma 2.13-5 (64-bit)     M4RI-20080517 (64-bit)
> 10,000 x 10,000     3.930                     7.860
> 16,384 x 16,384     16.230                    104.77???
> 20,000 x 20,000     27.080                    56.420
>
> (My university today finally granted me access to Magma 2.14 for my notebook.)
>
> As you can see, there is an odd (reproducible) spike at 2^14 x 2^14. I cannot
> explain that for now; it might just be a bug.
> I have a similar spike in another run on another AMD (Athlon X2) machine:
>
> n:  2048, cutoff: 1024, speedup: 1.06, m4rm:  0.01 strassen: 0.01
> n:  3072, cutoff: 1536, speedup: 0.87, m4rm:  0.04 strassen: 0.05
> n:  4096, cutoff: 2048, speedup: 1.13, m4rm:  0.15 strassen: 0.13
> n:  5120, cutoff: 2560, speedup: 1.35, m4rm:  0.39 strassen: 0.29
> n:  6144, cutoff: 3072, speedup: 1.20, m4rm:  0.66 strassen: 0.55
> n:  7168, cutoff: 3584, speedup: 1.87, m4rm:  1.64 strassen: 0.88
> n:  8192, cutoff: 4096, speedup: 4.48, m4rm:  6.07 strassen: 1.35
> >>> n:  9216, cutoff: 4608, speedup: 2.94, m4rm:  8.90 strassen: 3.02 <<<
> n: 10240, cutoff: 5120, speedup: 4.68, m4rm: 12.99 strassen: 2.78
> n: 11264, cutoff: 5632, speedup: 4.84, m4rm: 18.78 strassen: 3.88
> n: 12288, cutoff: 6144, speedup: 4.65, m4rm: 24.40 strassen: 5.24
> n: 13312, cutoff: 6656, speedup: 3.32, m4rm: 30.92 strassen: 9.33
>
> I'm investigating this one too and playing around with the parameters. Since
> the C2D times are considerably better, I also bet it is L2 related; we'll see.
>
> Thanks again!
> Martin
>
> --
> name: Martin Albrecht
> _pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
> _www: http://www.informatik.uni-bremen.de/~malb
> _jab: [EMAIL PROTECTED]
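The thread's discussion of k and Gray code tables can be illustrated with a minimal sketch of the Method of Four Russians multiplication (M4RM) over GF(2). It assumes rows are stored as Python ints, with bit j holding the entry in column j. It is not M4RI's implementation: it has no column blocking (nothing like BLOCK), no SSE2, and the names `make_table`, `m4rm` and `naive_mul` are hypothetical. What it does show is that each table has 2^k entries, built with one row XOR apiece in Gray code order. As an illustration of the cache pressure, a k = 6 table over full 2048-bit (256-byte) rows would occupy 64 x 256 = 16,384 bytes = 16 KiB. This is presumably why k and the blocking are tuned together.

```python
import random


def make_table(B, start, kk):
    """Gray-code table for rows B[start], ..., B[start + kk - 1].

    T[x] is the XOR (GF(2) sum) of the rows B[start + j] for every bit j set
    in x. Walking x in Gray-code order flips one bit per step, so each of the
    2**kk entries costs a single row XOR.
    """
    T = [0] * (1 << kk)
    prev = 0
    for i in range(1, 1 << kk):
        g = i ^ (i >> 1)                    # i-th Gray code
        j = (g ^ prev).bit_length() - 1     # index of the one bit that flipped
        T[g] = T[prev] ^ B[start + j]
        prev = g
    return T


def m4rm(A, B, n, k):
    """C = A * B over GF(2) by the Method of Four Russians.

    A has len(A) rows and n columns, B has n rows; every row is a Python int
    whose bit j holds the entry in column j.
    """
    C = [0] * len(A)
    for start in range(0, n, k):
        kk = min(k, n - start)              # the last chunk may be narrower
        T = make_table(B, start, kk)
        mask = (1 << kk) - 1
        for r, a in enumerate(A):
            C[r] ^= T[(a >> start) & mask]  # one lookup per kk bits of row r
    return C


def naive_mul(A, B, n):
    """Reference GF(2) product: XOR row B[j] into C[r] whenever A[r][j] = 1."""
    C = []
    for a in A:
        acc = 0
        for j in range(n):
            if (a >> j) & 1:
                acc ^= B[j]
        C.append(acc)
    return C


random.seed(0)
A = [random.getrandbits(100) for _ in range(50)]   # 50 x 100 matrix
B = [random.getrandbits(80) for _ in range(100)]   # 100 x 80 matrix
assert m4rm(A, B, n=100, k=6) == naive_mul(A, B, n=100)
```

Larger k means fewer passes over A but exponentially larger tables, which is the trade-off the parameter tuning in this thread is balancing.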