On Mon, May 19, 2008 at 6:14 PM, Clement Pernet <[EMAIL PROTECTED]> wrote:
>
> Hi guys,
>
> I am finally up to date with this discussion (I was being interviewed,
> and then flying, when it started).
> First, congrats on the great job you have achieved. I have started to
> dive into m4ri, and I really like the quality of the code.
>
> I have a few remarks:
>
> * The loop unrolling technique used for the creation of the table could
> maybe be used in the computation of the product as well.
> Is 8 optimal? I remember seeing 32 in ATLAS, but I don't know of any
> justification. Since some pipelines are longer than 8, it might be
> better to have a longer unrolled loop.
>
> * I am not sure about this, but wouldn't it be better to have a block
> decomposition that matches the babystep-giantstep structure?
> This could happen at the Strassen threshold: instead of simply copying
> the matrix (which already improves data locality), copy it into a
> bunch of blocks of size blocksize and call m4rm on that structure. ATLAS
> does this kind of copy for dimensions not larger than 200, if I
> recall correctly.
> Maybe I am just missing something about your babystep/giantstep algorithm.
>
> Anyway, as you pointed out, the battle is now on the asymptotic
> comparison with Magma, and I still have no ideas on how to improve your
> Strassen implementation. Still thinking about it...
>
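As a rough illustration of the unrolling idea raised in the first remark above, here is a minimal sketch in C of an 8-way unrolled row addition over GF(2), i.e. XOR on packed 64-bit words. The function name `xor_row_unrolled8` and the choice of `uint64_t` words are assumptions made for illustration only; this is not m4ri's actual code, and whether 8, 16 or 32 is the better factor would have to be measured on the target CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from m4ri): dst[i] ^= src[i] for 0 <= i < n.
 * Over GF(2), adding one packed row to another is a word-wise XOR. */
static void xor_row_unrolled8(uint64_t *dst, const uint64_t *src, size_t n)
{
    size_t i = 0;

    /* Main loop: handle 8 words per iteration to cut loop overhead
     * and give the CPU independent operations to overlap. */
    for (; i + 8 <= n; i += 8) {
        dst[i + 0] ^= src[i + 0];
        dst[i + 1] ^= src[i + 1];
        dst[i + 2] ^= src[i + 2];
        dst[i + 3] ^= src[i + 3];
        dst[i + 4] ^= src[i + 4];
        dst[i + 5] ^= src[i + 5];
        dst[i + 6] ^= src[i + 6];
        dst[i + 7] ^= src[i + 7];
    }

    /* Remainder loop: the last n mod 8 words. */
    for (; i < n; i++)
        dst[i] ^= src[i];
}
```

Trying a different unroll factor only means changing the step and the body of the main loop; the remainder loop stays the same.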
Clement, since you are actively doing research on fast mod-3 matrix multiplication, any chance you could spend 2-3 paragraphs here and advertise that? It would fit nicely in this thread.

-- William