Hi guys, I am finally up to date with this discussion (I was being interviewed, and then flying, when it started). First, congratulations on the great job you have done. I have started to dive into m4ri, and I really like the quality of the code.
I have a few remarks:

* The loop-unrolling technique used for the creation of the table could maybe be used in the computation of the product as well. Is 8 optimal? I remember seeing 32 in ATLAS, but I don't know of any justification. Since some pipelines are longer than 8, it might be better to have a longer unrolled loop.

* I am not sure about this, but wouldn't it be better to have a block decomposition that matches the babystep-giantstep structure? This could happen at the Strassen threshold: instead of simply copying the matrix (which already improves data locality), copy it into a bunch of blocks of size blocksize and call m4rm on that structure. ATLAS does this kind of copy for dimensions not larger than 200, if I recall correctly. Maybe I am just missing something about your babystep/giantstep algorithm.

Anyway, as you pointed out, the battle is now on the asymptotic comparison with Magma, and I still have no ideas on how to improve your Strassen implementation. Still thinking about it...

Cheers,
Clément

Bill Hart wrote:
> Yep, that's exactly the same thing as what M4RM does. Thanks for the
> explanation.
>
> Bill.
>
> On 20 May, 00:22, Robert Miller <[EMAIL PROTECTED]> wrote:
>>> I can't tell exactly what GAP does. It is beautifully documented, but
>>> it talks about "grease units", which is terminology I don't
>>> understand. It does look like M4RM though.
>>
>> Grease is a concept for speeding up certain things using caching. For
>> example, suppose I have the permutation group S_{32} acting on ints. I
>> can represent a particular permutation as a permutation matrix, which
>> means that applying that permutation is just vector-matrix
>> multiplication. However, instead of computing the full matrix
>> multiplication, we can cut the matrix into pieces (probably of length
>> "grease units" or something). Essentially, we compute every possible
>> sum of the first four rows, then the next four rows, etc.
>> Then, to see how to multiply the vector by the matrix, we cut the
>> vector into chunks of four, and simply look up the corresponding
>> entry in the "grease table", finally adding them together in the end.
>>
>> -- RLM

--
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
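The grease-table trick Robert describes above can be sketched concretely. This is a minimal Python illustration (not m4ri or GAP code; all names are mine): for each block of k rows of the matrix, precompute the XOR of every subset of those rows, then compute a vector-matrix product over GF(2) with one table lookup per block instead of up to k row additions.

```python
def grease_vecmat(v, rows, k=4):
    """Multiply bit-vector v (a list of 0/1) by a matrix over GF(2),
    where the matrix is given as a list of rows, each row an integer
    bitmask. Uses a grease table per k-row block."""
    result = 0
    for start in range(0, len(rows), k):
        block = rows[start:start + k]
        # Build the "grease table": table[mask] is the XOR of the
        # block rows selected by the set bits of mask. Each entry is
        # derived from a smaller mask with a single extra XOR.
        table = [0] * (1 << len(block))
        for mask in range(1, 1 << len(block)):
            low = mask & -mask                     # lowest set bit of mask
            table[mask] = table[mask ^ low] ^ block[low.bit_length() - 1]
        # Cut the matching chunk out of v and look it up.
        idx = sum(bit << j for j, bit in enumerate(v[start:start + k]))
        result ^= table[idx]
    return result
```

Building each table costs about 2^k XORs, which pays off because, as in M4RM, the same tables can be reused for every vector (every row of the left-hand matrix) multiplied against the same matrix.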