On Mon, May 19, 2008 at 6:14 PM, Clement Pernet
<[EMAIL PROTECTED]> wrote:
>
> hi guys,
>
> I am finally up to date with this discussion (I was being interviewed,
> and then flying when it started).
> First, congrats for the great job you have achieved. I have started to
> dive into m4ri, and I really like the quality of the code.
>
> I have a few remarks:
>
> * the loop unrolling technique used for the creation of the table could
> maybe be used in the computation of the product as well.
> Is 8 optimal? I remember seeing 32 in ATLAS, but don't know of any
> justification. Since some pipelines are longer than 8, it might be
> better to have a longer unrolled loop.
>
> * I am not sure about this, but wouldn't it be better to have a block
> decomposition that matches the babystep-giantstep structure?
> This could happen at the Strassen threshold: instead of simply copying
> the matrix (which already improves the data locality), copy it into a
> bunch of blocks of size blocksize and call m4rm on that structure. ATLAS
> does this kind of copy for dimensions not larger than 200, if I
> recall correctly.
> Maybe I am just missing something about your babystep/giantstep algorithm.
>
> Anyway, as you pointed out, the battle is now on the asymptotic
> comparison with Magma, and I still have no idea how to improve your
> Strassen implementation. Still thinking about it...
>
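
For readers following along, the two remarks above can be sketched in C.
This is only an illustration under stated assumptions, not m4ri's actual
code: the function names xor_words_unrolled8 and pack_tiles, the use of
uint64_t words, and the tile layout are all made up for the example. The
first function adds two packed GF(2) rows with the loop unrolled by 8;
the second copies a strided matrix into contiguous tiles, roughly the
kind of copy Clement describes for ATLAS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not m4ri's API): dst[i] ^= src[i] for 0 <= i < n,
 * i.e. addition of two packed GF(2) rows, unrolled by a factor of 8. */
void xor_words_unrolled8(uint64_t *dst, const uint64_t *src, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (dst != NULL && src != NULL));
    /* Main body: eight independent XORs per iteration. */
    for (; i + 8 <= n; i += 8) {
        dst[i + 0] ^= src[i + 0];
        dst[i + 1] ^= src[i + 1];
        dst[i + 2] ^= src[i + 2];
        dst[i + 3] ^= src[i + 3];
        dst[i + 4] ^= src[i + 4];
        dst[i + 5] ^= src[i + 5];
        dst[i + 6] ^= src[i + 6];
        dst[i + 7] ^= src[i + 7];
    }
    /* Tail: the remaining n mod 8 words. */
    for (; i < n; ++i)
        dst[i] ^= src[i];
}

/* Hypothetical helper: copy a rows x width matrix of words, stored row-major
 * with `stride` words between consecutive rows, into dst so that each tile of
 * at most tile_rows x tile_width words is contiguous. Tiles are emitted left
 * to right, then top to bottom; edge tiles are smaller and stored compactly.
 * dst must hold rows * width words. Returns the number of words written. */
size_t pack_tiles(uint64_t *dst, const uint64_t *src,
                  size_t rows, size_t width, size_t stride,
                  size_t tile_rows, size_t tile_width)
{
    size_t out = 0;
    assert(tile_rows > 0 && tile_width > 0 && stride >= width);
    for (size_t r0 = 0; r0 < rows; r0 += tile_rows) {
        size_t r1 = (rows - r0 < tile_rows) ? rows : r0 + tile_rows;
        for (size_t c0 = 0; c0 < width; c0 += tile_width) {
            size_t c1 = (width - c0 < tile_width) ? width : c0 + tile_width;
            /* Copy one tile row by row into the contiguous output. */
            for (size_t r = r0; r < r1; ++r) {
                memcpy(dst + out, src + r * stride + c0,
                       (c1 - c0) * sizeof *dst);
                out += c1 - c0;
            }
        }
    }
    return out;
}
```

Whether 8, 32, or some other unroll factor is best would have to be
measured on the target machine, and a compiler may already unroll such a
loop on its own. Likewise, the tile dimensions would presumably be chosen
to match the blocksize that m4rm works with.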

Clement, since you are actively doing research on fast mod-3 matrix
multiplication, any chance you could spend 2-3 paragraphs here
advertising that? It would fit nicely in this thread.

 -- William
