Hi guys,

I am finally up to date with this discussion (I was being interviewed, 
and then flying when it started).
First, congratulations on the great job you have done. I have started 
to dive into m4ri, and I really like the quality of the code.

I have a few remarks:

* The loop unrolling technique used for the creation of the table could 
maybe be used in the computation of the product as well.
Is 8 optimal? I remember seeing 32 in ATLAS, but I don't know of any 
justification. Since some pipelines are longer than 8, it might be 
better to have a longer unrolled loop.
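
To make the first remark concrete, here is a minimal sketch of the kind 
of unrolled-by-8 inner loop I mean, applied to XORing one row of machine 
words into another. This is purely illustrative: the names (`word`, 
`add_row_unrolled`) are mine, not m4ri's API, and the optimal unroll 
factor would have to be measured.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch (not m4ri's actual code): XOR one row of 64-bit
 * words into another, with the main loop unrolled by a factor of 8. */
typedef uint64_t word;

static void add_row_unrolled(word *dst, const word *src, size_t nwords)
{
    size_t i = 0;
    /* unrolled-by-8 main part */
    for (; i + 8 <= nwords; i += 8) {
        dst[i + 0] ^= src[i + 0];
        dst[i + 1] ^= src[i + 1];
        dst[i + 2] ^= src[i + 2];
        dst[i + 3] ^= src[i + 3];
        dst[i + 4] ^= src[i + 4];
        dst[i + 5] ^= src[i + 5];
        dst[i + 6] ^= src[i + 6];
        dst[i + 7] ^= src[i + 7];
    }
    /* scalar remainder */
    for (; i < nwords; i++)
        dst[i] ^= src[i];
}
```

Changing the `8`s to `16` or `32` and benchmarking would answer the 
question empirically; the right factor depends on the pipeline and on 
how many iterations the compiler can keep in flight.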

* I am not sure about this, but wouldn't it be better to have a block 
decomposition that matches the babystep-giantstep structure?
This could happen at the Strassen threshold: instead of simply copying 
the matrix (which already improves the data locality), copy it into a 
bunch of blocks of size blocksize and call m4rm on that structure. ATLAS 
does this kind of copy for dimensions not larger than 200, if I recall 
correctly.
Maybe I am just missing something about your babystep/giantstep algorithm.
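
To show what I mean by the block copy, here is a toy sketch that 
rearranges a row-major n x n matrix into contiguous blocksize x 
blocksize tiles (tile-major order), so each tile the kernel touches is 
cache-contiguous. All names are made up for illustration, and I use 
`double` entries only to keep the sketch simple; m4ri would of course 
work on packed GF(2) words.

```c
#include <stdlib.h>
#include <stddef.h>

/* Toy sketch (not m4ri's API): copy a row-major n x n matrix into
 * contiguous bs x bs tiles, stored tile after tile.  Out-of-range
 * positions in edge tiles are left zero by calloc. */
static double *copy_to_blocks(const double *a, size_t n, size_t bs)
{
    size_t nb = (n + bs - 1) / bs;               /* tiles per dimension */
    double *blocked = calloc(nb * nb * bs * bs, sizeof(double));
    if (!blocked)
        return NULL;
    for (size_t bi = 0; bi < nb; bi++)
        for (size_t bj = 0; bj < nb; bj++) {
            /* start of tile (bi, bj) in the blocked layout */
            double *tile = blocked + (bi * nb + bj) * bs * bs;
            for (size_t i = bi * bs; i < n && i < (bi + 1) * bs; i++)
                for (size_t j = bj * bs; j < n && j < (bj + 1) * bs; j++)
                    tile[(i - bi * bs) * bs + (j - bj * bs)] = a[i * n + j];
        }
    return blocked;
}
```

The point is just that the copy already happening at the Strassen 
threshold could produce this tiled layout at no extra asymptotic cost, 
and the multiplication kernel would then walk each tile linearly.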

Anyway, as you pointed out, the battle is now over the asymptotic 
comparison with Magma, and I still have no idea how to improve your 
Strassen implementation. Still thinking about it...

Cheers,

Clément

Bill Hart wrote:
> Yep that's exactly the same thing as what M4RM does. Thanks for the
> explanation.
>
> Bill.
>
> On 20 May, 00:22, Robert Miller <[EMAIL PROTECTED]> wrote:
>   
>>> I can't tell exactly what GAP does. It is beautifully documented, but
>>> it talks about "grease units", which is terminology I don't
>>> understand. It does look like M4RM though.
>>>       
>> Grease is a concept for speeding up certain things using caching. For
>> example, suppose I have the permutation group S_{32} acting on ints. I
>> can represent a particular permutation as a permutation matrix, which
>> means that applying that permutation is just vector-matrix
>> multiplication. However, instead of computing the full matrix
>> multiplication, we can cut the matrix into pieces (probably of length
>> "grease units" or something). Essentially, we compute every possible
>> sum of the first four rows, then the next four rows, etc. Then, to see
>> how to multiply the vector by the matrix, we cut the vector into
>> chunks of four, and simply look up the corresponding entry in the
>> "grease table", finally adding them together in the end.
>>
>> -- RLM
>>     
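
P.S. The grease scheme Robert describes can be sketched concretely. 
Below is a small GF(2) example with 32-bit rows and a grease unit of 4: 
for each group of 4 rows we precompute all 16 possible XOR sums, so a 
vector-matrix product costs 8 table lookups and 7 XORs instead of up to 
32 row additions. The names are mine for illustration, not GAP's or 
m4ri's.

```c
#include <stdint.h>

/* Illustrative grease-table sketch over GF(2).  A 32 x 32 matrix is
 * given as 32 rows, each a 32-bit word; bit i of a vector v selects
 * row i.  The grease unit is 4, so there are 8 groups of 4 rows. */

/* table[g][s] = XOR of the rows in group g selected by the bits of s.
 * Each entry is derived from a smaller subset with a single extra XOR. */
static void build_grease_table(const uint32_t rows[32], uint32_t table[8][16])
{
    for (int g = 0; g < 8; g++) {
        table[g][0] = 0;
        for (int s = 1; s < 16; s++) {
            int j = 0;                 /* index of lowest set bit of s */
            while (!((s >> j) & 1))
                j++;
            /* s & (s - 1) clears that bit, giving an already-built entry */
            table[g][s] = table[g][s & (s - 1)] ^ rows[4 * g + j];
        }
    }
}

/* v * M over GF(2): one lookup per 4-bit chunk of v, then XOR them up. */
static uint32_t vec_mat_mul_greased(uint32_t v, uint32_t table[8][16])
{
    uint32_t acc = 0;
    for (int g = 0; g < 8; g++)
        acc ^= table[g][(v >> (4 * g)) & 0xF];
    return acc;
}

/* Reference: naive row-by-row accumulation, for checking the sketch. */
static uint32_t vec_mat_mul_naive(uint32_t v, const uint32_t rows[32])
{
    uint32_t acc = 0;
    for (int i = 0; i < 32; i++)
        if ((v >> i) & 1)
            acc ^= rows[i];
    return acc;
}
```

The table costs 8 * 15 XORs to build, so it pays off as soon as the 
same matrix is applied to more than a handful of vectors, which is 
exactly the M4RM situation.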


--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---
