Bill,

I do get a small speed-up on the Core2Duo for SSE2, but I'm not sure it is 
worth the trouble (I agree that it makes the otherwise pretty-looking code 
ugly). 

I have some timings (for an old implementation) here: 

   http://trac.sagemath.org/sage_trac/ticket/3204#comment:2

My guess is that SSE2 is slower on the Opteron because SSE2 is basically an 
Intel thing and only provided by AMD for compatibility reasons. There are 
several reports of SSE2 being slow on the Opteron, and I guess the SSE2 
integer operations were not optimized for speed since SSE is mainly about 
floating point.
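
For reference, the kind of SSE2 integer operation in question looks roughly 
like the sketch below: XORing one row of 64-bit words into another, two words 
per instruction. The name xor_words and its signature are made up for 
illustration and are not the actual M4RI code. It falls back to plain word 
XOR when the compiler does not define __SSE2__.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical helper (not the M4RI API): dst[i] ^= src[i] for n words. */
void xor_words(uint64_t *dst, const uint64_t *src, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
#ifdef __SSE2__
    /* SSE2 path: XOR two 64-bit words (128 bits) per iteration.
       Unaligned loads/stores are used so no alignment is assumed. */
    for (; i + 2 <= n; i += 2) {
        __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(a, b));
    }
#endif
    /* Scalar path: handles the leftover word, or everything without SSE2. */
    for (; i < n; i++)
        dst[i] ^= src[i];
}
```

Timing this against the scalar loop alone (compile once with and once without 
SSE2 enabled) is one way to see whether the SSE2 path pays off on a given CPU.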

One thing I noticed on the Opteron is that putting the code in mzd_combine 
vs. putting the same code directly in the function made a huge difference. I 
blamed it on better cache prefetching support, but that was probably 
premature.

My proposal: 
 - This evening I'll update my code with your 8 Gray tables and check the 
performance on the C2D.
 - Then I'll re-introduce SSE2 and check whether it makes a worthwhile 
difference; if not, we drop SSE2 from the multiplication.
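
To make "Gray tables" concrete, here is a minimal sketch of how one such 
table can be filled: consecutive Gray codes differ in exactly one bit, so 
each new entry costs a single XOR with one row. The function name, the 
single-word rows and the assert bound are assumptions for illustration only; 
the real code works on full matrix rows and keeps several tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills table[0 .. 2^k - 1] so that table[x] is the
   XOR of rows[i] over all bits i set in x. Each row is one 64-bit word. */
void build_gray_table(const uint64_t *rows, unsigned k, uint64_t *table)
{
    size_t prev = 0;   /* previous Gray code */
    size_t i;

    assert(k >= 1 && k < 32);
    table[0] = 0;
    for (i = 1; i < ((size_t)1 << k); i++) {
        size_t gray = i ^ (i >> 1);   /* i-th Gray code */
        size_t diff = gray ^ prev;    /* exactly one bit is set */
        unsigned bit = 0;

        while (((diff >> bit) & 1) == 0)
            bit++;
        /* One XOR per entry: reuse the previous combination. */
        table[gray] = table[prev] ^ rows[bit];
        prev = gray;
    }
}
```

With k rows this builds all 2^k combinations using 2^k - 1 XORs, instead of 
recomputing each combination from scratch.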

Martin

PS: Yesterday I tried a quick and dirty OpenMP-based (which is cool, btw) 
parallel implementation of Strassen-Winograd, and as is it gives a speedup 
of about 1.8 (so not optimal yet). But comparing that with Magma feels like 
cheating: first we should aim for better speed with the same resources, and 
only then switch to parallel implementations for even better times. Anyway, 
a week ago I wouldn't have believed that I could do a 10^4 x 10^4 matrix 
multiplication in 1.7 seconds on my notebook :-)

-- 
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: [EMAIL PROTECTED]

