On Monday 19 May 2008, Bill Hart wrote:
> You seemed to be getting up to 8% at points there. That's definitely
> worth it. I'll be interested to see this evening how it comes out,
> though I recommend optimising my combine3 function (which I suppose
> should now be combine8), even including it inline rather than have it
> in a separate file.
>
> Of course on the Opteron, SSE should be switched off, since it is
> definitely slower by about 5%-10% even with careful optimisation.
>
> Bill.

Okay, so a good compromise is to remove all the SSE2 code from the main
function _mzd_mul_m4rm_impl and put it in a static inline function,
_mzd_combine8, which is tailored specifically to this application. That
way the code still looks relatively pretty/elegant, but we keep SSE2
support.

Martin

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: [EMAIL PROTECTED]
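
A minimal sketch of the kind of helper described above, not the actual M4RI code. It assumes a row is an array of 64-bit words and that the helper XORs eight table rows into a destination row. The name combine8_sketch, the word typedef and the argument layout are all hypothetical. The SSE2 path is only compiled when the compiler defines __SSE2__, so it can be switched off (for example on the Opteron) without touching the caller.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

typedef uint64_t word; /* assumption: one word holds 64 matrix entries */

/* XOR the eight rows t[0..7] into c; each row is `wide` words long. */
static inline void combine8_sketch(word *c, const word *t[8], size_t wide) {
    size_t i = 0;
#ifdef __SSE2__
    /* SSE2 path: handle two 64-bit words per step, unaligned loads/stores. */
    for (; i + 2 <= wide; i += 2) {
        __m128i acc = _mm_loadu_si128((const __m128i *)(c + i));
        for (int k = 0; k < 8; k++)
            acc = _mm_xor_si128(acc, _mm_loadu_si128((const __m128i *)(t[k] + i)));
        _mm_storeu_si128((__m128i *)(c + i), acc);
    }
#endif
    /* Scalar path: the whole row without SSE2, or the leftover word with it. */
    for (; i < wide; i++) {
        word acc = c[i];
        for (int k = 0; k < 8; k++)
            acc ^= t[k][i];
        c[i] = acc;
    }
}
```

Keeping the intrinsics inside one static inline function means the main multiplication routine contains no #ifdef blocks, and the compiler can still inline the call in the inner loop.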