Hi Martin,

Here is another 10% improvement. In the loop at the bottom of mzd_combine you can explicitly unroll by a factor of 8:

    word *end = b1_ptr + wide;
    register word *end8 = end - 8;

    /* main loop: eight word XORs per iteration */
    while (b1_ptr < end8) {
      *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
      *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
      *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
      *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
      *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
      *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
      *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
      *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
    }

    /* tail loop: remaining words one at a time */
    while (b1_ptr < end) {
      *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
    }

I did this in combination with changing the crossover for 10000x10000 from 3600 to 7200.

Bill.

On 17 May, 09:40, Martin Albrecht <[EMAIL PROTECTED]> wrote:
> On Saturday 17 May 2008, Bill Hart wrote:
>
> > In going from 5000x5000 to 10000x10000 Magma's time increases by a
> > factor of less than 4. That is impossible. Strassen will never help us
> > there. They must be doing something else. Probably something clever.
> >
> > Bill.
>
> I was stuck there too yesterday. Maybe only at 10000x10000 the pipeline gets
> fully utilised?
>
> Martin
>
> PS: If we run out of ideas we can simply go for parallelism, that should help
> on sage.math ;-)
>
> --
> name: Martin Albrecht
> _pgp:http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
> _www:http://www.informatik.uni-bremen.de/~malb
> _jab: [EMAIL PROTECTED]
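For reference, here is a minimal standalone sketch of the same unroll-by-8 idea used in the mzd_combine loop: XOR two rows of machine words into a destination row, eight words per iteration, then a tail loop for the rest. It is not the M4RI code. The `word` typedef (assumed 64-bit here) and the function names `xor_rows_unrolled8` and `xor_rows_simple` are hypothetical. It uses an index counter rather than a pointer computed as `end - 8`, so it stays well-defined when a row has fewer than 8 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for M4RI's word type; assumed 64-bit here. */
typedef uint64_t word;

/* dst[i] = a[i] ^ b[i] for i in [0, n), explicitly unrolled by 8.
 * dst may be the same array as a or b: each word is read before it is
 * written and no word depends on another. */
void xor_rows_unrolled8(word *dst, const word *a, const word *b, size_t n)
{
    size_t i = 0;

    /* main loop: one block of 8 words per iteration while 8 remain */
    for (; i + 8 <= n; i += 8) {
        dst[i + 0] = a[i + 0] ^ b[i + 0];
        dst[i + 1] = a[i + 1] ^ b[i + 1];
        dst[i + 2] = a[i + 2] ^ b[i + 2];
        dst[i + 3] = a[i + 3] ^ b[i + 3];
        dst[i + 4] = a[i + 4] ^ b[i + 4];
        dst[i + 5] = a[i + 5] ^ b[i + 5];
        dst[i + 6] = a[i + 6] ^ b[i + 6];
        dst[i + 7] = a[i + 7] ^ b[i + 7];
    }

    /* tail loop: the remaining n % 8 words, one at a time */
    for (; i < n; ++i)
        dst[i] = a[i] ^ b[i];
}

/* Plain, non-unrolled loop, useful as a correctness baseline. */
void xor_rows_simple(word *dst, const word *a, const word *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] ^ b[i];
}
```

For example, with n = 19 the main loop runs twice (16 words) and the tail loop handles the last 3. Whether the manual unroll beats the simple loop depends on the compiler and flags (GCC, for instance, can unroll or vectorize the simple loop itself at higher optimisation levels), so it is worth timing both on the target machine.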