Hi Martin,

Here is another 10% improvement. In the loop at the bottom of
mzd_combine you can explicitly unroll by a factor of 8:

    word * end = b1_ptr + wide;
    register word * end8 = end - 8;
    while (b1_ptr < end8)
    {
         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
    }
    while (b1_ptr < end)
    {
         *(b1_ptr++) = *(b2_ptr++) ^ *(b3_ptr++);
    }
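
For anyone who wants to try the unrolling outside of mzd_combine, here is a
minimal standalone sketch of the same idea. The function name
xor_words_unrolled and the uint64_t typedef for word are assumptions made for
illustration only; they are not taken from the library. It indexes by count
rather than by end pointer, but otherwise does the same work: eight XORs per
iteration, then a tail loop for the leftovers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 64-bit word standing in for the library's own word type. */
typedef uint64_t word;

/* Hypothetical helper (not a library function):
 * computes dst[i] = a[i] ^ b[i] for 0 <= i < wide. */
static void xor_words_unrolled(word *dst, const word *a, const word *b,
                               size_t wide)
{
    assert(wide == 0 || (dst != NULL && a != NULL && b != NULL));

    size_t i = 0;

    /* Main loop: eight words per iteration while at least eight remain.
     * i never exceeds wide, so wide - i cannot wrap around. */
    for (; wide - i >= 8; i += 8) {
        dst[i + 0] = a[i + 0] ^ b[i + 0];
        dst[i + 1] = a[i + 1] ^ b[i + 1];
        dst[i + 2] = a[i + 2] ^ b[i + 2];
        dst[i + 3] = a[i + 3] ^ b[i + 3];
        dst[i + 4] = a[i + 4] ^ b[i + 4];
        dst[i + 5] = a[i + 5] ^ b[i + 5];
        dst[i + 6] = a[i + 6] ^ b[i + 6];
        dst[i + 7] = a[i + 7] ^ b[i + 7];
    }

    /* Tail loop: the remaining 0..7 words. */
    for (; i < wide; i++) {
        dst[i] = a[i] ^ b[i];
    }
}
```

Counting words instead of comparing against end - 8 avoids forming a pointer
before the start of the row when wide is less than 8, and it lets the unrolled
loop handle the case where exactly eight words remain. Whether this is any
faster than the pointer version will depend on the compiler, so it is worth
timing both.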

I did this in combination with changing the crossover for 10000x10000
from 3600 to 7200.

Bill.

On 17 May, 09:40, Martin Albrecht <[EMAIL PROTECTED]>
wrote:
> On Saturday 17 May 2008, Bill Hart wrote:
>
> > In going from 5000x5000 to 10000x10000 Magma's time increases by a
> > factor of less than 4. That is impossible. Strassen will never help us
> > there. They must be doing something else. Probably something clever.
>
> > Bill.
>
> I was stuck there too yesterday. Maybe only at 10000x10000 the pipeline gets
> fully utilised?
>
> Martin
>
> PS: If we run out of ideas we can simply go for parallelism, that should help
> on sage.math ;-)
>
> --
> name: Martin Albrecht
> _pgp:http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
> _www:http://www.informatik.uni-bremen.de/~malb
> _jab: [EMAIL PROTECTED]