On Monday 30 January 2012, you wrote:
> Dear Martin,

Hi Benjamin, (CC [sage-devel] where I think this discussion belongs),
 
> I do know C - and I've downloaded the M4RI library.  I was considering
> writing up
> some C code (and use ctypes to tie back to python/sage)
> which would wrap M4RI more directly (so that I could then use
> the M4RI bit packed matrix format, and some of the functionality of
> M4RI as well).  But this would take me a while.

Interesting, what do you envisage as the advantage of using ctypes over 
Cython? As far as I know, Cython should give you at least the same performance 
and is just as easy to use.

> As for how soon I need a faster matrix_from_columns -- the sooner the
> better, and I'm willing to make the changes if you could tell me what to
> do?

1) A first step would be to drop your My_matrix_from_columns into 
sage/sage/matrix/matrix_mod2_dense.pyx and recompile ("sage -b"); this alone 
would already make things faster.

2) A second step depends on how far you want to go:

2.1) Work in the M4RI library. You'd implement a new function in mzd.c 
(formerly packedmatrix.c), say mzd_from_columns(), which takes an mzd_t and an 
mzp_t (an array of column/row indices). I'd write this function a bit like the 
following, where A is the source matrix, L the index array, and B the matrix 
we write to:

wi_t words[L->length];    /* word index of each selected source column */
int  offsets[L->length];  /* bit offset of that column within its word */

for(rci_t i = 0; i < L->length; i++) {
  words[i]   = (A->offset + L->values[i]) / m4ri_radix;
  offsets[i] = (A->offset + L->values[i]) % m4ri_radix;
}

for(rci_t i = 0; i < A->nrows; i++) {
  word *src = A->rows[i];
  word *dst = B->rows[i];
  for(rci_t j = 0; j + m4ri_radix <= L->length; j += m4ri_radix) {
    dst[j / m4ri_radix] |= ((src[words[j+0]] >> offsets[j+0]) & m4ri_one) << 0;
    dst[j / m4ri_radix] |= ((src[words[j+1]] >> offsets[j+1]) & m4ri_one) << 1;
    /* ... and so on up to bit m4ri_radix - 1 ... */
  }
}

and so on for the remaining bits of each destination word, plus some code to 
deal with the tail when L->length is not a multiple of m4ri_radix. That's the 
fastest I could come up with so far (and the code above probably has bugs 
etc.).
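
For completeness, a minimal sketch of how that tail could be handled, reusing 
the names from the fragment above and falling back to the bit-wise read/write 
functions used in option 2.3 below for the last few columns:

  /* Tail: copy the last L->length % m4ri_radix columns one bit at a time. */
  rci_t const tail = (L->length / m4ri_radix) * m4ri_radix;
  for(rci_t i = 0; i < A->nrows; i++)
    for(rci_t j = tail; j < L->length; j++)
      mzd_write_bit(B, i, j, mzd_read_bit(A, i, L->values[j]));
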
If you want to go down this road, I'd appreciate it if you could work with the 
bitbucket version of M4RI, for which you need a patch for Sage.

2.2) If that's too much of a hassle, you can write pretty much the same 
function in Cython (replacing some -> by "." and so on) in 

  sage/sage/matrix/matrix_mod2_dense.pyx

I'd then take care of getting the code into the M4RI library.


2.3) You can probably also write a much simpler function like the one below, 
which would still be much faster than what we have now:

  for(rci_t i=0; i<A->nrows; i++)
    for(rci_t j=0; j<L->length; j++)
       mzd_write_bit(B,i,j, mzd_read_bit(A,i,L->values[j]));

> Thanks very much for the work you've done
> incorporating M4RI(E) into sage!
> And, also thanks for the recent post you made to the blog.
> As I understand it, the change applies to GF(2^c) for
> c = 2..8 or so (but for c=1, M4RI is already implemented).

The new code covers c=2..10, and I plan to support c=2..16, but I need to 
write some boring code before that's the case. Note that the M4RI algorithm is 
not implemented for c>1, although it might make sense for very small c, say 
2-3. What drives M4RIE is Karatsuba-based matrix-matrix multiplication and 
something we call Newton-John tables (which are inspired by M4RM tables). By 
the way, do you have an application for M4RIE? Right now, it's a bit like a 
solution which lacks a problem :)
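
To give a rough idea of the Karatsuba part: writing a GF(4) = 
GF(2)[x]/(x^2+x+1) matrix as A0 + x*A1 with GF(2) coefficient matrices, the 
product needs only three GF(2) multiplications instead of four. A hypothetical 
helper like the following (illustration only, not M4RIE's actual code) shows 
the idea with plain M4RI calls:

  #include <m4ri/m4ri.h>

  /* Sketch: C0 + x*C1 = (A0 + x*A1) * (B0 + x*B1) over GF(4), using three
   * GF(2) matrix products. C0 and C1 must be preallocated with dimensions
   * A0->nrows x B0->ncols. Illustration only, not M4RIE's implementation. */
  void gf4_mul_karatsuba(mzd_t *C0, mzd_t *C1,
                         mzd_t *A0, mzd_t *A1, mzd_t *B0, mzd_t *B1) {
    mzd_t *P0 = mzd_init(A0->nrows, B0->ncols);
    mzd_t *P1 = mzd_init(A0->nrows, B0->ncols);
    mzd_t *P2 = mzd_init(A0->nrows, B0->ncols);
    mzd_t *T0 = mzd_init(A0->nrows, A0->ncols);
    mzd_t *T1 = mzd_init(B0->nrows, B0->ncols);

    mzd_mul(P0, A0, B0, 0);   /* P0 = A0*B0           */
    mzd_mul(P1, A1, B1, 0);   /* P1 = A1*B1           */
    mzd_add(T0, A0, A1);      /* T0 = A0 + A1         */
    mzd_add(T1, B0, B1);      /* T1 = B0 + B1         */
    mzd_mul(P2, T0, T1, 0);   /* P2 = (A0+A1)*(B0+B1) */

    /* x^2 = x + 1, hence C0 = P0 + P1 and C1 = P2 + P0. */
    mzd_add(C0, P0, P1);
    mzd_add(C1, P2, P0);

    mzd_free(P0); mzd_free(P1); mzd_free(P2); mzd_free(T0); mzd_free(T1);
  }
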

> It is nice to see sage outperform Magma (at least, until the Magma guys
> make note of it and take the M4RI algorithm for themselves, and I suppose
> there's nothing keeping them from doing that...)

Actually, M4RM multiplication is implemented in Magma for GF(2). Essentially, 
we are doing the same thing for matrix-matrix multiplication and we are about 
equally efficient. Allan didn't implement M4RI as far as I can tell, nor our 
variant of it, which is why we are faster than Magma for elimination. However, 
I'd be very excited if Allan could make elimination considerably faster than 
what we have in M4RI (which is *not* the M4RI algorithm but PLE decomposition 
whose base case is inspired by it). For c=2..4 (and soon 2..8) Allan also does 
Karatsuba multiplication. I don't think he uses the Newton-John table trick, 
but it might be so obvious that he didn't care to mention it when I asked. 
Again, for c=2..4 we are currently faster than Magma, and I'd be surprised if 
Allan could make his code much faster than what we have, at least for matrix-
matrix multiplication.

> In the long run, I thought that for future sage releases (for all
> matrices, not just those over small fields) it would be nice to have a
> speedier version of those three submatrix methods.  I think they are
> currently written in python, rather than in precompiled versions as they
> probably should be...

They are compiled but use generic data structures only. They are also very 
cache-unfriendly (column index in the outer loop, row index in the inner 
loop). We could probably gain something by simply swapping the two loops.
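
As a rough illustration of that loop swap on a plain row-major C array (not 
Sage's actual generic matrix code), keeping the row index in the outer loop 
walks memory sequentially instead of striding down columns:

  /* Hypothetical helper, for illustration only: copy selected columns of a
   * row-major nrows x ncols array A into B (nrows x nsel). */
  void copy_columns(int nrows, int ncols, int nsel,
                    const double *A, const int *cols, double *B) {
    for (int i = 0; i < nrows; i++)       /* row index outside ...     */
      for (int j = 0; j < nsel; j++)      /* ... column index inside   */
        B[i * nsel + j] = A[i * ncols + cols[j]];
  }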

Cheers,
Martin

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinralbre...@jabber.ccc.de
