On Fri, Aug 26, 2011 at 2:50 AM, Luc Maisonobe <luc.maison...@free.fr> wrote:

> Thanks, Ted.  That does look very flexible and approachable too.  ...
>>
>
> I like the view approach, but wonder how it scales ... down for small data.
>


I doubt that it does scale all the way down.  But then again, I doubt that
it matters.  After all, there are only about 26 possible views of a 3x3
matrix and about half of those are completely trivial (i.e. single
elements).  Manipulation of tiny matrices is likely to be done directly.

In any case, performance with very small matrices is going to depend on
having specialized implementations, as is pretty much always the case.

For mid-range matrices from 5x5 to 100x100, the view style works really
well.  Moreover, since matrices in this range are typically dense, views do
not require a wrapper class, but can just be a specially strided matrix that
shares storage with the original.  That means that views are just as fast as
any kind of matrix.
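
To make the strided idea concrete, here is a minimal sketch in Java.  The
class name StridedMatrix and its methods are made up for illustration and
are not taken from Commons Math or Mahout.  The point is only that a view
is the same kind of object as the original: it keeps a reference to the
same array and changes the offset and strides.

```java
// Hypothetical sketch: StridedMatrix is an illustrative name, not a
// Commons Math or Mahout class.
final class StridedMatrix {
    private final double[] data;   // backing storage, shared with all views
    private final int offset;      // index in data of element (0, 0)
    private final int rowStride;   // step in data between consecutive rows
    private final int colStride;   // step in data between consecutive columns
    private final int rows;
    private final int cols;

    // Allocates a fresh row-major matrix.
    StridedMatrix(int rows, int cols) {
        this(new double[rows * cols], 0, cols, 1, rows, cols);
    }

    private StridedMatrix(double[] data, int offset, int rowStride,
                          int colStride, int rows, int cols) {
        this.data = data;
        this.offset = offset;
        this.rowStride = rowStride;
        this.colStride = colStride;
        this.rows = rows;
        this.cols = cols;
    }

    int rows() { return rows; }
    int cols() { return cols; }

    double get(int r, int c) {
        return data[offset + r * rowStride + c * colStride];
    }

    void set(int r, int c, double value) {
        data[offset + r * rowStride + c * colStride] = value;
    }

    // Rectangular sub-block view; no copy, only a new offset and shape.
    StridedMatrix viewPart(int row, int col, int numRows, int numCols) {
        if (row < 0 || col < 0 || numRows < 0 || numCols < 0
                || row + numRows > rows || col + numCols > cols) {
            throw new IndexOutOfBoundsException("view exceeds matrix bounds");
        }
        int newOffset = offset + row * rowStride + col * colStride;
        return new StridedMatrix(data, newOffset, rowStride, colStride,
                                 numRows, numCols);
    }

    // Transposed view; swaps the strides and the dimensions.
    StridedMatrix transposeView() {
        return new StridedMatrix(data, offset, colStride, rowStride, cols, rows);
    }
}
```

Element access through a view costs the same index arithmetic as access
through the original, which is why no wrapper class is needed in this
sketch.  Writes through a view are visible in the original because the
storage is shared.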


> If you remember Yannick's concerns, the problems he addresses (and the one
> I address too) are millions of computations on tiny matrices and vectors
> (3x3, 6x6) rather than few operations on very large data sets (say
> decomposing a 50000x50000 matrix). I would like Apache Commons Math to
> address both cases. For now, I think we are quite bad on large systems, and
> especially on sparse systems. So we need to improve, but still be good for
> small systems.
>

Sounds like a good goal.  And I think if you have specially coded
RealSmallMatrix.times(RealSmallMatrix) methods, you will get what you
need.
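
As a rough illustration of what "specially coded" could mean for the 3x3
case, here is a sketch.  SmallMatrix3 is a hypothetical class name chosen
for this example (it is not the RealSmallMatrix mentioned above and not an
existing Commons Math type).  It assumes a fixed size, stores the nine
entries in fields, and fully unrolls the product so there are no loops,
no bounds checks and no array indirection.

```java
// Hypothetical sketch of a fixed-size 3x3 matrix with an unrolled product.
final class SmallMatrix3 {
    final double m00, m01, m02;
    final double m10, m11, m12;
    final double m20, m21, m22;

    SmallMatrix3(double m00, double m01, double m02,
                 double m10, double m11, double m12,
                 double m20, double m21, double m22) {
        this.m00 = m00; this.m01 = m01; this.m02 = m02;
        this.m10 = m10; this.m11 = m11; this.m12 = m12;
        this.m20 = m20; this.m21 = m21; this.m22 = m22;
    }

    // Returns this * o, with every term written out explicitly.
    SmallMatrix3 times(SmallMatrix3 o) {
        return new SmallMatrix3(
            m00 * o.m00 + m01 * o.m10 + m02 * o.m20,
            m00 * o.m01 + m01 * o.m11 + m02 * o.m21,
            m00 * o.m02 + m01 * o.m12 + m02 * o.m22,

            m10 * o.m00 + m11 * o.m10 + m12 * o.m20,
            m10 * o.m01 + m11 * o.m11 + m12 * o.m21,
            m10 * o.m02 + m11 * o.m12 + m12 * o.m22,

            m20 * o.m00 + m21 * o.m10 + m22 * o.m20,
            m20 * o.m01 + m21 * o.m11 + m22 * o.m21,
            m20 * o.m02 + m21 * o.m12 + m22 * o.m22);
    }
}
```

Whether this actually beats a general dense implementation for millions
of tiny products would have to be measured; the sketch only shows the
shape of the specialization.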


> Could we basically copy Mahout code ? Ted, what would you think about this
> ?


I think that we still have some major warts in the Mahout code that you
would do better not to copy.  Better to do a philosophical copy while
cherry-picking the parts of Mahout that actually fit the CM goals and quality
requirements.  As a new project, Mahout prioritizes forward progress and
simple interfaces over total code hygiene or maximal performance.  Commons
Math has a higher priority on not screwing the user base with changes.
These philosophical differences are going to make a lot of Mahout code
unsatisfactory for CM.

I also suspect (but don't know) that the basic dense matrix operations in CM
may be somewhat higher quality than in Mahout.  Adding additional function
types in the spirit of visitors for the assign operation would be a good
idea.  The sparse matrices and vectors are pretty good in Mahout but they
would require a new dependency on Mahout Collections (or a snarf of that
entire project).
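
For readers who have not seen the assign style, here is a small sketch of
what function types for assign might look like.  Everything in it is an
assumption made for illustration: SimpleDenseMatrix is a made-up class,
and it uses the java.util.function interfaces from Java 8, which did not
exist when this thread was written, so a library of that era would need
to define its own function interfaces.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: a dense matrix whose assign methods take functions.
final class SimpleDenseMatrix {
    private final double[][] values;

    SimpleDenseMatrix(int rows, int cols) {
        values = new double[rows][cols];
    }

    double get(int row, int col) {
        return values[row][col];
    }

    // Replaces every element x with f(x), in place.
    SimpleDenseMatrix assign(DoubleUnaryOperator f) {
        for (double[] row : values) {
            for (int c = 0; c < row.length; c++) {
                row[c] = f.applyAsDouble(row[c]);
            }
        }
        return this;
    }

    // Replaces every element x with f(x, y), where y is the matching
    // element of other.  Both matrices must have the same shape.
    SimpleDenseMatrix assign(SimpleDenseMatrix other, DoubleBinaryOperator f) {
        if (other.values.length != values.length
                || (values.length > 0
                    && other.values[0].length != values[0].length)) {
            throw new IllegalArgumentException("shape mismatch");
        }
        for (int r = 0; r < values.length; r++) {
            for (int c = 0; c < values[r].length; c++) {
                values[r][c] = f.applyAsDouble(values[r][c], other.values[r][c]);
            }
        }
        return this;
    }
}
```

With this shape, element-wise addition is m.assign(other, (a, b) -> a + b)
and scaling is m.assign(x -> 2 * x), so a lot of per-operation methods
collapse into two entry points plus a set of small functions.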

The AbstractMatrix class combined with the MatrixTestCase makes it pretty
easy to make up a new matrix (see RandomMatrix, ConstantMatrix and
DiagonalMatrix for example).  I do think that several of the tests in
MatrixTestCase are silly and should be made better, but it still catches
most of the gotchas when defining a new matrix type.  Defining a new
specialized matrix typically takes me under an hour although the recent
PivotedMatrix took a bit longer because I was stupid at first.
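
The general pattern of an abstract base class doing most of the work can
be sketched as follows.  This is not Mahout's AbstractMatrix; the names
MatrixSkeleton and DiagonalSketch and all of their methods are invented
for this example.  The assumption is simply that shared operations are
written once in terms of get, so that a new matrix type only has to say
how to look up an element.

```java
// Hypothetical base class: shared operations are written against get().
abstract class MatrixSkeleton {
    private final int rows;
    private final int cols;

    protected MatrixSkeleton(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
    }

    int rows() { return rows; }
    int cols() { return cols; }

    // The only method a new matrix type must supply in this sketch.
    abstract double get(int row, int col);

    // Sum of all elements, defined once for every subclass.
    double sum() {
        double total = 0.0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                total += get(r, c);
            }
        }
        return total;
    }

    // Matrix-vector product, also defined once for every subclass.
    double[] times(double[] v) {
        if (v.length != cols) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] result = new double[rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                result[r] += get(r, c) * v[c];
            }
        }
        return result;
    }
}

// A specialized type: stores only the diagonal, everything else is zero.
final class DiagonalSketch extends MatrixSkeleton {
    private final double[] diagonal;

    DiagonalSketch(double[] diagonal) {
        super(diagonal.length, diagonal.length);
        this.diagonal = diagonal.clone();
    }

    @Override
    double get(int row, int col) {
        return row == col ? diagonal[row] : 0.0;
    }
}
```

A real implementation would override the generic operations where the
structure allows something faster (a diagonal times a vector does not
need the inner loop), and a shared test case can then check that the
overrides agree with the generic versions.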

Mahout also has a large bolus of undigested code from the Colt project that
we are slowly assimilating.  This code is all marked as deprecated until we
bring it up to current standards and write test cases for it.  I would avoid
bringing that stuff into CM.
