I would like to add my voice as a Mahout committer.  We would LOVE to use
commons math in Mahout, but these and a few other issues prevent it.

There was word some time ago about integrating a high-performance linear
algebra package such as MTJ into commons-math.  Is that stalled?

On Tue, Oct 13, 2009 at 10:50 PM, Jake Mannix <jake.man...@gmail.com> wrote:

> Greetings, commons-math!
>
>  I've been looking at a variety of apache/bsd-licensed linear libraries for
> use in massively parallel machine-learning applications I've been working on
> (I am housing my own open-source library at http://decomposer.googlecode.com,
> and am looking at integrating with/using/contributing to Apache Mahout), and
> I'm wondering a little about the linear API there is here in commons-math:
>
>  * also for RealVector - No iterator methods?  So if the implementation is
> sparse, there's no way to just iterate over the non-zero entries?  What's
> worse, you can't even subclass OpenMapVector and expose the iterator on the
> OpenIntToDoubleHashMap inner object, because it's private. :\
>
>  * for RealVector - what's with the million-different methods mapXXX(),
> mapXXXtoSelf()?  Why not just map(UnaryFunction()), and
> mapToSelf(UnaryFunction()), where UnaryFunction defines the single method
> double apply(double d); ?  Any user who wishes to implement RealVector (to
> say, make a more efficient specialized SparseVector) has to go through the
> pain of writing up a million methods dealing with these (and even if
> copy/paste gets most of this,  it still leads to some horribly huge .java
> files filled with junk that does not appear to be used).  There does not
> even appear to be an AbstractRealVector which implements all of these for
> you (by using the above-mentioned iterator() ).
>
>  * while we're at it, if there is map(), why not also double
> RealVector.collect(Collector()), where Collector defines void collect(int
> index, double value); and double result(); - this can be used for generic
> inner products and kernels (and can allow for consolidating all of the
> L1Norm(), norm(), and LInfNorm() methods into this same method, passing in
> different L1NormCollector() etc... instances).
>
>  * why all the methods which are overloaded to take either RealVector or
> double[] (getDistance, dotProduct, add, etc...) - is there really that much
> overhead in just implementing dotProduct(double[] d)  as just
> dotProduct(new ArrayRealVector(d, false)); - no copy is done, nothing is
> done but one object creation...
>
>  * SparseVector is just a marker interface?  Does it serve any purpose?
>
> I guess I could ask similar questions on the Matrix interfaces, but those
> will probably be cleared up by understanding the philosophy behind the
> Vector interfaces.
>
> I'd love to use commons-math for parts of my projects in which the entire
> data sets can live in memory (often part of the computation falls into this
> category, even if it's not the most meaty part, it's big enough that I'll
> kill my performance if I am stuck writing my own subroutines for eigen
> computation, etc for many moderately small matrices), but converting to and
> from the commons-math linear interfaces seems a bit unwieldy.  Maybe it
> would be easier if I could understand why these are the way they are.
>
> I'm happy to contribute patches consolidating interfaces and/or extending
> functionality (you seem to be missing a compact int/double pair
> implementation of sparse vectors, for example, which is a fantastically
> performant format if they're immutable and only being used for dot products
> and adding them to dense vectors), if it would be of help (I'm tracking my
> attempts at this over on my GitHub clone of trunk:
> http://github.com/jakemannix/commons-math ).
>
>  -jake mannix
>  Principal Software Engineer
>  Search and Recommender Systems
>  LinkedIn.com
>
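
For anyone following the thread, here is a minimal sketch of the
map(UnaryFunction) / collect(Collector) idea from the quoted message.  The
interface and method names follow Jake's wording; SketchVector and the three
collector classes are hypothetical illustrations and are not part of the
commons-math API.

```java
/** Hypothetical: one function object replaces the many mapXXX() methods. */
interface UnaryFunction {
    double apply(double d);
}

/** Hypothetical: accumulates over (index, value) pairs, then reports a result. */
interface Collector {
    void collect(int index, double value);
    double result();
}

/** Hypothetical dense vector used only for illustration; NOT RealVector. */
final class SketchVector {
    private final double[] data;

    SketchVector(double... values) {
        this.data = values.clone(); // defensive copy
    }

    /** Returns a new vector with f applied to every entry. */
    SketchVector map(UnaryFunction f) {
        return new SketchVector(data).mapToSelf(f);
    }

    /** Applies f to every entry in place and returns this vector. */
    SketchVector mapToSelf(UnaryFunction f) {
        for (int i = 0; i < data.length; i++) {
            data[i] = f.apply(data[i]);
        }
        return this;
    }

    /** Feeds every entry to the collector and returns its result. */
    double collect(Collector c) {
        for (int i = 0; i < data.length; i++) {
            c.collect(i, data[i]);
        }
        return c.result();
    }

    double[] toArray() {
        return data.clone();
    }
}

/** Sum of absolute values. */
final class L1NormCollector implements Collector {
    private double sum;
    public void collect(int index, double value) { sum += Math.abs(value); }
    public double result() { return sum; }
}

/** Largest absolute value. */
final class LInfNormCollector implements Collector {
    private double max;
    public void collect(int index, double value) { max = Math.max(max, Math.abs(value)); }
    public double result() { return max; }
}

/** Inner product against a fixed dense array of at least the same length. */
final class DotCollector implements Collector {
    private final double[] other;
    private double sum;
    DotCollector(double[] other) { this.other = other; }
    public void collect(int index, double value) { sum += value * other[index]; }
    public double result() { return sum; }
}

final class MapCollectDemo {
    public static void main(String[] args) {
        SketchVector v = new SketchVector(3.0, -4.0);
        System.out.println(v.collect(new L1NormCollector()));
        System.out.println(v.collect(new LInfNormCollector()));
        System.out.println(v.collect(new DotCollector(new double[] {1.0, 2.0})));
    }
}
```

A sparse implementation would only need to loop over its non-zero entries
inside collect() and mapToSelf(); whether that is safe for map() depends on
the function mapping zero to zero, which this sketch does not check.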

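The compact int/double pair format mentioned near the end of the quoted
message could look roughly like the following.  This is a sketch under
assumptions: the class name PairSparseVector and its methods are invented
here, indices are assumed to be strictly increasing, and nothing below comes
from commons-math or Mahout.

```java
/** Hypothetical immutable sparse vector stored as parallel index/value arrays. */
final class PairSparseVector {
    private final int[] indices;   // strictly increasing positions of stored entries
    private final double[] values; // values[k] is the entry at position indices[k]

    PairSparseVector(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values differ in length");
        }
        for (int k = 1; k < indices.length; k++) {
            if (indices[k] <= indices[k - 1]) {
                throw new IllegalArgumentException("indices must be strictly increasing");
            }
        }
        this.indices = indices.clone();
        this.values = values.clone();
    }

    /** Dot product with a dense array; touches only the stored entries. */
    double dot(double[] dense) {
        double sum = 0.0;
        for (int k = 0; k < indices.length; k++) {
            sum += values[k] * dense[indices[k]];
        }
        return sum;
    }

    /** Dot product with another sparse vector via a merge over sorted indices. */
    double dot(PairSparseVector other) {
        double sum = 0.0;
        int a = 0;
        int b = 0;
        while (a < indices.length && b < other.indices.length) {
            if (indices[a] == other.indices[b]) {
                sum += values[a++] * other.values[b++];
            } else if (indices[a] < other.indices[b]) {
                a++;
            } else {
                b++;
            }
        }
        return sum;
    }

    /** Adds this vector into a dense array in place. */
    void addTo(double[] dense) {
        for (int k = 0; k < indices.length; k++) {
            dense[indices[k]] += values[k];
        }
    }
}
```

Both dot(double[]) and addTo() cost time proportional to the number of stored
entries rather than the full dimension, which is the appeal of the format for
the dot-product and add-to-dense use cases described above.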


-- 
Ted Dunning, CTO
DeepDyve
