----- "Ted Dunning" <ted.dunn...@gmail.com> wrote:

> I would like to add my voice as a Mahout committer. We would LOVE to use
> commons math in Mahout, but these and a few other issues prevent it.
>
> There was word some time ago about integrating a high performance linear
> package such as MTJ into math. Is that stalled?

If anybody is willing to do it, it is fine. I don't know if Sam is still
around and willing to help.

Luc

>
> On Tue, Oct 13, 2009 at 10:50 PM, Jake Mannix <jake.man...@gmail.com> wrote:
>
> > Greetings, commons-math!
> >
> > I've been looking at a variety of apache/bsd-licensed linear libraries for
> > use in massively parallel machine-learning applications I've been working on
> > (I am housing my own open-source library at http://decomposer.googlecode.com,
> > and am looking at integrating with/using/contributing to Apache Mahout), and
> > I'm wondering a little about the linear API there is here in commons-math:
> >
> > * also for RealVector - No iterator methods? So if the implementation is
> > sparse, there's no way to just iterate over the non-zero entries? What's
> > worse, you can't even subclass OpenMapVector and expose the iterator on the
> > OpenIntToDoubleHashMap inner object, because it's private. :\
> >
> > * for RealVector - what's with the million-different methods mapXXX(),
> > mapXXXtoSelf()? Why not just map(UnaryFunction()), and
> > mapToSelf(UnaryFunction()), where UnaryFunction defines the single method
> > double apply(double d); ? Any user who wishes to implement RealVector (to
> > say, make a more efficient specialized SparseVector) has to go through the
> > pain of writing up a million methods dealing with these (and even if
> > copy/paste gets most of this, it still leads to some horribly huge .java
> > files filled with junk that does not appear to be used). There does not
> > even appear to be an AbstractRealVector which implements all of these for
> > you (by using the above-mentioned iterator() ).
> >
> > * while we're at it, if there is map(), why not also double
> > RealVector.collect(Collector()), where Collector defines void collect(int
> > index, double value); and double result(); - this can be used for generic
> > inner products and kernels (and can allow for consolidating all of the
> > L1Norm(), norm(), and LInfNorm() methods into this same method, passing in
> > different L1NormCollector() etc... instances).
> >
> > * why all the methods which are overloaded to take either RealVector or
> > double[] (getDistance, dotProduct, add, etc...) - is there really that much
> > overhead in just implementing dotProduct(double[] d) as just
> > dotProduct(new ArrayRealVector(d, false)); - no copy is done, nothing is
> > done but one object creation...
> >
> > * SparseVector is just a marker interface? Does it serve any purpose?
> >
> > I guess I could ask similar questions on the Matrix interfaces, but those
> > will probably be cleared up by understanding the philosophy behind the
> > Vector interfaces.
> >
> > I'd love to use commons-math for parts of my projects in which the entire
> > data sets can live in memory (often part of the computation falls into this
> > category, even if it's not the most meaty part, it's big enough that I'll
> > kill my performance if I am stuck writing my own subroutines for eigen
> > computation, etc for many moderately small matrices), but converting to and
> > from the commons-math linear interfaces seems a bit unwieldy. Maybe it would
> > be easier if I could understand why these are the way they are.
> >
> > I'm happy to contribute patches consolidating interfaces and/or extending
> > functionality (you seem to be missing a compact int/double pair
> > implementation of sparse vectors, for example, which are a fantastically
> > performant format if they're immutable and only being used for dot products
> > and adding them to dense vectors), if it would be of help (I'm tracking my
> > attempts at this over on my GitHub clone of trunk:
> > http://github.com/jakemannix/commons-math ).
> >
> > -jake mannix
> > Principal Software Engineer
> > Search and Recommender Systems
> > LinkedIn.com
> >
> >
>
> --
> Ted Dunning, CTO
> DeepDyve
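
For readers following the map()/collect() proposal in the quoted message, here is a minimal Java sketch of what it might look like. This is not commons-math code and not Jake's patch. Only the names UnaryFunction, Collector, apply, collect, result, map, mapToSelf and L1NormCollector come from the message; VectorSketch, DenseVector, LInfNormCollector and DotProductCollector are hypothetical names made up for illustration.

```java
// Hypothetical sketch of the proposal; none of these types are commons-math API.
final class VectorSketch {

    /** Single-method function, named as in the message. */
    interface UnaryFunction {
        double apply(double d);
    }

    /** Visits entries one at a time, then reports a single value. */
    interface Collector {
        void collect(int index, double value);
        double result();
    }

    /** Minimal dense vector (assumed name) exposing only the generic methods. */
    static final class DenseVector {
        private final double[] data;

        DenseVector(double... values) {
            this.data = values.clone(); // defensive copy
        }

        double get(int i) {
            return data[i];
        }

        /** Returns a new vector with f applied to every entry. */
        DenseVector map(UnaryFunction f) {
            return new DenseVector(data).mapToSelf(f);
        }

        /** Applies f to every entry in place. */
        DenseVector mapToSelf(UnaryFunction f) {
            for (int i = 0; i < data.length; i++) {
                data[i] = f.apply(data[i]);
            }
            return this;
        }

        /** Feeds every entry to the collector and returns its result. */
        double collect(Collector c) {
            for (int i = 0; i < data.length; i++) {
                c.collect(i, data[i]);
            }
            return c.result();
        }
    }

    /** Sum of absolute values. */
    static final class L1NormCollector implements Collector {
        private double sum;
        public void collect(int index, double value) { sum += Math.abs(value); }
        public double result() { return sum; }
    }

    /** Largest absolute value. */
    static final class LInfNormCollector implements Collector {
        private double max;
        public void collect(int index, double value) { max = Math.max(max, Math.abs(value)); }
        public double result() { return max; }
    }

    /** Inner product against a plain array, indexed by entry position. */
    static final class DotProductCollector implements Collector {
        private final double[] other;
        private double sum;
        DotProductCollector(double[] other) { this.other = other; }
        public void collect(int index, double value) { sum += value * other[index]; }
        public double result() { return sum; }
    }
}
```

With this shape, v.collect(new VectorSketch.L1NormCollector()) would stand in for a dedicated L1 norm method. Collectors in this sketch are stateful, so each call needs a fresh instance. A sparse implementation would presumably call collect() only for its stored non-zero entries, which is why the message ties the idea to an iterator.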