On Wed, Dec 9, 2009 at 1:17 PM, Benson Margulies <bimargul...@gmail.com> wrote:
> CERN Colt is a library with a mixture of 'category A' material and
> 'category B-or-worse' material. In other words, it is not an
> attractive dependency for ASF code as a lump.

Yep, that's why when I made my original patch of Colt for Mahout, I stripped out all dependency on the 'category B-or-worse' material and tried to keep everything else, other than the physics-specific stuff that nobody was going to need.

> Part of the 'category A' code is a set of very pretty associative
> containers for primitive types. My goal is to take this code and
> modernize it to use Generics as appropriate -- and otherwise make it a
> replacement for the LGPL Trove library.

I believe Mahout kept all this: look in o.a.m.matrix.list and o.a.m.matrix.map. The package name they're both in (o.a.m.matrix) is bad - it used to be "colt", but that name is trademarked by CERN, so even if the code is freely licensable, the name isn't. It might be better called "collections" or something.

> The associative code uses some math code. I'm pretty sure that all of
> the math code that it uses is also category A. In fact, I believe that
> all of it is in the portion that Mahout forked. This includes some
> things that aren't in commons-math at all.

Oh, this is what you're saying here, ok.

> So, we have some adoptable code in Colt that overlaps functionality in
> -math, some that doesn't, and some that I want to use as the basis for
> work in -primitives.

The stuff you want in primitives is also stuff we use/want to use for our linear work in Mahout, by the way (OpenDoubleIntHashMap, DoubleArrayList, etc...).

> If the messages in this thread mean that the code already in -math is
> in fact moving in a direction to support mahout, then it might be
> acceptable for the additional, non-overlapping, Colt math code to end
> up in -math, and then everyone ends up happy?
>

None of the patches mentioned above contain Colt code, so no, c-math is not really moving in a direction to support what Mahout needs. That's because of the backwards compatibility constraints, and because it's hard to tell what the "non-overlapping" part of Colt math is. For example, vectors in Colt provide complex aggregation methods, but c-math vectors do not (as an interface). Is this non-overlapping code? If it is, how would you suppose it would be incorporated in c-math? Add a new interface, "ColtVector", which is basically the same as c-math's "RealVector" but with some methods removed and some others added? This doesn't really make sense. Implementations are one thing, but the interfaces that consumers see are another.

> Another question: OK, -math is a stable, high-compatibility library.
> Mahout, on the other hand, wants to be a fast-moving, somewhat fluid,
> build of code (naturally including some math code) adapted for
> map-reduce. So, how about a branch of math that could release
> frequently with a new major version number?

Well, actually, a more feasible version of this, instead of branches, would be to use the current "experimental" source submodule in commons-math. If that could be a faster-moving, fluid build which could include Colt math etc., that would be great, and Mahout could possibly use it.

This brings up another point though: if commons-primitives is what you're creating, and you want to base it on Colt, well, we in Mahout could probably handle depending on you. But Colt's linear math depends on these primitive collections, so can commons-math depend on commons-primitives, or does that violate c-math's "no external dependencies" rule? That would keep c-math from adopting Colt math, because it would need the colt-primitives stuff.

-jake
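
To make the interface point about vectors concrete, here is a minimal Java sketch. All names in it (BasicVector, AggregatingVector, DenseVector, AggregationSketch) are hypothetical stand-ins, not the actual c-math or Colt types, and it uses java.util.function from later Java versions purely for brevity. It only illustrates that a consumer coded against the smaller interface cannot reach an aggregation method that lives on a second, richer interface.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-in for a minimal vector interface
// (an assumption for illustration, not the real c-math RealVector).
interface BasicVector {
    int size();
    double get(int index);
}

// Hypothetical richer interface that adds a Colt-style aggregation method.
interface AggregatingVector extends BasicVector {
    // Applies f to every element and folds the results together with aggr.
    double aggregate(DoubleBinaryOperator aggr, DoubleUnaryOperator f);
}

// Simple array-backed implementation of the richer interface.
final class DenseVector implements AggregatingVector {
    private final double[] values;

    DenseVector(double... values) {
        this.values = values.clone();
    }

    @Override public int size() { return values.length; }

    @Override public double get(int index) { return values[index]; }

    @Override
    public double aggregate(DoubleBinaryOperator aggr, DoubleUnaryOperator f) {
        if (values.length == 0) {
            // A general combiner has no identity element, so this sketch refuses.
            throw new IllegalStateException("cannot aggregate an empty vector");
        }
        double result = f.applyAsDouble(values[0]);
        for (int i = 1; i < values.length; i++) {
            result = aggr.applyAsDouble(result, f.applyAsDouble(values[i]));
        }
        return result;
    }
}

final class AggregationSketch {
    // Consumer typed against the minimal interface: it has to loop by hand.
    static double sumOfSquaresByLoop(BasicVector v) {
        double sum = 0.0;
        for (int i = 0; i < v.size(); i++) {
            sum += v.get(i) * v.get(i);
        }
        return sum;
    }

    // Consumer typed against the richer interface: it can call aggregate.
    static double sumOfSquaresByAggregate(AggregatingVector v) {
        return v.aggregate(Double::sum, x -> x * x);
    }
}
```

With a DenseVector of (1.0, 2.0, 3.0), both methods return 14.0, but only code typed against the richer interface can take the second path. That split in what consumers see is the concern behind the "ColtVector" question.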