On Wed, Dec 9, 2009 at 1:17 PM, Benson Margulies <bimargul...@gmail.com> wrote:

>
> CERN Colt is a library with a mixture of 'category A' material and
> 'category B-or-worse' material. In other words, it is not an
> attractive dependency for ASF code as a lump.
>

Yep, that's why, when I made my original patch of Colt for Mahout, I
stripped out every dependency on the 'category B-or-worse' material, and
tried to keep everything else except the physics-specific stuff that
nobody was going to need.


> Part of the 'category A' code is a set of very pretty associative
> containers for primitive types. My goal is to take this code and
> modernize it to use Generics as appropriate -- and otherwise make it a
> replacement for the LGPL Trove library.
>

I believe Mahout kept all this: look in o.a.m.matrix.list and
o.a.m.matrix.map.
The package name they're both in (o.a.m.matrix) is bad: it used to be
"colt", but that name is trademarked by CERN, so even if the code is
freely licensable, the name isn't.  It might be better called
"collections" or something.

> The associative code uses some math code. I'm pretty sure that all of
> the math code that it uses is also category A. In fact, I believe that
> all of it is in the portion that Mahout forked. This includes some
> things that aren't in commons-math at all.
>

Oh, this is what you're saying here, ok.


> So, we have some adoptable code in Colt that overlaps functionality in
> -math, some that doesn't, and some that I want to use as the basis for
> work in -primitives.
>

The stuff you want in primitives is also stuff we use/want to use for
our linear work in Mahout, by the way (OpenDoubleIntHashMap,
DoubleArrayList, etc...).


> If the messages in this thread mean that the code already in -math is
> in fact moving in a direction to support mahout, then it might be
> acceptable for the additional, non-overlapping, Colt math code to end
> up in -math, and then everyone ends up happy?
>

None of the patches mentioned above contain Colt code, so no, c-math
is not really moving in a direction to support what Mahout needs.  Part
of that is the backwards compatibility constraints, and part is that
it's hard to tell what the "non-overlapping" part of Colt math is.  For
example, vectors in Colt provide complex aggregation methods, but
c-math vectors do not (as an interface).  Is this non-overlapping
code?  If it is, then how do you suppose it would be incorporated
in c-math?  Add a new interface, the "ColtVector", which is basically
the same as c-math's "RealVector", but with some methods removed
and some others added?

This doesn't really make sense.  Implementations are one thing, but
the interfaces that consumers see are another.
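
To make the interface question concrete, here is a rough sketch of the
kind of aggregation method I mean.  It is written in present-day Java,
and every name in it (AggregatingVector, ArrayBackedVector, aggregate)
is hypothetical: this is neither Colt's nor c-math's actual API, just
an illustration of a method that would have to live on the vector
interface itself rather than in some implementation class.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical interface, for illustration only; not Colt or commons-math API.
interface AggregatingVector {
    int size();

    double get(int index);

    // Map every element through 'map', then fold the results left to right
    // with 'combine'. Returns NaN for an empty vector (an assumption made
    // for this sketch).
    default double aggregate(DoubleBinaryOperator combine, DoubleUnaryOperator map) {
        if (size() == 0) {
            return Double.NaN;
        }
        double acc = map.applyAsDouble(get(0));
        for (int i = 1; i < size(); i++) {
            acc = combine.applyAsDouble(acc, map.applyAsDouble(get(i)));
        }
        return acc;
    }
}

// Minimal dense implementation backed by a primitive double[] (hypothetical).
final class ArrayBackedVector implements AggregatingVector {
    private final double[] values;

    ArrayBackedVector(double... values) {
        this.values = values.clone();
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public double get(int index) {
        return values[index];
    }
}

class AggregationSketch {
    public static void main(String[] args) {
        AggregatingVector v = new ArrayBackedVector(1.0, 2.0, 3.0);
        // Sum of squares: square each element, then add them up.
        System.out.println(v.aggregate(Double::sum, x -> x * x));
    }
}
```

A consumer coded against an interface without such a method cannot call
it without a cast, which is why adding the implementations alone would
not be enough.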

> Another question: OK, -math is a stable, high-compatibility library.
> Mahout, on the other hand, wants to be a fast-moving, somewhat fluid,
> build of code (naturally including some math code) adapted for
> map-reduce. So, how about a branch of math that could release
> frequently with a new major version number?


Well, actually, a more feasible version of this, instead of branches, would
be to use the current "experimental" source submodule in commons-math.
If that could be a faster-moving, fluid build which could include Colt math,
etc., that would be great, and Mahout could possibly use this.

This brings up another point though: if commons-primitives is what you're
creating, and you want to base it on Colt, well, we in Mahout could probably
handle depending on you.  But the Colt linear math depends on these
primitives classes, so can commons-math depend on commons-primitives, or
does that violate c-math's "no external dependencies" rule?  If it does,
that would keep c-math from adopting Colt math, because that code would
need the colt-primitives stuff.

  -jake
