On 12/31/12 5:07 PM, Gilles Sadowski wrote: > Hi. > >>> If we stick to >>> >>> 0) algebraic objects are immutable >>> 1) algorithms defined using algebraic concepts should be implemented >>> using algebraic objects >>> >>> ... >>> 0) Start, with Konstantin's help, by fleshing out the InPlace >>> matrix / vector interface >>> 1) Integrate Mahout code as part of a wholesale refactoring of the >>> linear package > What do you mean by this? > Copy/paste or create a dependency? Something else?
I was thinking along the lines of copy / adapt; but Ted's response and some perusal of the code has disabused me of the idea that this would be feasible. I think realistically all we could do would be to borrow some ideas. > >>> 2) Extend use of the visitor pattern to perform mutations >>> "in-place" (similar to 0) in effect) >>> > As suggested in a previous post: > > 3) a) Define a new "minimal matrix" interface, and create immutable > implementations. > b) Benchmark critical methods (entry access, iteration, add, multiply, > etc.) > c) Quantify the efficiency gain of in-place operations and only when this > information is available decide whether the gain is worth the price. > [Even if in-place operations are faster in a single thread context, it > is not sure that immutability would not change that in a multi-thread > implementation. Trying to outperform multi-threaded code with in-place > operations is a dead end.] > > Before embarking on any of this, please identify the rationale: Is there > _one_ identified problem that would require urgent action? This discussion > about clean-up/improvement/simplification of the CM matrix implementations > has been going on for months, and we should not start a "new" discussion > without referring to what has been recorded by Sébastien on JIRA. Agreed we should keep the discussion concrete. Sebastien and Luc have both mentioned specific examples where the overhead of matrix data copy and storage creates practical problems. Konstantin mentioned another (Gaussian elimination) which is sort of humorous because we have in fact effectively implemented that already, embedded in the LU decomp class - but to do it, we took the approach that I mentioned above, which is to abandon the linear algebraic objects and operate directly on double arrays. You can see this throughout the linear package itself. This is partly an artifact of the fact that most of the impls there are translated from fortran or taken from JAMA, but partly it is just following 50+ yrs of numerical linear algebra experience, which favors in-place operations in iterative operations. An interesting exercise illustrating why you really need in-place operations for performance in these algorithms would be to see how far you could get implementing, say LU Decomp, using algebraic operations on immutable vectors and matrices only. The code might be nice to read; but I bet dollars to doughnuts performance would be terrible and it would require a lot more memory to handle the same size problems. I agree with you we need to keep things concrete and I also agree we should think about stripping down the RealMatrix interface; but I think we should pay attention to the practical concerns and approaches suggested by Ted, Konstantin and others. I would like to get some more concrete proposals on the table for either the InPlace interface or the use of views and visitors. Examples from other projects / languages / algorithms would be helpful here. The alternative is to agree to follow what is the de facto standard internally in the linear package now, which is essentially to use the algebraic objects just as value / data-transfer objects and work with (mutable) arrays when implementing iterative algorithms. I am actually OK with that approach as well; but to maximize code reuse, some algebraic operations will in this case have to be exposed as operations on primitive or FieldElement arrays. If you want to experiment with approaches for parallelizing some of our algorithms, I would say go for it. I doubt that mutability will really end up playing a significant role there. While it is in general true that immutable objects make thread-safety a little easier, it is not obvious to me that the real problems in parallelizing the algorithms are made any easier by making the data containers immutable (in other words, I would not expect protecting shards from cross-thread access to be a real issue). And getting maximum performance from each worker thread ends up facing the same problems. Phil > > > Regards, > Gilles > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org