On 12/31/12 5:07 PM, Gilles Sadowski wrote:
> Hi.
>
>>> If we stick to
>>>
>>> 0) algebraic objects are immutable
>>> 1) algorithms defined using algebraic concepts should be implemented
>>> using algebraic objects
>>>
>>> ...
>>> 0)  Start, with Konstantin's help, by fleshing out the InPlace
>>> matrix / vector interface
>>> 1)  Integrate Mahout code as part of a wholesale refactoring of the
>>> linear package
> What do you mean by this?
> Copy/paste or create a dependency? Something else?

I was thinking along the lines of copy / adapt; but Ted's response
and some perusal of the code has disabused me of the idea that this
would be feasible.  I think realistically all we could do would be
to borrow some ideas.
>
>>> 2)  Extend use of the visitor pattern to perform mutations
>>> "in-place" (similar to 0) in effect)
>>>
> As suggested in a previous post:
>
> 3) a) Define a new "minimal matrix" interface, and create immutable
>       implementations.
>    b) Benchmark critical methods (entry access, iteration, add, multiply,
>       etc.)
>    c) Quantify the efficiency gain of in-place operations and only when this
>       information is available decide whether the gain is worth the price.
>       [Even if in-place operations are faster in a single thread context, it
>       is not sure that immutability would not change that in a multi-thread
>       implementation. Trying to outperform multi-threaded code with in-place
>       operations is a dead end.]
>
> Before embarking on any of this, please identify the rationale: Is there
> _one_ identified problem that would require urgent action? This discussion
> about clean-up/improvement/simplification of the CM matrix implementations
> has been going on for months, and we should not start a "new" discussion
> without referring to what has been recorded by Sébastien on JIRA.

Agreed we should keep the discussion concrete.  Sébastien and Luc
have both mentioned specific examples where the overhead of matrix
data copy and storage creates practical problems.  Konstantin
mentioned another (Gaussian elimination), which is somewhat ironic
because we have in fact already implemented it, embedded in the LU
decomposition class - but to do it, we took the approach I mentioned
above: abandon the linear algebraic objects and operate directly on
double arrays.  You can see this throughout the linear package
itself.  This is partly an artifact of the fact that most of the
implementations there are translated from Fortran or taken from
JAMA, but partly it is just following 50+ years of numerical linear
algebra experience, which favors in-place operations in iterative
algorithms.  An interesting exercise illustrating why you really
need in-place operations for performance in these algorithms would
be to see how far you could get implementing, say, LU decomposition
using only algebraic operations on immutable vectors and matrices.
The code might be nice to read, but I bet dollars to doughnuts
performance would be terrible, and it would require a lot more
memory to handle problems of the same size.
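To make that concrete, here is roughly what the in-place style looks
like: an elimination step written directly against a double[][], with
the multipliers and the updated submatrix overwriting the input and no
intermediate allocations.  This is just an illustrative sketch (no
pivoting, names invented for the example), not the actual
LUDecomposition code:

```java
// Sketch: in-place Doolittle LU elimination (no pivoting, for
// illustration only) operating directly on a double[][], in the style
// used internally in the linear package.  All names are invented.
public class InPlaceLu {

    /** Overwrites a with its combined L\U factors; no copies are made. */
    static void decompose(double[][] a) {
        int n = a.length;
        for (int k = 0; k < n; k++) {
            for (int i = k + 1; i < n; i++) {
                a[i][k] /= a[k][k];               // multiplier, stored in the L part
                for (int j = k + 1; j < n; j++) {
                    a[i][j] -= a[i][k] * a[k][j]; // update trailing submatrix in place
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] a = {{4, 3}, {6, 3}};
        decompose(a);
        // L = [[1, 0], [1.5, 1]], U = [[4, 3], [0, -1.5]], packed into a
        if (a[1][0] != 1.5 || a[1][1] != -1.5) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```

An immutable-objects version of the same loop would allocate a fresh
matrix per elimination step, which is exactly the cost being discussed.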

I agree with you that we need to keep things concrete, and I also
agree we should think about stripping down the RealMatrix interface;
but I think we should pay attention to the practical concerns and
approaches suggested by Ted, Konstantin and others.  I would like to
see some more concrete proposals on the table for either the InPlace
interface or the use of views and visitors.  Examples from other
projects / languages / algorithms would be helpful here.  The
alternative is to agree to follow what is now the de facto standard
within the linear package, which is essentially to use the algebraic
objects just as value / data-transfer objects and to work with
(mutable) arrays when implementing iterative algorithms.  I am
actually OK with that approach as well; but to maximize code reuse,
some algebraic operations would then have to be exposed as
operations on primitive or FieldElement arrays.
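As one strawman for the visitor route, here is a minimal sketch of
what an in-place "changing visitor" could look like - loosely modeled
on the existing RealMatrixChangingVisitor, but with all names invented
for illustration.  The point is that the algorithm mutates entries in
place without the backing array ever being exposed:

```java
// Sketch of a minimal "changing visitor" for in-place mutation, loosely
// modeled on RealMatrixChangingVisitor.  All names here are invented
// for illustration; this is not a proposal for the final API.
public class VisitorSketch {

    interface ChangingVisitor {
        /** Returns the new value to store at (row, col). */
        double visit(int row, int col, double value);
    }

    static class DenseMatrix {
        private final double[][] data;

        DenseMatrix(double[][] data) {
            this.data = data;
        }

        double get(int i, int j) {
            return data[i][j];
        }

        /** Walks every entry, replacing it with the visitor's result. */
        void walkInPlace(ChangingVisitor v) {
            for (int i = 0; i < data.length; i++) {
                for (int j = 0; j < data[i].length; j++) {
                    data[i][j] = v.visit(i, j, data[i][j]);
                }
            }
        }
    }

    public static void main(String[] args) {
        DenseMatrix m = new DenseMatrix(new double[][] {{1, 2}, {3, 4}});
        m.walkInPlace((i, j, v) -> 2 * v); // scale in place, no copies
        if (m.get(1, 1) != 8) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```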

If you want to experiment with approaches for parallelizing some of
our algorithms, I would say go for it.  I doubt that mutability will
really end up playing a significant role there.  While it is
generally true that immutable objects make thread safety a little
easier, it is not obvious to me that the hard problems in
parallelizing these algorithms are made any easier by making the
data containers immutable (in other words, I would not expect
protecting shards from cross-thread access to be a real issue).  And
getting maximum performance out of each worker thread faces the same
problems either way.
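To make the sharding point concrete: in a typical row-partitioned
kernel each worker writes only its own disjoint slice of the output,
so no cross-thread protection is needed whether or not the containers
are mutable.  A minimal sketch (names invented, using parallel streams
just for brevity):

```java
import java.util.stream.IntStream;

// Sketch: row-sharded matrix-vector product.  Each worker reads the
// shared (effectively immutable) inputs and writes only its own entry
// of the result, so mutability of the result array poses no
// cross-thread hazard.  All names are invented for illustration.
public class ShardedMatVec {

    static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[a.length];
        IntStream.range(0, a.length).parallel().forEach(i -> {
            double sum = 0.0;
            for (int j = 0; j < x.length; j++) {
                sum += a[i][j] * x[j];
            }
            y[i] = sum; // each thread writes a disjoint entry of y
        });
        return y;
    }

    public static void main(String[] args) {
        double[] y = multiply(new double[][] {{1, 2}, {3, 4}},
                              new double[] {1, 1});
        if (y[0] != 3.0 || y[1] != 7.0) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```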

Phil


>
>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

