My apologies, but I have totally lost track of who said what because too many comments have enormous lines immediately adjacent to responses.
On Tue, Jan 1, 2013 at 11:39 AM, Somebody <s...@body.org> wrote: > I thought that maybe it was due to the underlying > (dynamic) data structure for sparse vectors/matrices > (OpenIntToDoubleHashMap), while typical storage schemes (compressed row, > skyline) might be more efficient (since only arrays are used), but do not > lend themselves to changes of the value of initially null coefficients. > That's why I was suggesting immutable matrices as a start, but what I > really meant was matrices where the null coefficients are constant. > > Please note that your objection does not hold with iterative linear solvers > (conjugate gradient, ...), so immutable matrices might still be > interesting. > One of the benefits of making it easy to extend matrices can be found in Mahout (which I should emphasize is probably only a source of ideas, not a perfect paragon of perfection by any means). As a result of easy extensibility, we have in mahout two kinds of sparse vectors and many kinds of sparse matrices. One kind of sparse vector uses an open hash table (RandomAccessHashTable). It works well for mutable situations, but is a bit more memory hungry than might be liked. The other kind is implemented using an array of indexes and an array of doubles. It can be read randomly with log n worst case performance or log log n performance if the indices are well distributed. It is nearly immutable in that after a sequence of mutations, it requires substantial time and possibly memory to merge itself back together for more read operations. What it does phenomenally well, however, is support iteration with little memory overhead. As far as matrices are concerned, we have a dense matrix backed by a single indexed double[]. We have a row sparse matrix that has rows that are either kind of sparse vectors. We have block sparse matrices that has row patches that are matrices which need not exist, but are created lazily if written to. We have memory mapped dense matrices. We have a memory mapped dense matrix that maps several regions of large files together to form a single matrix (since mmap only supports 2GB files). Some of these storage forms are in Mahout math. Some are in applications. The virtue here is that we didn't have to discuss these matters much. We could just say "Sounds like a great idea, can you build a prototype to demonstrate your point?" to any bright spark who came along. Many prototypes were built and several were incorporated. So the impact of a simple core API (with linear algebra, mutable operations and OK, but not great visitor patterns) and associated abstract classes and abstract tests was as much social as technical. The social virtues have, in turn, led to technical virtues.