On Mon, Aug 15, 2011 at 4:04 AM, Gilles Sadowski <gil...@harfang.homelinux.org> wrote:
> Hello.
>
> > I'm in favor of moving some methods to the SparseXXX interface, but
> > got voted down at the time. For application developers (like me), who
> > can know in advance whether the Vector/Matrix is sparse or not, it
> > isn't a big deal. But I can see how it may cause problems for other
> > libraries that want to leverage C-M. And actually, I'm having trouble
> > seeing why it is a big deal in general. If I'm doing an operation
> > like outer product, I would still prefer that the iterator skips the
> > zero entries.
>
> I'm wondering whether, depending on the application field, one does not
> know in advance that one should use sparse implementations
> ("OpenMapRealVector" and "OpenMapRealMatrix"). If those objects are used
> consistently throughout the application, the operations should all
> leverage the sparseness optimizations, thanks to polymorphism.

Polymorphism is really too limited here to get good results. It really
helps for a dense matrix to optimize operations against a sparse operand,
as in dense.plus(sparse). If all matrices implement isSparse and
sparseIterator, then the general implementation of plus can make a single
test and get almost all of the benefit of specially coded methods. That
test is needed because it isn't reasonable to assume that the dense
implementation knows about all types of matrices if users are allowed to
implement their own. Moreover, a simple single-inheritance type structure
is not rich enough to express the different kinds and conditions of a
matrix.
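For concreteness, here is a minimal sketch of that single test, written
for vectors (the same shape applies to matrices). The proposed isSparse()
does not exist yet, so the sketch fakes it with an instanceof stand-in;
note that the stand-in only knows about one built-in type, which is
exactly why a real isSparse() on the interface would be better:

    import java.util.Iterator;

    import org.apache.commons.math.linear.OpenMapRealVector;
    import org.apache.commons.math.linear.RealVector;

    public class DenseOps {
        // Generic add: one test, then the sparse path touches only
        // the non-zero entries of the operand.
        public static RealVector add(RealVector dense, RealVector v) {
            RealVector result = dense.copy();
            if (isSparse(v)) {
                for (Iterator<RealVector.Entry> it = v.sparseIterator(); it.hasNext();) {
                    RealVector.Entry e = it.next();
                    int i = e.getIndex();
                    result.setEntry(i, result.getEntry(i) + e.getValue());
                }
            } else {
                for (int i = 0; i < v.getDimension(); i++) {
                    result.setEntry(i, result.getEntry(i) + v.getEntry(i));
                }
            }
            return result;
        }

        // Stand-in for the proposed RealVector.isSparse(); a user-defined
        // sparse implementation would defeat this check, which is the
        // whole argument for putting the method on the interface.
        private static boolean isSparse(RealVector v) {
            return v instanceof OpenMapRealVector;
        }
    }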
I would say that the argument should go the other way: do you have a unit
test demonstrating that having these methods in the general case *hurts*
performance? It obviously helps.

> Could there be specific unit tests demonstrating the necessity of having
> something like "Iterator<Entry> sparseIterator()" in "RealVector"? The
> drawback is obviously that in dense implementations, this method must be
> implemented but is used only in those cases where the object should have
> been "sparse" in the first place.

This "drawback" is trivial. In the abstract implementation, you can just
delegate to the dense iterator. Thus, RealVector never needs to know.
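In sketch form, the default in the abstract class could be as little as
this (a real implementation might choose to skip zeros instead, but dense
implementations need nothing more):

    // In the abstract base class: a one-line default that makes
    // sparseIterator() correct, if unoptimized, for dense vectors.
    public Iterator<Entry> sparseIterator() {
        return iterator();
    }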
> Unless I'm mistaken, it looks as if it would be sufficient to have
> "converters" between sparse and dense implementations.

I think you are mistaken. Converters are a royal pain. Taking a case near
to my heart: it is really, really important for gradient descent
implementations of logistic regression to do smart things with sparse
feature vectors. On the other hand, they also need to handle dense
feature vectors. The internal coefficient matrix is dense, however, and
there are half a dozen or more built-in kinds of sparse matrices; there
are even several implementations of dense matrices. Almost all of the
operations are actually on the internal coefficient matrix. If the basic
operations on a dense matrix all handle sparse arguments correctly, then
the learning code can simply declare the feature vector to be a Vector.
If not, I have to duplicate (at least) all of the learning code simply
for the benefit of the compiler.
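To make the stakes concrete, here is a sketch of a single stochastic
gradient step written once against the abstract type, assuming the
sparseIterator() discussed above. If the feature vector is sparse, the
update touches only its non-zero entries; if it is dense, the very same
code still works:

    import java.util.Iterator;

    import org.apache.commons.math.linear.RealVector;

    public class SgdSketch {
        // One gradient step of logistic regression; label is 0 or 1.
        static void step(RealVector beta, RealVector features,
                         int label, double learningRate) {
            // Predicted probability of label 1.
            double p = 1.0 / (1.0 + Math.exp(-beta.dotProduct(features)));
            double scale = learningRate * (label - p);
            // Cost is proportional to the non-zero count for sparse input.
            for (Iterator<RealVector.Entry> it = features.sparseIterator(); it.hasNext();) {
                RealVector.Entry e = it.next();
                int i = e.getIndex();
                beta.setEntry(i, beta.getEntry(i) + scale * e.getValue());
            }
        }
    }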