On Mon, Aug 15, 2011 at 4:04 AM, Gilles Sadowski <
gil...@harfang.homelinux.org> wrote:

> Hello.
>
> > I'm in favor of moving some methods to the SparseXXX interface, but
> > got voted down at the time. For application developers (like me), who
> > can tell in advance whether the Vector/Matrix is sparse, it isn't a
> > big deal. But I can see how it may cause problems for other libraries
> > that want to leverage C-M.  And actually, I'm having trouble seeing
> > why it is a big deal in general.  If I'm doing an operation like an
> > outer product, I would still prefer that the iterator skip the zero
> > entries.
>
> I'm wondering whether, depending on the application field, one doesn't
> already know in advance that one should use the sparse implementations
> ("OpenMapRealVector" and "OpenMapRealMatrix"). If those objects are used
> consistently throughout the application, all operations should leverage
> the sparseness optimizations, thanks to polymorphism.
>

Polymorphism is really too limited here to get good results.

And it really helps for a dense matrix to optimize operations against a
sparse operand, as in dense.plus(sparse).  If all matrices implement isSparse
and sparseIterator, then the general implementation of plus can make one
test and get almost all of the benefit of specially coded methods.  Those
tests are needed because it isn't reasonable to assume that the dense
implementation knows about every type of matrix if users are allowed to
implement their own.  Moreover, a simple single-inheritance type structure
is not rich enough to express the different kinds and conditions of a matrix.
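To make that concrete, here is a minimal sketch of what such a generic plus could look like. All names here (Vec, DenseVec, SparseVec, Entry) are illustrative stand-ins, not the actual Commons Math types; the point is only that a single isSparse test lets one generic method cover both cases:

```java
import java.util.*;

/** Illustrative interface: every vector exposes isSparse() and
 *  sparseIterator(), as proposed.  Not the real Commons Math API. */
interface Vec {
    int getDimension();
    double getEntry(int i);
    void setEntry(int i, double v);
    boolean isSparse();
    Iterator<Entry> sparseIterator();   // non-zero entries for sparse impls
}

final class Entry {
    final int index; final double value;
    Entry(int index, double value) { this.index = index; this.value = value; }
}

final class DenseVec implements Vec {
    private final double[] data;
    DenseVec(int n) { data = new double[n]; }
    public int getDimension() { return data.length; }
    public double getEntry(int i) { return data[i]; }
    public void setEntry(int i, double v) { data[i] = v; }
    public boolean isSparse() { return false; }
    // The "drawback" for dense types: one trivial delegating method.
    public Iterator<Entry> sparseIterator() {
        List<Entry> all = new ArrayList<>();
        for (int i = 0; i < data.length; i++) all.add(new Entry(i, data[i]));
        return all.iterator();
    }
}

final class SparseVec implements Vec {
    private final int n;
    private final Map<Integer, Double> map = new HashMap<>();
    SparseVec(int n) { this.n = n; }
    public int getDimension() { return n; }
    public double getEntry(int i) { Double v = map.get(i); return v == null ? 0.0 : v; }
    public void setEntry(int i, double v) { if (v == 0.0) map.remove(i); else map.put(i, v); }
    public boolean isSparse() { return true; }
    public Iterator<Entry> sparseIterator() {
        List<Entry> nz = new ArrayList<>();
        for (Map.Entry<Integer, Double> e : map.entrySet()) nz.add(new Entry(e.getKey(), e.getValue()));
        return nz.iterator();
    }
}

public class PlusDemo {
    /** Generic add: one isSparse() test buys most of the benefit of a
     *  specially coded dense-plus-sparse method. */
    static Vec plus(Vec a, Vec b) {
        DenseVec out = new DenseVec(a.getDimension());
        for (int i = 0; i < a.getDimension(); i++) out.setEntry(i, a.getEntry(i));
        if (b.isSparse()) {
            // touch only the non-zero entries of the sparse operand
            for (Iterator<Entry> it = b.sparseIterator(); it.hasNext(); ) {
                Entry e = it.next();
                out.setEntry(e.index, out.getEntry(e.index) + e.value);
            }
        } else {
            for (int i = 0; i < b.getDimension(); i++)
                out.setEntry(i, out.getEntry(i) + b.getEntry(i));
        }
        return out;
    }

    public static void main(String[] args) {
        Vec dense = new DenseVec(5);
        for (int i = 0; i < 5; i++) dense.setEntry(i, 1.0);
        Vec sparse = new SparseVec(5);
        sparse.setEntry(2, 3.0);
        Vec sum = plus(dense, sparse);
        System.out.println(sum.getEntry(2) + " " + sum.getEntry(0));  // 4.0 1.0
    }
}
```

The sparse branch does work proportional to the number of non-zeros rather than the dimension, and the same method works unchanged for user-supplied matrix types.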

I would say that the argument should go the other way.  Do you have a unit
test demonstrating that having these methods in the general case *hurts*
performance?  It obviously helps.



> Could there be specific unit tests demonstrating the necessity of having
> something like "Iterator<Entry> sparseIterator()" in "RealVector"? The
> drawback is obviously that in dense implementations, this method must be
> implemented but is used only in those cases where the object should have
> been "sparse" in the first place.


This "drawback" is trivial.  In the abstract implementation, you can just
delegate to the dense iterator.  Thus, RealVector never needs to know.
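A sketch of that delegation, with illustrative names (AbstractVec is a stand-in for the abstract base class, not the real Commons Math type): the base class supplies sparseIterator() by falling back to the plain iterator, so dense subclasses inherit it for free and only sparse subclasses override it.

```java
import java.util.Iterator;

/** Illustrative abstract base: sparseIterator() defaults to the plain
 *  iterator, so dense implementations never have to think about it. */
abstract class AbstractVec {
    public abstract int getDimension();
    public abstract double getEntry(int i);

    /** Plain iterator over all entries, zeros included. */
    public Iterator<Double> iterator() {
        return new Iterator<Double>() {
            private int i = 0;
            public boolean hasNext() { return i < getDimension(); }
            public Double next() { return getEntry(i++); }
        };
    }

    /** Default: just the dense iterator.  Sparse subclasses override
     *  this to skip zeros. */
    public Iterator<Double> sparseIterator() { return iterator(); }
}

public class DelegationDemo {
    public static void main(String[] args) {
        AbstractVec v = new AbstractVec() {
            public int getDimension() { return 3; }
            public double getEntry(int i) { return i * 2.0; }
        };
        double sum = 0;
        for (Iterator<Double> it = v.sparseIterator(); it.hasNext(); ) sum += it.next();
        System.out.println(sum);  // 6.0
    }
}
```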



> Unless I'm mistaken, it looks as if it would be sufficient to have
> "converters" between sparse and dense implementations.
>

I think you are mistaken.  Converters are a royal pain.

Taking a case near to my heart, it is really, really important for gradient
descent implementations of logistic regression to do smart things with
sparse feature vectors.  On the other hand, they also need to handle dense
feature vectors.  The internal coefficient matrix is, however, dense and
there are half a dozen or more built-in kinds of sparse matrices.  There are
even several implementations of dense matrices.

Almost all of the operations are actually on the internal coefficient
matrix.  If the basic operations on a dense matrix all handle sparse cases
correctly, then the learning code can declare the feature vector to be a
Vector.  If not, I have to duplicate (at least) all of the learning code
simply for the benefit of the compiler.
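To illustrate why the sparse handling matters here: in stochastic gradient descent for logistic regression, one update should touch only the non-zero features. This is a generic sketch of such a step (a plain Map stands in for a sparse vector's sparseIterator(); names are illustrative, not from any library):

```java
import java.util.*;

/** Sketch: a sparse SGD step for logistic regression costs O(nnz) per
 *  update instead of O(dimension), because only non-zero features move. */
public class SparseSgdStep {
    static double dot(double[] beta, Map<Integer, Double> x) {
        double s = 0;
        for (Map.Entry<Integer, Double> e : x.entrySet())
            s += beta[e.getKey()] * e.getValue();
        return s;
    }

    /** One stochastic gradient step; y is the 0/1 label. */
    static void step(double[] beta, Map<Integer, Double> x, int y, double rate) {
        double p = 1.0 / (1.0 + Math.exp(-dot(beta, x)));   // predicted probability
        double err = y - p;
        // dense coefficients, sparse update: only non-zero features change
        for (Map.Entry<Integer, Double> e : x.entrySet())
            beta[e.getKey()] += rate * err * e.getValue();
    }

    public static void main(String[] args) {
        double[] beta = new double[1000];            // dense coefficient vector
        Map<Integer, Double> x = new HashMap<>();    // sparse feature vector
        x.put(3, 1.0);
        x.put(700, 2.0);
        step(beta, x, 1, 0.1);
        System.out.println(beta[3] + " " + beta[0]); // only indices 3 and 700 changed
    }
}
```

If the dense coefficient matrix handles sparse operands through a common interface, this loop can be written once against Vector and work for either representation.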
