Hi Luc,
On Wed, Oct 14, 2009 at 3:01 AM, <luc.maison...@free.fr> wrote:

> > * also for RealVector - No iterator methods? So if the implementation is
> > sparse, there's no way to just iterate over the non-zero entries? What's
> > worse, you can't even subclass OpenMapVector and expose the iterator on
> > the OpenIntToDoubleHashMap inner object, because it's private. :\
>
> Good idea. You can use JIRA <https://issues.apache.org/jira/browse/MATH>
> to register a request for implementing this. Patches are of course welcome.
> There should probably be two iterators: one for all entries and one for the
> non-default entries (which may be non-zeroes or non-NaN or anything else).

I'll open up a ticket and attach a patch (with tests, naturally) later today.

> This API is set up the way I get it from an external contributor, so I
> guess he had a use case for that. I extended it to remain in the same spirit
> and get this huge mess. I'm sorry for that. I agree a more generic method
> would be interesting. Removing these methods would however introduce an
> incompatible API change, so this could be done only in a major release (i.e.
> 3.0) which is probably a long time from now.

Yeah, this is why I'm sad I missed the refactoring push to hit 2.0. For now,
however, a lot of implementation pain could get avoided with the iterator() and
iterateNonDefault(), together with a single AbstractRealVector which has a
default implementation of all of these crazy methods, for implementations which
don't need to think about them.

> The generic method should also either be provided in two versions (all
> entries and non-default entries) or it should have an iterator argument. For
> example the cosine and exponential functions transform a zero entry into a
> non-zero entry so they cannot ignore zero entries.
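To make the two-iterator idea concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: SketchSparseVector and VectorEntry are made-up names, not existing [math] classes, and a TreeMap stands in for the private OpenIntToDoubleHashMap. Only the method names iterator() and iterateNonDefault() come from this thread.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Hypothetical (index, value) pair handed out by the iterators. */
final class VectorEntry {
    final int index;
    final double value;

    VectorEntry(int index, double value) {
        this.index = index;
        this.value = value;
    }
}

/** Hypothetical sparse vector; a sketch, not the [math] implementation. */
final class SketchSparseVector {
    private final int dimension;
    private final double defaultValue;
    // Only entries that differ from defaultValue are stored.
    private final TreeMap<Integer, Double> stored = new TreeMap<>();

    SketchSparseVector(int dimension, double defaultValue) {
        this.dimension = dimension;
        this.defaultValue = defaultValue;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= dimension) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    void setEntry(int index, double value) {
        checkIndex(index);
        // Double.compare also treats a NaN default as equal to NaN.
        if (Double.compare(value, defaultValue) == 0) {
            stored.remove(index);
        } else {
            stored.put(index, value);
        }
    }

    double getEntry(int index) {
        checkIndex(index);
        Double v = stored.get(index);
        return v == null ? defaultValue : v;
    }

    /** Visits every index from 0 to dimension - 1, stored or not. */
    Iterator<VectorEntry> iterator() {
        return new Iterator<VectorEntry>() {
            private int cursor = 0;

            public boolean hasNext() {
                return cursor < dimension;
            }

            public VectorEntry next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                VectorEntry e = new VectorEntry(cursor, getEntry(cursor));
                cursor++;
                return e;
            }
        };
    }

    /** Visits only the stored (non-default) entries, in index order. */
    Iterator<VectorEntry> iterateNonDefault() {
        final Iterator<Map.Entry<Integer, Double>> it = stored.entrySet().iterator();
        return new Iterator<VectorEntry>() {
            public boolean hasNext() {
                return it.hasNext();
            }

            public VectorEntry next() {
                Map.Entry<Integer, Double> e = it.next();
                return new VectorEntry(e.getKey(), e.getValue());
            }
        };
    }
}
```

With this split, something like a sum of absolute values can use iterateNonDefault() when the default is zero, while a map over cos or exp has to use iterator(), for exactly the reason you give: cos(0) is 1, so the zero entries cannot be skipped.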
>
> > * while we're at it, if there is map(), why not also double
> > RealVector.collect(Collector()), where Collector defines void collect(int
> > index, double value); and double result(); - this can be used for generic
> > inner products and kernels (and can allow for consolidating all of the
> > L1Norm(), norm(), and LInfNorm() methods into this same method, passing in
> > different L1NormCollector() etc... instances).
>
> Good idea too. Another JIRA ticket for that?

JIRA ticket, tests, patch on the way. Maybe today, we'll see. :)

> > * why all the methods which are overloaded to take either RealVector or
> > double[] (getDistance, dotProduct, add, etc...) - is there really that
> > much overhead in just implementing dotProduct(double[] d) as just
> > dotProduct(new ArrayRealVector(d, false)); - no copy is done, nothing is
> > done but one object creation...
>
> It's not the copy that could take time, but the iteration which needs to
> call getEntry(). So yes, there is some overhead and it can be avoided by
> providing the simple array version. Of course, a default implementation that
> wraps the array into an ArrayRealVector can be added to the
> AbstractRealVector class you proposed above, in order to simplify new
> implementations.

This depends on the implementation details: when ArrayRealVector.dotProduct is
passed another instance of ArrayRealVector, the two have access to each other's
internals and can avoid the getEntry() call altogether. Other subclasses can
have similar speedup strategies. I can try and whip up a patch and some perf
tests to check the speed of these operations to verify - another JIRA ticket, I
think? :)

> > * SparseVector is just a marker interface? Does it serve any purpose?
>
> For now, yes it is a marker interface. There was some discussion about
> these interfaces just before the release of 2.0. The conclusion was that
> they should remain simple markers at that time.

Fair enough.

> The idea was really that people could provide their own implementations.
> Some methods that are close in spirit to the iterators you ask for are in
> the matrix interfaces (the walkXxx methods) and are used in many algorithms
> inside [math].

Ok great, I'll try to play around with those.

> If you intend to contribute them to [math], you'll have to put them on JIRA
> and send a Software Grant <http://www.apache.org/licenses/#grants> to
> Apache secretary. If you develop contributions directly for [math] (i.e. if
> it is not preexisting software), then rather than a Software Grant we will
> need a Contributor License Agreement (CLA), either an Individual CLA
> or a Corporate CLA <http://www.apache.org/licenses/#clas>.

Yeah, I'm down with the "apache way", I'll attach patches to the JIRA tickets
after clicking the lovely "you can have this" button. None of the stuff I'm
talking about contributing is a "large body of code" which needs a special
grant (I'm sending Mahout a bunch of stuff which may need that, although I'm
the only contributor to the project I'm donating, so I'm not sure it's needed
even in that case).

-jake
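
P.S. For the collect() idea from earlier in this mail, here is a rough sketch of how the norms and a dot product could all go through one method. The Collector interface uses the two method signatures from the proposal; the concrete class names beyond L1NormCollector, and the static CollectSketch.collect helper over a plain double[], are assumptions for illustration (in the proposal, collect would be a method on RealVector).

```java
/** Collector as proposed: fed each (index, value), then asked for a result. */
interface Collector {
    void collect(int index, double value);

    double result();
}

/** Sum of absolute values. */
final class L1NormCollector implements Collector {
    private double sum;

    public void collect(int index, double value) {
        sum += Math.abs(value);
    }

    public double result() {
        return sum;
    }
}

/** Euclidean norm (hypothetical name). */
final class L2NormCollector implements Collector {
    private double sumOfSquares;

    public void collect(int index, double value) {
        sumOfSquares += value * value;
    }

    public double result() {
        return Math.sqrt(sumOfSquares);
    }
}

/** Largest absolute value (hypothetical name). */
final class LInfNormCollector implements Collector {
    private double max;

    public void collect(int index, double value) {
        max = Math.max(max, Math.abs(value));
    }

    public double result() {
        return max;
    }
}

/** Inner product against a fixed array (hypothetical name). */
final class DotProductCollector implements Collector {
    private final double[] other;
    private double sum;

    DotProductCollector(double[] other) {
        this.other = other;
    }

    public void collect(int index, double value) {
        sum += value * other[index];
    }

    public double result() {
        return sum;
    }
}

/** Stand-in for RealVector.collect(Collector), over a plain array. */
final class CollectSketch {
    static double collect(double[] data, Collector collector) {
        for (int i = 0; i < data.length; i++) {
            collector.collect(i, data[i]);
        }
        return collector.result();
    }
}
```

For example, CollectSketch.collect(new double[] {3.0, -4.0, 0.0}, new L1NormCollector()) walks the three entries and returns 7.0. A sparse implementation could feed these particular collectors only its non-zero entries, since zero entries contribute nothing to any of the four results.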