On 03/26/2013 08:07 PM, Thomas Neidhart wrote: > On 03/26/2013 01:38 PM, Gilles wrote: >>>> [...] >>>>> >>>>>> There is at least one contribution requiring some feedback (MATH-917). >>>>> >>>>> I have added a first patch and would love to get some feedback. >>>> >>>> It seems we are really close to releasing now. I guess MATH-817 will be >>>> resolved soon. What seems to be achievable in a short time frame >>>> would be: >>>> >>>> - wrapping up MATH-917 as Thomas proposed, >>>> - finishing MATH-437 (Kolmogorov-Smirnov), and perhaps also >>>> MATH-228 at the same time >>>> - removing cobertura from parent >>>> >>>> I would propose to postpone MATH-873, MATH-923 and MATH-874 to 4.0. >>>> >>>> What do you think? >>> >>> Looks good. I will finish my work on MATH-917 till Wednesday the latest. >> >> I wonder whether "Clusterable" should be (the generic type seems >> superfluous): > > you are right, I did remove the generic type already in my local > environment. > >> public interface Clusterable { >> double[] getClusteringData(); >> } >> >> And the "distance" interface simplified to: >> >> public interface Distance { >> double compute(double[] a, double[] b); >> } >> >> The concept being more general than just for clustering, the interface and >> its implementations probably belong elsewhere, e.g. in "o.a.c.m.util" (or >> perhaps even better as static inner classes of "MathArrays"). > > I thought about this too, but the downside would be that any Clusterable > must have a double[] internally. Having dimension() and component(int) > methods is more flexible imho, but both approaches have their benefits. > >> "AbstractClusterer" should be immutable (i.e. no distance setter). >> >> If using interfaces, the "getDistanceMeasure" method should probably appear >> in one of them. Maybe: >> >> public interface Clusterer<T extends Clusterable> { >> List<? extends Cluster<T>> cluster(Collection<T> points); >> Distance getDistance(); >> } >> >> However, in line with what we've been doing the last 3 or 4 years, >> "Clusterer" could just be an abstract class. > > I was musing about making the Clusterer implementations immutable, or to > create setters for their necessary parameters, e.g. the k parameter for > k-means. > > Otherwise you have to create a new object each time you want to change > something, would this be acceptable? > > Another thing that I changed locally: the KMeans clusterer currently > supports multi-evaluation with a separate method. This could be moved to > a separate class MultiKMeansPlusPlusClusterer, which allows to use it > with the same Clusterer interface.
I have updated the patch, and also taken your suggestions into account. Right now the Clusterable interface is untouched, but the DistanceMeasure interface now accepts double[] as input. The AbstractClusterer provides a convenience method to calculate the distance of two Clusterable instances by converting them first to arrays and then calling DistanceMeasure#compute. I think it is a good compromise, but it would also be easy to change the Clusterable interface to just return a double[]. Thomas --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org