Le ven. 20 mars 2020 à 04:47, chentao...@qq.com <chentao...@qq.com> a écrit : > > Hi, > > >Hello. > > > >Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <gillese...@gmail.com> a écrit > >: > >> > >> Hi. > >> > >> 2020-03-18 15:10 UTC+01:00, chentao...@qq.com <chentao...@qq.com>: > >> > Hi, > >> > I have created a PR to show my aim: > >> > https://github.com/apache/commons-math/pull/126/files > >> > >> Am I correct that the implementations of "ClustersPointExtractor" > >> modify the argument of the "extract" method? > >> If so, that seems quite unsafe. I would not expect this behaviour > >> in a public API. > > > >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable > >code): > > > >public class Extractor { > > T extract(List<List<T>> list); > >} > > I have read the exists code again, and I found "EmptyClusterStrategy" > and related logic only used in K-Means. > And the "Extractor" remove a point from exists Clusters is indeed unsuitable > to be a public API(as you mentioned before.) > I think another choice is simple make "EmptyClusterStrategy" and related > logic protected, > then it can be resuse by MiniBatchKMeans. > > Also after the "CentroidInitializer" has been move out, the > "KMeansPlusPlusClusterer" > should rename to "KMeansClusterer" with a construct parameter > "CentroidInitializer"
If you think that it makes sense to define a class hierarchy for "KMeans"-based algorithms, please do so. But note that "protected" is only slightly better than "public" (from a maintenance POV, it isn't better, since we'll be required to maintain compatibility all the same). Perhaps the base classe(s) can be package-private... > > > >with something along those lines: > > > >public class ClusterSet<T> { > > private Set<T> data; > > private List<List<T>> clusters; > > > > void remove(T e) { > > return data.remove(e); > > } > > > > public interface Visitor { > > T visit(List<List<T>> list); > > } > >} > > > >Key point is separation of concern (selection of element vs > >removal of element). > > I propose the ClusterSet should has more member like "distanceMeasure" > to store origin clustering parameters, > or to accelerate or support cluster finding or point finding. > ```java > public class ClusterSet<T> { > private List<Cluster> clusters; > private DistanceMeasure measure; > ... > public List<Cluster> closestTopN(Clusterable point); How about providing a comparator factory? public <T extends Clusterable> static Comparator<Cluster<T>> getComparator(final Clusterable p, final DistanceMeasure m) { return new Comparator<>() { public int compare(Cluster<T> c1, Cluster<T> c2) { final double d1 = c1.distance(p, m); // "distance" method to be added to the interface (?). final double d2 = c2.distance(p, m); return d1 < d2 ? -1 : d1 > d2 ? 1 : 0; } } } Then, if also providing an accessor to the clusters: public List<Cluster<T>> getClusters() { return Collections.umodifiableList(clusters); } users can do whatever they want, a.o. easily implement the ranking method which you suggest: /** @return clusters in descending order of proximity. */ public List<Cluster> closest(Clusterable p, ClusterSet<T> s, DistanceMeasure m) { final List<Cluster<T>> cList = new ArrayList<>(s.getClusters()); Collections.sort(cList, set.getComparator(p, m)); return cList; } > > public int pointSize(); > } > ``` A priori, I'd favour more encapsulation, in order to ensure that multi-thread access can be controlled, e.g. hide the "ClusterSet" data-structures, and provide access through identifiers that are atomically created when clusters are added to the instance, and so on... It would be great if you can come up with a concrete (and fully tested) implementation of "ClusterSet". ;-) Best, Gilles > > > >Of course the devil is in the details (maintain consistency among > >the fields of "ClusterSet", ensure that "Visitor" implementations > >cannot modify the internal state, ...). > > > >> Unless I missed some point, I'd ask again that the API be reviewed > >> *before* implementing several features (such as those "extractors") > >> on top of something that does not look right. > > > >One perspective might be to try and come with a design for use in > >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet"). > > > >Best regards, > >Gilles > > > >> >> [...] --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org