Hi, >Le ven. 20 mars 2020 à 04:47, chentao...@qq.com <chentao...@qq.com> a écrit : >> >> Hi, >> >> >Hello. >> > >> >Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <gillese...@gmail.com> a >> >écrit : >> >> >> >> Hi. >> >> >> >> 2020-03-18 15:10 UTC+01:00, chentao...@qq.com <chentao...@qq.com>: >> >> > Hi, >> >> > I have created a PR to show my aim: >> >> > https://github.com/apache/commons-math/pull/126/files >> >> >> >> Am I correct that the implementations of "ClustersPointExtractor" >> >> modify the argument of the "extract" method? >> >> If so, that seems quite unsafe. I would not expect this behaviour >> >> in a public API. >> > >> >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable >> >code): >> > >> >public class Extractor { >> > T extract(List<List<T>> list); >> >} >> >> I have read the exists code again, and I found "EmptyClusterStrategy" >> and related logic only used in K-Means. >> And the "Extractor" remove a point from exists Clusters is indeed unsuitable >> to be a public API(as you mentioned before.) >> I think another choice is simple make "EmptyClusterStrategy" and related >> logic protected, >> then it can be resuse by MiniBatchKMeans. >> >> Also after the "CentroidInitializer" has been move out, the >> "KMeansPlusPlusClusterer" >> should rename to "KMeansClusterer" with a construct parameter >>"CentroidInitializer" > >If you think that it makes sense to define a class hierarchy for >"KMeans"-based algorithms, please do so. But note that "protected" >is only slightly better than "public" (from a maintenance POV, it isn't >better, since we'll be required to maintain compatibility all the same). >Perhaps the base classe(s) can be package-private...
OK, package-private on "EmptyClusterStrategy" and related logic is fine. > >> > >> >with something along those lines: >> > >> >public class ClusterSet<T> { >> > private Set<T> data; >> > private List<List<T>> clusters; >> > >> > void remove(T e) { >> > return data.remove(e); >> > } >> > >> > public interface Visitor { >> > T visit(List<List<T>> list); >> > } >> >} >> > >> >Key point is separation of concern (selection of element vs >> >removal of element). >> >> I propose the ClusterSet should has more member like "distanceMeasure" >> to store origin clustering parameters, >> or to accelerate or support cluster finding or point finding. >> ```java >> public class ClusterSet<T> { >> private List<Cluster> clusters; >> private DistanceMeasure measure; >> ... >> public List<Cluster> closestTopN(Clusterable point); > >How about providing a comparator factory? > >public <T extends Clusterable> static > Comparator<Cluster<T>> getComparator(final Clusterable p, final >DistanceMeasure m) { > return new Comparator<>() { > public int compare(Cluster<T> c1, Cluster<T> c2) { > final double d1 = c1.distance(p, m); // "distance" method >to be added to the interface (?). > final double d2 = c2.distance(p, m); > return d1 < d2 ? -1 : d1 > d2 ? 1 : 0; > } > } >} How do you think about the "distanceMeasure" to be a memeber of "ClusterSet", In my opinion "ClusterSet" should contains all the parameters and result for clustering and clusters. > >Then, if also providing an accessor to the clusters: > >public List<Cluster<T>> getClusters() { > return Collections.umodifiableList(clusters); >} > >users can do whatever they want, a.o. easily implement the >ranking method which you suggest: > >/** @return clusters in descending order of proximity. */ >public List<Cluster> closest(Clusterable p, ClusterSet<T> s, >DistanceMeasure m) { > final List<Cluster<T>> cList = new ArrayList<>(s.getClusters()); > Collections.sort(cList, set.getComparator(p, m)); > return cList; >} > >> >> public int pointSize(); >> } >> ``` > >A priori, I'd favour more encapsulation, in order to ensure that >multi-thread access can be controlled, e.g. hide the "ClusterSet" >data-structures, and provide access through identifiers that are >atomically created when clusters are added to the instance, and >so on... > >It would be great if you can come up with a concrete (and fully >tested) implementation of "ClusterSet". ;-) I would like to try, but it is a big change to API, we need more time and discussions. > >Best, >Gilles > >> > >> >Of course the devil is in the details (maintain consistency among >> >the fields of "ClusterSet", ensure that "Visitor" implementations >> >cannot modify the internal state, ...). >> > >> >> Unless I missed some point, I'd ask again that the API be reviewed >> >> *before* implementing several features (such as those "extractors") >> >> on top of something that does not look right. >> > >> >One perspective might be to try and come with a design for use in >> >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet"). >> > >> >Best regards, >> >Gilles >> > >> >> >> [...] > >--------------------------------------------------------------------- >To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >For additional commands, e-mail: dev-h...@commons.apache.org > >