Hi,

>Le ven. 20 mars 2020 à 04:47, chentao...@qq.com <chentao...@qq.com> a écrit :
>>
>> Hi,
>>
>> >Hello.
>> >
>> >Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <gillese...@gmail.com> a 
>> >écrit :
>> >>
>> >> Hi.
>> >>
>> >> 2020-03-18 15:10 UTC+01:00, chentao...@qq.com <chentao...@qq.com>:
>> >> > Hi,
>> >> >     I have created a PR to show my aim:
>> >> > https://github.com/apache/commons-math/pull/126/files
>> >>
>> >> Am I correct that the implementations of "ClustersPointExtractor"
>> >> modify the argument of the "extract" method?
>> >> If so, that seems quite unsafe.  I would not expect this behaviour
>> >> in a public API.
>> >
>> >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
>> >code):
>> >
>> >public class Extractor {
>> >    T extract(List<List<T>> list);
>> >}
>>
>> I have read the exists code again, and I found "EmptyClusterStrategy"
>>  and related logic only used in K-Means.
>> And the "Extractor" remove a point from exists Clusters is indeed unsuitable
>> to be a public API(as you mentioned before.)
>> I think another choice is simple make "EmptyClusterStrategy" and related 
>> logic protected,
>> then it can be resuse by MiniBatchKMeans.
>>
>> Also after the "CentroidInitializer" has been move out, the 
>> "KMeansPlusPlusClusterer"
>>  should rename to "KMeansClusterer" with a construct parameter 
>>"CentroidInitializer"
>
>If you think that it makes sense to define a class hierarchy for
>"KMeans"-based algorithms, please do so.  But note that "protected"
>is only slightly better than "public" (from a maintenance POV, it isn't
>better, since we'll be required to maintain compatibility all the same).
>Perhaps the base classe(s) can be package-private... 

OK, package-private on "EmptyClusterStrategy" and related logic is fine.

>
>> >
>> >with something along those lines:
>> >
>> >public class ClusterSet<T> {
>> >    private Set<T> data;
>> >    private List<List<T>> clusters;
>> >
>> >    void remove(T e) {
>> >        return data.remove(e);
>> >    }
>> >
>> >    public interface Visitor {
>> >        T visit(List<List<T>> list);
>> >    }
>> >}
>> >
>> >Key point is separation of concern (selection of element vs
>> >removal of element).
>>
>> I propose the ClusterSet should has more member like "distanceMeasure"
>>  to store origin clustering parameters,
>> or  to accelerate or support cluster finding or point finding.
>> ```java
>> public class ClusterSet<T> {
>>     private List<Cluster> clusters;
>>     private DistanceMeasure measure;
>>     ...
>>     public List<Cluster> closestTopN(Clusterable point);
>
>How about providing a comparator factory?
>
>public <T extends Clusterable> static
> Comparator<Cluster<T>> getComparator(final Clusterable p, final
>DistanceMeasure m) {
>    return new Comparator<>() {
>        public int compare(Cluster<T> c1, Cluster<T> c2) {
>            final double d1 = c1.distance(p, m); // "distance" method
>to be added to the interface (?).
>            final double d2 = c2.distance(p, m);
>            return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
>        }
>    }
>} 

How do you think about the "distanceMeasure" to be a memeber of "ClusterSet",
In my opinion "ClusterSet" should contains all the parameters and result for 
clustering and clusters.

>
>Then, if also providing an accessor to the clusters:
>
>public List<Cluster<T>> getClusters() {
>    return Collections.umodifiableList(clusters);
>}
>
>users can do whatever they want, a.o. easily implement the
>ranking method which you suggest:
>
>/** @return clusters in descending order of proximity. */
>public List<Cluster> closest(Clusterable p, ClusterSet<T> s,
>DistanceMeasure m) {
>    final List<Cluster<T>> cList = new ArrayList<>(s.getClusters());
>    Collections.sort(cList, set.getComparator(p, m));
>    return cList;
>}
>
>>
>>     public int pointSize();
>> }
>> ```
>
>A priori, I'd favour more encapsulation, in order to ensure that
>multi-thread access can be controlled, e.g. hide the "ClusterSet"
>data-structures, and provide access through identifiers that are
>atomically created when clusters are added to the instance, and
>so on...
>
>It would be great if you can come up with a concrete (and fully
>tested) implementation of "ClusterSet". ;-) 

I would like to try, but it is a big change to API, we need more time and 
discussions.

>
>Best,
>Gilles
>
>> >
>> >Of course the devil is in the details (maintain consistency among
>> >the fields of "ClusterSet", ensure that "Visitor" implementations
>> >cannot modify the internal state, ...).
>> >
>> >> Unless I missed some point, I'd ask again that the API be reviewed
>> >> *before* implementing several features (such as those "extractors")
>> >> on top of something that does not look right.
>> >
>> >One perspective might be to try and come with a design for use in
>> >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").
>> >
>> >Best regards,
>> >Gilles
>> >
>> >> >> [...]
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>For additional commands, e-mail: dev-h...@commons.apache.org
>
>

Reply via email to