Le ven. 20 mars 2020 à 04:47, chentao...@qq.com <chentao...@qq.com> a écrit :
>
> Hi,
>
> >Hello.
> >
> >Le mer. 18 mars 2020 à 17:57, Gilles Sadowski <gillese...@gmail.com> a écrit 
> >:
> >>
> >> Hi.
> >>
> >> 2020-03-18 15:10 UTC+01:00, chentao...@qq.com <chentao...@qq.com>:
> >> > Hi,
> >> >     I have created a PR to show my aim:
> >> > https://github.com/apache/commons-math/pull/126/files
> >>
> >> Am I correct that the implementations of "ClustersPointExtractor"
> >> modify the argument of the "extract" method?
> >> If so, that seems quite unsafe.  I would not expect this behaviour
> >> in a public API.
> >
> >To be clearer (hopefully): I'd contrast this (incomplete/non-compilable
> >code):
> >
> >public class Extractor {
> >    T extract(List<List<T>> list);
> >}
>
> I have read the exists code again, and I found "EmptyClusterStrategy"
>  and related logic only used in K-Means.
> And the "Extractor" remove a point from exists Clusters is indeed unsuitable
> to be a public API(as you mentioned before.)
> I think another choice is simple make "EmptyClusterStrategy" and related 
> logic protected,
> then it can be resuse by MiniBatchKMeans.
>
> Also after the "CentroidInitializer" has been move out, the 
> "KMeansPlusPlusClusterer"
>  should rename to "KMeansClusterer" with a construct parameter 
> "CentroidInitializer"

If you think that it makes sense to define a class hierarchy for
"KMeans"-based algorithms, please do so.  But note that "protected"
is only slightly better than "public" (from a maintenance POV, it isn't
better, since we'll be required to maintain compatibility all the same).
Perhaps the base classe(s) can be package-private...

> >
> >with something along those lines:
> >
> >public class ClusterSet<T> {
> >    private Set<T> data;
> >    private List<List<T>> clusters;
> >
> >    void remove(T e) {
> >        return data.remove(e);
> >    }
> >
> >    public interface Visitor {
> >        T visit(List<List<T>> list);
> >    }
> >}
> >
> >Key point is separation of concern (selection of element vs
> >removal of element).
>
> I propose the ClusterSet should has more member like "distanceMeasure"
>  to store origin clustering parameters,
> or  to accelerate or support cluster finding or point finding.
> ```java
> public class ClusterSet<T> {
>     private List<Cluster> clusters;
>     private DistanceMeasure measure;
>     ...
>     public List<Cluster> closestTopN(Clusterable point);

How about providing a comparator factory?

public <T extends Clusterable> static
 Comparator<Cluster<T>> getComparator(final Clusterable p, final
DistanceMeasure m) {
    return new Comparator<>() {
        public int compare(Cluster<T> c1, Cluster<T> c2) {
            final double d1 = c1.distance(p, m); // "distance" method
to be added to the interface (?).
            final double d2 = c2.distance(p, m);
            return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
        }
    }
}

Then, if also providing an accessor to the clusters:

public List<Cluster<T>> getClusters() {
    return Collections.umodifiableList(clusters);
}

users can do whatever they want, a.o. easily implement the
ranking method which you suggest:

/** @return clusters in descending order of proximity. */
public List<Cluster> closest(Clusterable p, ClusterSet<T> s,
DistanceMeasure m) {
    final List<Cluster<T>> cList = new ArrayList<>(s.getClusters());
    Collections.sort(cList, set.getComparator(p, m));
    return cList;
}

>
>     public int pointSize();
> }
> ```

A priori, I'd favour more encapsulation, in order to ensure that
multi-thread access can be controlled, e.g. hide the "ClusterSet"
data-structures, and provide access through identifiers that are
atomically created when clusters are added to the instance, and
so on...

It would be great if you can come up with a concrete (and fully
tested) implementation of "ClusterSet". ;-)

Best,
Gilles

> >
> >Of course the devil is in the details (maintain consistency among
> >the fields of "ClusterSet", ensure that "Visitor" implementations
> >cannot modify the internal state, ...).
> >
> >> Unless I missed some point, I'd ask again that the API be reviewed
> >> *before* implementing several features (such as those "extractors")
> >> on top of something that does not look right.
> >
> >One perspective might be to try and come with a design for use in
> >multi-threaded applications (see e.g. package "o.a.c.m.ml.neuralnet").
> >
> >Best regards,
> >Gilles
> >
> >> >> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to