Hi,

> [...]
>> >>
>> >> Do you mean I should file a JIRA issue about reusing "centroidOf" and
>> >> "chooseInitialCenters",
>> >> then start a PR and a discussion about "ClusterUtils"?
>> >> And then start the PR for "MiniBatchKMeansClusterer" after all that is done?
>> >
>> >I cannot guarantee that the whole process will be streamlined.
>> >In effect, you can work on multiple branches (one for each
>> >prospective PR).
>> >I'd say that you should start by describing (here on the ML) the
>> >rationale for "ClusterUtils" (and contrast it with say, a common
>> >base class).
>> >[Only when the design has been agreed on, a JIRA issue to
>> >implement it should be created in order to track the actual
>> >coding work.]
>>
>> OK, I think we should start from here:
>>
>> The methods "centroidOf" and "chooseInitialCenters" in
>> KMeansPlusPlusClusterer
>> could be reused by other KMeans clusterers like MiniBatchKMeansClusterer,
>> which I want to implement.
>>
>> There are two solutions for reusing "centroidOf" and "chooseInitialCenters":
>> 1. Extract an abstract class for KMeans clusterers named
>> "AbstractKMeansClusterer",
>> and move "centroidOf" and "chooseInitialCenters" into it as protected methods;
>> the EmptyClusterStrategy and related logic can also move to
>> "AbstractKMeansClusterer".
>> 2. Create a static utility class, and move "centroidOf" and
>> "chooseInitialCenters" into it;
>> other useful clustering methods, like predict (predict which cluster is
>> best for a specified point), can be put in it too.
>>
>
>At first sight, I prefer option 1.
>Indeed, o.a things "chooseInitialCenters" is a method that is of no interest to
>users of the functionality (and so should not be part of the "public" API).

That is a persuasive explanation, and I agree with you that extracting an abstract class for KMeans is better.
How do we reach a conclusion on this?

---------------------------------------------
Speaking of the "public API": I suppose there should be a family of "CentroidInitializer" implementations, each of which performs "chooseInitialCenters" with a different algorithm.
The k-means++ clustering algorithm is a special case of k-means that initializes the cluster centers with the k-means++ seeding algorithm.
So if there is a "CentroidInitializer", "KMeansPlusPlusClusterer" can be a "KMeansClusterer" with a "KMeansPlusPlusCentroidInitializer" strategy.
When "KMeansClusterer" is initialized with a "RandomCentroidInitializer", it is plain k-means.
----------------------------------------------------------

>Method "centroidOf" looks generally useful. Shouldn't it be part of
>the "Cluster"
>interface? What is the difference with method "getCenter" (defined by class
>"CentroidCluster")?

My understanding is:
* "Cluster" is a data class that carries the result of a clustering; "getCenter" is just a getter of "CentroidCluster" that returns the value of the center point.
* "Cluster[er]" is an (interface of an) algorithm that partitions points into a set of "Cluster" instances.
* "CentroidCluster" is the result of a particular group of "Clusterer" algorithms such as k-means; "centroidOf" is the specific logic that computes the center point of a collection of points.
  [In contrast, the DBSCAN clustering algorithm does not care about a "centroid".]

So "centroidOf" could be a method of a "CentroidCluster[er]" (which does not exist yet), but it is different from "CentroidCluster.getCenter".

>
>Regards,
>Gilles
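
To make the "CentroidInitializer" idea above more concrete, here is a rough sketch of how it might look. It is not existing Commons Math code: every type and signature below is hypothetical, points are plain double[] instead of "Clusterable", and the distance is hard-coded to squared Euclidean. The "KMeansClusterer" shown only performs the seeding step; the iteration loop and the EmptyClusterStrategy handling are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical strategy: chooses k initial centers among the data points. */
interface CentroidInitializer {
    List<double[]> chooseInitialCenters(List<double[]> points, int k);

    /** Rejects values of k that cannot be satisfied by the data set. */
    static void checkArguments(List<double[]> points, int k) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be in [1, " + points.size() + "]");
        }
    }

    /** Squared Euclidean distance (an assumption; a DistanceMeasure could be injected). */
    static double distanceSq(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}

/** Plain k-means seeding: k distinct data points picked uniformly at random. */
class RandomCentroidInitializer implements CentroidInitializer {
    private final Random random;

    RandomCentroidInitializer(Random random) { this.random = random; }

    @Override
    public List<double[]> chooseInitialCenters(List<double[]> points, int k) {
        CentroidInitializer.checkArguments(points, k);
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, random);
        return new ArrayList<>(shuffled.subList(0, k));
    }
}

/** k-means++ seeding: each next center is drawn with probability proportional to D(x)^2. */
class KMeansPlusPlusCentroidInitializer implements CentroidInitializer {
    private final Random random;

    KMeansPlusPlusCentroidInitializer(Random random) { this.random = random; }

    @Override
    public List<double[]> chooseInitialCenters(List<double[]> points, int k) {
        CentroidInitializer.checkArguments(points, k);
        int n = points.size();
        boolean[] taken = new boolean[n];
        double[] minDistSq = new double[n];
        Arrays.fill(minDistSq, Double.POSITIVE_INFINITY);
        List<double[]> centers = new ArrayList<>();
        int next = random.nextInt(n); // the first center is chosen uniformly
        while (true) {
            taken[next] = true;
            centers.add(points.get(next));
            if (centers.size() == k) {
                return centers;
            }
            // Refresh each point's squared distance to its nearest chosen center.
            double sum = 0;
            for (int i = 0; i < n; i++) {
                double d = CentroidInitializer.distanceSq(points.get(i), points.get(next));
                minDistSq[i] = Math.min(minDistSq[i], d);
                if (!taken[i]) {
                    sum += minDistSq[i];
                }
            }
            // Roulette-wheel selection over the points not yet taken.
            double r = random.nextDouble() * sum;
            double acc = 0;
            for (int i = 0; i < n; i++) {
                if (taken[i]) {
                    continue;
                }
                next = i; // fallback if rounding (or sum == 0) prevents a hit
                acc += minDistSq[i];
                if (acc > r) {
                    break;
                }
            }
        }
    }
}

/** Hypothetical clusterer that delegates seeding to the injected strategy. */
class KMeansClusterer {
    private final int k;
    private final CentroidInitializer initializer;

    KMeansClusterer(int k, CentroidInitializer initializer) {
        this.k = k;
        this.initializer = initializer;
    }

    /** Seeding step only; the actual k-means iterations are omitted in this sketch. */
    List<double[]> initialCenters(List<double[]> points) {
        return initializer.chooseInitialCenters(points, k);
    }
}
```

With this layout, "new KMeansClusterer(k, new KMeansPlusPlusCentroidInitializer(rng))" would play the role of today's "KMeansPlusPlusClusterer", and swapping in "RandomCentroidInitializer" would give plain k-means, without "chooseInitialCenters" having to be visible on the clusterer itself.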