Hi Gilles, ------------------ Original ------------------ From: "GillesSadowski"<gillese...@gmail.com>; Date: Wed, Feb 26, 2020 05:41 PM To: "Commons Developers List"<dev@commons.apache.org>;
Subject: Re: [math] Discuss: New feature MiniBatchKMeansClusterer >[...] >> Do you mean this: >> &nbsp;* For JIRA issue #MATH-1509 Start a PR with "MiniBatchKMeansClusterer", but without the "ClusterUtils", >> despite the duplicate code between "MiniBatchKMeansClusterer" and "KMeansPlusPlusClusterer", >> also with "CentroidInitializer" and test code with in a single commit. >> &nbsp;* Suggestions like "remove the constructors with default parameters" should apply as a new commit of the PR above, >> and tracking by a subtask of JIRA issue #MATH-1509. >> &nbsp;* Fire a new JIRA issue for the duplicate code, and start another PR with "ClusterUtils" in, >> and extract duplicate code into "ClusterUtils". > >No, you should start with the smallest possible self-contained PR. >For example, why should we commit a code that defines several >constructors, while we already know that a second commit should >remove them? > >As you've noticed that some functionality must be factored out of >"KMeansPlusPlusClusterer", this should be done first as a separate >JIRA issue. IIUC, you propose "ClusterUtils". By reviewing a >minimal PR, we should be able to examine whether another >approach might be better (than a "utility" class) in order to expose >functionality common to all clusterer algorithms. >For example, could all "Kmeans" implementations inherit from >a common base class? Do you mean I should fire a JIRA issue about reuse "centroidOf" and "chooseInitialCenters", then start a PR and a disscuss about "ClusterUtils"? And then start the PR of "MiniBatchKMeansClusterer" after all done? >[...]