2020-03-25 4:54 UTC+01:00, chentao...@qq.com <chentao...@qq.com>:
> Hi,
>
>>Hello.
>>
>>On Tue, 24 Mar 2020 at 06:39, chentao...@qq.com <chentao...@qq.com> wrote:
>>>
>>> Hi,
>>>
>>>     I have started 2 PRs to solve the problem you mentioned.
>>>
>>>     About the "CentroidInitializer" I have a new idea:
>>> Move the CentroidInitializers into "KMeansPlusPlusClusterer" as inner
>>> classes, and add a constructor parameter and a property
>>> "useKMeansPlusPlus" to "KMeansPlusPlusClusterer":
>>> ```java
>>> // Add "useKMeansPlusPlus" to "KMeansPlusPlusClusterer"
>>> public class KMeansPlusPlusClusterer<T extends Clusterable> extends Clusterer<T> {
>>>     public KMeansPlusPlusClusterer(final int k, final int maxIterations,
>>>                                final DistanceMeasure measure,
>>>                                final UniformRandomProvider random,
>>>                                final EmptyClusterStrategy emptyStrategy,
>>> +                             final boolean useKMeansPlusPlus) {
>>>     // ...
>>> -  // Use K-means++ to choose the initial centers.
>>> -  this.centroidInitializer = new KMeansPlusPlusCentroidInitializer(measure, random);
>>> +  this.useKMeansPlusPlus = useKMeansPlusPlus;
>>> }
>>> ```
>>
>>What if one comes up with a third way to initialize the centroids?
>>If you can ensure that there is no other initialization procedure,
>>a boolean is fine, if not, we could still make the existing procedures
>>package-private (e.g. moving them in as classes defined within
>>"KMeansPlusPlusClusterer").
>
> As far as I know, k-means has two center initialization methods so far,
> random and k-means++, so using a boolean to choose between them is good
> enough for current use,

Indeed but the question is: Will it be future-proof?
[We don't want to break compatibility (and have to release a
new major version of the library) for having overlooked the
question which I'm asking.]

> but there are two situations where users need to implement the center
> initialization method themselves:
> 1. The Commons Math implementation is not good enough;

As this is FLOSS, the understanding (IMO) is in that case users
would contribute back to fix what needs be.

> 2. There are new center initialization methods.

So, that would be arguing against using a boolean (?).
Cf. above (about breaking compatibility).
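
To make the compatibility concern concrete, here is a minimal sketch
of the alternative to a boolean. All names are hypothetical and the
points are plain "double[]"; this is not the Commons Math API, just
an illustration that a third procedure can be added later without
touching the clusterer's constructor signature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical names throughout; NOT the Commons Math API.
interface CenterInitializer {
    /** Picks k initial centers among the given points. */
    List<double[]> chooseInitialCenters(List<double[]> points, int k);
}

/** Picks k distinct points uniformly at random. */
final class RandomCenterInitializer implements CenterInitializer {
    private final Random rng;

    RandomCenterInitializer(Random rng) {
        this.rng = rng;
    }

    @Override
    public List<double[]> chooseInitialCenters(List<double[]> points, int k) {
        if (k > points.size()) {
            throw new IllegalArgumentException("k exceeds number of points");
        }
        List<double[]> pool = new ArrayList<>(points);
        Collections.shuffle(pool, rng);
        return new ArrayList<>(pool.subList(0, k));
    }
}

/** A "third way" added later: simply takes the first k points. */
final class FirstPointsCenterInitializer implements CenterInitializer {
    @Override
    public List<double[]> chooseInitialCenters(List<double[]> points, int k) {
        if (k > points.size()) {
            throw new IllegalArgumentException("k exceeds number of points");
        }
        return new ArrayList<>(points.subList(0, k));
    }
}

/** Clusterer stub: its constructor is unaffected by new initializers. */
final class SketchClusterer {
    private final int k;
    private final CenterInitializer initializer;

    SketchClusterer(int k, CenterInitializer initializer) {
        this.k = k;
        this.initializer = initializer;
    }

    List<double[]> initialCenters(List<double[]> points) {
        return initializer.chooseInitialCenters(points, k);
    }
}
```

With a boolean, the equivalent of "FirstPointsCenterInitializer"
would require a new constructor argument (or a new type for the
existing one), which is the kind of change I'd like to avoid.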

>>
>>Also, in the current implementation of "KMeansPlusPlusClusterer", the
>>initialization is not configurable ("KMeansPlusPlusCentroidInitializer").
>>Perhaps we don't want to depart from the original (?) algorithm; if so,
>>the new constructor could be made protected (thus simplifying the API).
>
> k-means++ is the recommended center initialization method nowadays;
> whether we should let users fall back to randomly chosen centers is a
> trade-off that needs to be weighed.

Is there any advantage to "random" init vs "kmeans" init?
E.g. is "random" faster, yet would give similar clustering results?
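
On the cost side only, a rough sketch of the two procedures (plain
Java, hypothetical names, points as "double[]"; not the Commons Math
code). Random seeding needs no distance evaluation, whereas k-means++
seeding, as written here, needs n of them for each center after the
first. Whether the resulting clusterings are similar is a separate
question that this sketch does not answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only; NOT the Commons Math implementation.
final class SeedingSketch {
    /** Counts calls to dist2(); a static counter, so not thread-safe. */
    static long distanceEvaluations = 0;

    /** Squared Euclidean distance. */
    static double dist2(double[] a, double[] b) {
        distanceEvaluations++;
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Random seeding: k distinct points, no distance computation. */
    static List<double[]> randomSeeds(List<double[]> pts, int k, Random rng) {
        List<double[]> pool = new ArrayList<>(pts);
        Collections.shuffle(pool, rng);
        return new ArrayList<>(pool.subList(0, k));
    }

    /**
     * k-means++ seeding: each new center is drawn with probability
     * proportional to its squared distance to the nearest chosen center.
     */
    static List<double[]> plusPlusSeeds(List<double[]> pts, int k, Random rng) {
        int n = pts.size();
        List<double[]> centers = new ArrayList<>();
        centers.add(pts.get(rng.nextInt(n)));
        double[] minD2 = new double[n];
        Arrays.fill(minD2, Double.POSITIVE_INFINITY);
        while (centers.size() < k) {
            // Update the cached nearest-center distances: n evaluations.
            double[] last = centers.get(centers.size() - 1);
            double total = 0;
            for (int i = 0; i < n; i++) {
                minD2[i] = Math.min(minD2[i], dist2(pts.get(i), last));
                total += minD2[i];
            }
            // Weighted draw; falls back to the last point if all weights are 0.
            double r = rng.nextDouble() * total;
            int pick = n - 1;
            double acc = 0;
            for (int i = 0; i < n; i++) {
                acc += minD2[i];
                if (acc > r) {
                    pick = i;
                    break;
                }
            }
            centers.add(pts.get(pick));
        }
        return centers;
    }
}
```

The seeding cost is only part of the picture: the number of
iterations needed afterwards also depends on the initial centers,
so a benchmark on representative data would be needed to say
which one is faster overall.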

> Should we make the API simple or rich?

I'd keep it simple until we fix the (IMHO) more important
issues of thread-safety and sparse data.

>>
>>> public boolean isUseKMeansPlusPlus() {return this.useKMeansPlusPlus;}
>>
>>Why should this method be defined?
>
> To let users retrieve their clusterer's parameters, same as "getEmptyStrategy()".
>

Well, obviously.  But then, perhaps obviously too, the "user" is the
one who called the constructor in the first place, and knows the
value of all the arguments.
And if we consider the general use-case, when client code is passed
an instance as type "Clusterer", then the specific accessor methods
are not available anymore (short of using "instanceof" and a cast).
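
A minimal sketch of what I mean, with hypothetical stand-in types
("BaseClusterer" and "KMeansSketch" are not the library classes):

```java
// Hypothetical stand-ins for the discussion; NOT the library classes.
abstract class BaseClusterer {
    /** Placeholder for the real clustering entry point. */
    abstract int numberOfClusters();
}

final class KMeansSketch extends BaseClusterer {
    private final int k;
    private final boolean useKMeansPlusPlus;

    KMeansSketch(int k, boolean useKMeansPlusPlus) {
        this.k = k;
        this.useKMeansPlusPlus = useKMeansPlusPlus;
    }

    @Override
    int numberOfClusters() {
        return k;
    }

    /** Accessor that exists only on the concrete type. */
    boolean isUseKMeansPlusPlus() {
        return useKMeansPlusPlus;
    }
}

final class ClientCode {
    /** Client code that only receives the base type. */
    static String describe(BaseClusterer c) {
        // c.isUseKMeansPlusPlus();  // would not compile: not declared on BaseClusterer
        if (c instanceof KMeansSketch) {
            KMeansSketch km = (KMeansSketch) c;
            return "k-means, k-means++ init: " + km.isUseKMeansPlusPlus();
        }
        return "unknown clusterer";
    }
}
```

Code that holds the concrete type already knows what it passed to
the constructor, and code that holds the base type cannot call the
accessor without the cast, so the accessor helps neither.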


Regards,
Gilles
