Re: Re: [math]Discuss: There should be a CalinskiHarabaszClusterEvaluator in ml package

Gilles Sadowski Fri, 06 Mar 2020 18:47:08 -0800

On Fri, 6 Mar 2020 at 14:35, [email protected] <[email protected]> wrote:
>
> Hi,
>
> >Hello.
> >
> >2020-03-06 9:48 UTC+01:00, [email protected] <[email protected]>:
> >> Hi,
> >>     For centroid-based clustering algorithms in machine learning, we often
> >> use the Calinski-Harabasz score to evaluate which algorithm, or how many
> >> centers, is best for a dataset.
> >>     The Python library sklearn implements Calinski-Harabasz as
> >> sklearn.metrics.calinski_harabasz_score.
> >
> >Could you post a reference (most of our documentation points
> >to "Wikipedia" or "MathWorld")?
>
> As far as I know, "Calinski-Harabasz" is the most popular evaluator for
> centroid clusters.
> I just read the sklearn code, and I think it is easy to implement.
> https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html
> https://www.tandfonline.com/doi/abs/10.1080/03610927408827101


Thanks; the original reference is quite fine too.
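
For readers of the archive, the score under discussion is usually written
as below. The notation is mine (an assumption, not copied from either
reference): n points, k clusters, C_q the q-th cluster with n_q points and
centroid c_q, and c the centroid of the whole dataset.

```latex
\begin{align*}
s   &= \frac{\operatorname{tr}(B_k)}{\operatorname{tr}(W_k)}
       \times \frac{n - k}{k - 1} \\
B_k &= \sum_{q=1}^{k} n_q \, (c_q - c)(c_q - c)^{T} \\
W_k &= \sum_{q=1}^{k} \sum_{x \in C_q} (x - c_q)(x - c_q)^{T}
\end{align*}
```

The traces reduce to sums of squared Euclidean distances, so no matrix
needs to be built; a higher score indicates denser, better separated
clusters.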

> >
> >> I think there should be a CalinskiHarabaszClusterEvaluator in commons math:
> >
> >At first sight, the approach would be to define a functional
> >interface (with the "score" method).
> >Then an "enum" that would be a factory of evaluators, along
> >the lines of what has been done in "Commons RNG" (see class
> >"RandomSource"[1]).
>
> I just inherited the design of "ClusterEvaluator",
> and I think changing the design of the existing API is a separate question.

Not really: IMHO we should not pile features on top of an
API that might have shortcomings.  In particular, the fact
that the new class's constructor calls the parent's constructor
with "null" looks problematic to me.
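
To make the "functional interface plus enum factory" suggestion more
concrete, here is a minimal sketch. Everything in it is hypothetical: the
names "ClusterScorer", "ClusterScorers" and "ClusterScorerSource" do not
exist in the code base, clusters are plain double[][] arrays rather than
"Cluster"/"Clusterable" instances, and the single evaluator shown is only a
stand-in. It is loosely modelled on the "RandomSource" idea, not a copy of
that API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical functional interface: a single "score" method. */
@FunctionalInterface
interface ClusterScorer {
    /** Each list element holds the points (one row per point) of one cluster. */
    double score(List<double[][]> clusters);
}

/** Hypothetical helper holding the scoring functions themselves. */
final class ClusterScorers {
    private ClusterScorers() {}

    /** Sum, over all clusters, of squared distances from each point to its centroid. */
    static double withinSumOfSquares(List<double[][]> clusters) {
        double total = 0;
        for (double[][] cluster : clusters) {
            if (cluster.length == 0) {
                continue; // An empty cluster contributes nothing.
            }
            final int dim = cluster[0].length;
            final double[] centroid = new double[dim];
            for (double[] point : cluster) {
                for (int d = 0; d < dim; d++) {
                    centroid[d] += point[d];
                }
            }
            for (int d = 0; d < dim; d++) {
                centroid[d] /= cluster.length;
            }
            for (double[] point : cluster) {
                for (int d = 0; d < dim; d++) {
                    final double diff = point[d] - centroid[d];
                    total += diff * diff;
                }
            }
        }
        return total;
    }
}

/** Hypothetical factory enum: one constant per available evaluator. */
enum ClusterScorerSource {
    /** Stand-in evaluator; others would be added as further constants. */
    WITHIN_SUM_OF_SQUARES(ClusterScorers::withinSumOfSquares);

    private final ClusterScorer scorer;

    ClusterScorerSource(ClusterScorer scorer) {
        this.scorer = scorer;
    }

    /** Returns the evaluator associated with this constant. */
    ClusterScorer create() {
        return scorer;
    }
}

final class ClusterScorerDemo {
    public static void main(String[] args) {
        List<double[][]> clusters = Arrays.asList(
            new double[][] {{0}, {2}},
            new double[][] {{10}, {12}});
        ClusterScorer scorer = ClusterScorerSource.WITHIN_SUM_OF_SQUARES.create();
        System.out.println(scorer.score(clusters)); // prints 4.0
    }
}
```

With such a layout, no evaluator needs a parent constructor at all, and
users can still supply their own scorer as a lambda.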

> >
> >> ```java
> >> package org.apache.commons.math4.ml.clustering.evaluation;
> >>
> >> import org.apache.commons.math4.ml.clustering.Cluster;
> >> import org.apache.commons.math4.ml.clustering.Clusterable;
> >>
> >> import java.util.List;
> >>
> >> public class CalinskiHarabaszClusterEvaluator<T extends Clusterable>
> >>         extends ClusterEvaluator<T> {
> >>     @Override
> >>     public double score(List<? extends Cluster<T>> clusters) {
> >>         //TODO: Implement the Calinski-Harabasz Score algorithm
> >>         return 0;
> >>     }
> >>
> >>     @Override
> >>     public boolean isBetterScore(double score1, double score2) {
> >>         return score1 > score2;
> >>     }
> >
> >This method does not seem very useful.

I've now seen how this is used by "MultiKMeansPlusPlusClusterer".
However, I wonder why the "Multi" feature is only available for that
implementation...

> >> }
> >> ```
> >>
> >> The code can be implemented by reading the algorithm's documentation,
> >> or by translating Python's sklearn.metrics.calinski_harabasz_score.
> >
> >What's the license of that code?
>
> sklearn is under the BSD license.

OK; no problem[1] with claiming inspiration from it then. ;-)
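
For what it's worth, the computation itself is short. Below is a minimal
sketch written from the usual definition of the score (between-cluster
dispersion over within-cluster dispersion, each divided by its degrees of
freedom), not translated from sklearn. The class name
"CalinskiHarabaszSketch" and the use of plain double[][] arrays instead of
the "Cluster"/"Clusterable" types are assumptions made to keep it
self-contained.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, self-contained sketch; not the Commons Math API. */
final class CalinskiHarabaszSketch {
    private CalinskiHarabaszSketch() {}

    /**
     * @param clusters each element holds the points (one row per point) of one cluster
     * @return the Calinski-Harabasz score; higher means better separated clusters
     */
    static double score(List<double[][]> clusters) {
        final int k = clusters.size();
        int n = 0;
        for (double[][] cluster : clusters) {
            if (cluster.length == 0) {
                throw new IllegalArgumentException("empty cluster");
            }
            n += cluster.length;
        }
        if (k < 2 || n <= k) {
            throw new IllegalArgumentException("need at least 2 clusters and more points than clusters");
        }
        final int dim = clusters.get(0)[0].length;

        // Centroid of the whole dataset.
        final double[] global = new double[dim];
        for (double[][] cluster : clusters) {
            for (double[] point : cluster) {
                for (int d = 0; d < dim; d++) {
                    global[d] += point[d];
                }
            }
        }
        for (int d = 0; d < dim; d++) {
            global[d] /= n;
        }

        double between = 0; // Sum of n_q * |c_q - c|^2 over clusters.
        double within = 0;  // Sum of |x - c_q|^2 over all points.
        for (double[][] cluster : clusters) {
            final double[] centroid = new double[dim];
            for (double[] point : cluster) {
                for (int d = 0; d < dim; d++) {
                    centroid[d] += point[d];
                }
            }
            for (int d = 0; d < dim; d++) {
                centroid[d] /= cluster.length;
            }
            between += cluster.length * squaredDistance(centroid, global);
            for (double[] point : cluster) {
                within += squaredDistance(point, centroid);
            }
        }

        // If every point coincides with its centroid, "within" is 0 and the
        // result is infinite or NaN; a real implementation must handle that.
        return (between / (k - 1)) / (within / (n - k));
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            final double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<double[][]> clusters = Arrays.asList(
            new double[][] {{0}, {2}},
            new double[][] {{10}, {12}});
        System.out.println(score(clusters)); // prints 50.0
    }
}
```

In the example, the between-cluster term is 2 * 25 + 2 * 25 = 100 and the
within-cluster term is 4, hence (100 / 1) / (4 / 2) = 50.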

Please note that, for tracking purposes, your PR should be tied
to a JIRA report, and the issue's identifier should prefix the
commit message.
The PR is also not in sync with the current "master" branch.

Regards,
Gilles

[1] http://www.apache.org/legal/resolved.html#category-a

> I think the math ml package references sklearn a lot;
> for example: org.apache.commons.math4.userguide.ClusterAlgorithmComparison
>
> >
> >Regards,
> >Gilles
> >
> >[1] 
> >https://commons.apache.org/proper/commons-rng/commons-rng-simple/javadocs/api-1.3/org/apache/commons/rng/simple/RandomSource.html
