Hi,

>Hello.
>
>2020-03-06 9:48 UTC+01:00, chentao...@qq.com <chentao...@qq.com>:
>> Hi,
>> For centroid-based clustering algorithms in machine learning, we often use
>> the Calinski-Harabasz score to evaluate which algorithm, or how many centers,
>> is best for a dataset.
>> The Python library sklearn implements Calinski-Harabasz as
>> sklearn.metrics.calinski_harabasz_score.
>
>Could you post a reference (most of our documentation points
>to "Wikipedia" or "MathWorld")?
As far as I know, "Calinski-Harabasz" is the most popular evaluator for centroid-based clustering. I read the sklearn code, and I think it is easy to implement. References (the second is the original paper, Caliński and Harabasz, 1974, "A dendrite method for cluster analysis"):
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html
https://www.tandfonline.com/doi/abs/10.1080/03610927408827101

>
>> I think there should be a CalinskiHarabaszClusterEvaluator in Commons Math:
>
>At first sight, the approach would be to define a functional
>interface (with the "score" method).
>Then an "enum" that would be a factory of evaluators, along
>the lines of what has been done in "Commons RNG" (see class
>"RandomSource"[1]).

I just followed the design of the existing "ClusterEvaluator"; I think changing the design of the existing API is a separate question.

>
>> ```java
>> package org.apache.commons.math4.ml.clustering.evaluation;
>>
>> import org.apache.commons.math4.ml.clustering.Cluster;
>> import org.apache.commons.math4.ml.clustering.Clusterable;
>>
>> import java.util.List;
>>
>> public class CalinskiHarabaszClusterEvaluator<T extends Clusterable> extends
>>         ClusterEvaluator<T> {
>>     @Override
>>     public double score(List<? extends Cluster<T>> clusters) {
>>         //TODO: Implement the Calinski-Harabasz Score algorithm
>>         return 0;
>>     }
>>
>>     @Override
>>     public boolean isBetterScore(double score1, double score2) {
>>         return score1 > score2;
>>     }
>
>This method does not seem very useful.
>
>> }
>> ```
>>
>> The code can be implemented from the algorithm's description,
>> or translated from Python's sklearn.metrics.calinski_harabasz_score.
>
>What's the license of that code?

sklearn is under the BSD license.
I think the Commons Math ml code already borrows a lot from sklearn, for example org.apache.commons.math4.userguide.ClusterAlgorithmComparison.

>
>Regards,
>Gilles
>
>[1]
>https://commons.apache.org/proper/commons-rng/commons-rng-simple/javadocs/api-1.3/org/apache/commons/rng/simple/RandomSource.html
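For reference, here is a rough sketch of what the score computes, written as plain Java so it can be tried without Commons Math. With n points in k clusters, overall mean c, cluster centroids c_q and sizes n_q, it uses the between-cluster dispersion B = sum_q n_q * |c_q - c|^2 and the within-cluster dispersion W = sum_q sum_{x in C_q} |x - c_q|^2, and returns CH = (B / (k - 1)) / (W / (n - k)). Higher is better, which matches isBetterScore returning score1 > score2 in the stub. The sketch tries to mirror sklearn's calinski_harabasz_score (it rejects k outside 2..n-1 and returns 1.0 when W = 0). ClusterScorer and ClusterScore are hypothetical names illustrating Gilles's functional-interface-plus-enum idea; they are not an existing Commons Math API. Clusters are passed as double[][][] instead of List<? extends Cluster<T>> only to keep the sketch self-contained.

```java
/**
 * Hypothetical single-method scoring contract (not an existing Commons Math
 * type), illustrating the "functional interface with a score method" idea.
 */
@FunctionalInterface
interface ClusterScorer {
    /** clusters[q][i] is the i-th point (coordinate array) of cluster q. */
    double score(double[][][] clusters);
}

/** Hypothetical enum acting as a factory of scorers, loosely in the spirit of RandomSource. */
enum ClusterScore {
    CALINSKI_HARABASZ(CalinskiHarabasz::score);

    private final ClusterScorer scorer;

    ClusterScore(ClusterScorer scorer) {
        this.scorer = scorer;
    }

    /** Returns the scorer behind this constant. */
    ClusterScorer create() {
        return scorer;
    }
}

/** Calinski-Harabasz score; higher means denser, better-separated clusters. */
final class CalinskiHarabasz {
    private CalinskiHarabasz() {}

    static double score(double[][][] clusters) {
        int k = clusters.length;
        int n = 0;
        for (double[][] cluster : clusters) {
            if (cluster.length == 0) {
                throw new IllegalArgumentException("empty cluster");
            }
            n += cluster.length;
        }
        // Same validity range as sklearn: 2 <= k <= n - 1.
        if (k < 2 || k >= n) {
            throw new IllegalArgumentException("need 2 <= k <= n - 1, got k=" + k + ", n=" + n);
        }
        int dim = clusters[0][0].length;

        // Cluster centroids c_q and overall mean c.
        double[][] centroids = new double[k][dim];
        double[] mean = new double[dim];
        for (int q = 0; q < k; q++) {
            for (double[] p : clusters[q]) {
                for (int d = 0; d < dim; d++) {
                    centroids[q][d] += p[d];
                    mean[d] += p[d];
                }
            }
            for (int d = 0; d < dim; d++) {
                centroids[q][d] /= clusters[q].length;
            }
        }
        for (int d = 0; d < dim; d++) {
            mean[d] /= n;
        }

        double between = 0; // B = sum_q n_q * |c_q - c|^2
        double within = 0;  // W = sum_q sum_{x in C_q} |x - c_q|^2
        for (int q = 0; q < k; q++) {
            between += clusters[q].length * squaredDistance(centroids[q], mean);
            for (double[] p : clusters[q]) {
                within += squaredDistance(p, centroids[q]);
            }
        }
        // sklearn returns 1.0 when every point coincides with its centroid.
        if (within == 0) {
            return 1.0;
        }
        return between * (n - k) / (within * (k - 1));
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }
}
```

As a quick check, for the 1-D partition {0, 2} / {10, 12} the centroids are 1 and 11 and the overall mean is 6, so B = 2*25 + 2*25 = 100, W = 4 and CH = 100*2 / (4*1) = 50. To plug this into the proposed CalinskiHarabaszClusterEvaluator, score() would collect each cluster's coordinates via Cluster.getPoints() and Clusterable.getPoint(). Note that the score is defined with squared Euclidean distance, so it would not use any other DistanceMeasure the evaluator might be configured with.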