Hello.

2020-03-06 9:48 UTC+01:00, chentao...@qq.com <chentao...@qq.com>:
> Hi,
> For centroid-based clustering algorithms in machine learning, we often use
> the Calinski-Harabasz score to evaluate which algorithm, or how many
> centers, is best for a dataset.
> The python lib sklearn implements Calinski-Harabasz as
> sklearn.metrics.calinski_harabasz_score.

Could you post a reference (most of our documentation points to
"Wikipedia" or "MathWorld")?

> I think there should be a CalinskiHarabaszClusterEvaluator in commons math:

At first sight, the approach would be to define a functional interface
(with the "score" method), then an "enum" that would be a factory of
evaluators, along the lines of what has been done in "Commons RNG"
(see class "RandomSource"[1]).

> ```java
> package org.apache.commons.math4.ml.clustering.evaluation;
>
> import org.apache.commons.math4.ml.clustering.Cluster;
> import org.apache.commons.math4.ml.clustering.Clusterable;
>
> import java.util.List;
>
> public class CalinskiHarabaszClusterEvaluator<T extends Clusterable> extends
>         ClusterEvaluator<T> {
>     @Override
>     public double score(List<? extends Cluster<T>> clusters) {
>         //TODO: Implement the Calinski-Harabasz Score algorithm
>         return 0;
>     }
>
>     @Override
>     public boolean isBetterScore(double score1, double score2) {
>         return score1 > score2;
>     }

This method does not seem very useful.

> }
> ```
>
> The code can be implemented by reading the algorithm documentation,
> or translated from python sklearn.metrics.calinski_harabasz_score.

What's the license of that code?

Regards,
Gilles

[1] https://commons.apache.org/proper/commons-rng/commons-rng-simple/javadocs/api-1.3/org/apache/commons/rng/simple/RandomSource.html
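
For illustration only, the "functional interface plus enum factory" idea
mentioned above could be sketched as follows. This is a minimal sketch under
stated assumptions, not a proposed Commons Math API: the names "ClusterScore"
and "ClusterScores" are hypothetical, points are plain double[] arrays rather
than "Clusterable" instances, and returning 1.0 when the within-cluster
dispersion is zero is an arbitrary convention of this sketch. The score is
computed from the usual definition, (B / (k - 1)) / (W / (n - k)), where B is
the between-cluster dispersion, W the within-cluster dispersion, k the number
of clusters and n the number of points.

```java
import java.util.List;

/** Hypothetical functional interface with a single "score" method. */
@FunctionalInterface
interface ClusterScore {
    /** Each cluster is a non-empty list of points of equal dimension. */
    double score(List<List<double[]>> clusters);
}

/** Hypothetical enum acting as a factory of evaluators. */
enum ClusterScores {
    CALINSKI_HARABASZ {
        @Override
        ClusterScore create() {
            return ClusterScores::calinskiHarabasz;
        }
    };

    /** Creates the evaluator associated with this constant. */
    abstract ClusterScore create();

    /** Computes (B / (k - 1)) / (W / (n - k)); higher is better. */
    static double calinskiHarabasz(List<List<double[]>> clusters) {
        final int k = clusters.size();
        if (k < 2) {
            throw new IllegalArgumentException("at least 2 clusters required");
        }
        final int dim = clusters.get(0).get(0).length;
        final double[][] centroids = new double[k][dim];
        final double[] global = new double[dim];
        int n = 0;
        // Accumulate coordinate sums, then turn them into centroids.
        for (int c = 0; c < k; c++) {
            final List<double[]> cluster = clusters.get(c);
            for (double[] p : cluster) {
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] += p[d];
                    global[d] += p[d];
                }
            }
            for (int d = 0; d < dim; d++) {
                centroids[c][d] /= cluster.size();
            }
            n += cluster.size();
        }
        if (n <= k) {
            throw new IllegalArgumentException("more points than clusters required");
        }
        for (int d = 0; d < dim; d++) {
            global[d] /= n;
        }
        double between = 0; // B: size-weighted squared distances centroid -> global centroid
        double within = 0;  // W: squared distances point -> own centroid
        for (int c = 0; c < k; c++) {
            final List<double[]> cluster = clusters.get(c);
            between += cluster.size() * squaredDistance(centroids[c], global);
            for (double[] p : cluster) {
                within += squaredDistance(p, centroids[c]);
            }
        }
        // Assumption of this sketch: return 1.0 when all points sit on their centroids.
        return within == 0 ? 1.0 : (between / (k - 1)) / (within / (n - k));
    }

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            final double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }
}
```

With this layout a caller would write
ClusterScores.CALINSKI_HARABASZ.create().score(clusters). As a hand check, the
one-dimensional clusters {0, 2} and {10, 12} give B = 100 and W = 4, hence a
score of (100 / 1) / (4 / 2) = 50.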