yunfengzhou-hub commented on a change in pull request #24: URL: https://github.com/apache/flink-ml/pull/24#discussion_r753905182
##########
File path: flink-ml-lib/src/main/java/org/apache/flink/ml/classification/knn/EuclideanDistance.java
##########
@@ -0,0 +1,272 @@
+package org.apache.flink.ml.classification.knn;
+
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.flink.shaded.curator4.com.google.common.collect.Iterables;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.List;
+
+import static org.apache.flink.ml.classification.knn.KnnUtil.appendVectorToMatrix;
+
+/**
+ * Euclidean distance is the "ordinary" straight-line distance between two points in Euclidean
+ * space.
+ *
+ * <p>https://en.wikipedia.org/wiki/Euclidean_distance
+ *
+ * <p>Given two vectors a and b, Euclidean Distance = ||a - b||, where ||*|| means the L2 norm of
+ * the vector.
+ */
+public class EuclideanDistance implements Serializable {

Review comment:
If this class only adds some optimizations, those optimizations might also be suitable for kmeans. I am still not sure that we should have separate Euclidean distance classes for knn and kmeans.
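To make the suggestion concrete, here is a minimal sketch of what a single distance helper shared by knn and kmeans could look like. It is not code from the pull request: the class name `SharedEuclideanDistance` and all of its methods are hypothetical, and it works on plain `double[]` arrays rather than Flink ML vector types. The excerpt does not show which optimizations the knn class contains, so the cached-norm variant below is only an assumed example, based on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2(a . b).

```java
/**
 * Hypothetical shared helper, for illustration only. It is not part of the
 * pull request and does not use any Flink ML API.
 */
final class SharedEuclideanDistance {

    private SharedEuclideanDistance() {}

    /** Direct form: the L2 norm of (a - b). */
    static double distance(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Squared L2 norm; a caller can compute this once per vector and reuse it. */
    static double squaredNorm(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return sum;
    }

    /**
     * Assumed optimization: with cached squared norms, each distance needs
     * only one dot product, using ||a - b||^2 = ||a||^2 + ||b||^2 - 2(a . b).
     */
    static double distance(double[] a, double aSquaredNorm, double[] b, double bSquaredNorm) {
        checkSameLength(a, b);
        double dot = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
        }
        // Rounding can push the expression slightly below zero, so clamp it.
        double squared = Math.max(0.0, aSquaredNorm + bSquaredNorm - 2.0 * dot);
        return Math.sqrt(squared);
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "Vector sizes differ: " + a.length + " vs " + b.length);
        }
    }
}
```

Under this sketch, a knn implementation could cache the squared norms of its training points, and a kmeans implementation could cache those of its centroids, with both calling the same four-argument `distance` method. Whether that matches the optimizations in the pull request would need to be checked against the full diff.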