[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

Sachin Goel (JIRA) Mon, 01 Jun 2015 19:12:47 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568392#comment-14568392
 ]


Sachin Goel commented on FLINK-1731:
------------------------------------

I'm creating a separate issue for initialization schemes. It would cover the 
random, kmeans++ and kmeans|| initialization methods. Since any initialization 
is itself a solution to the kmeans problem, they would all be instances of 
Predictor as well. Users can access the learned centroids via 
instance.centroids and pass them to the KMeans algorithm that has already been 
implemented. 
There is another possible approach, which spares the user from having to figure 
out how to pass the initial centroids to KMeans. We can have a parameter that 
specifies which initialization scheme to use. The KMeans algorithm would then 
call the appropriate initialization scheme in its fit function and use the 
centroids that scheme finds as its initial centroids.
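
To make the two options above concrete, here is a rough sketch in plain 
Python. It is not Flink or flink-ml code. The names RandomInit, 
KMeansPlusPlusInit, INIT_SCHEMES, and the init and initial_centroids 
parameters are hypothetical and only illustrate the shape of the API. 
kmeans|| is left out, and the seeding shown is a simplified single-machine 
version.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


class RandomInit:
    """Hypothetical scheme: pick k distinct input points uniformly at random."""

    def __init__(self, k, seed=0):
        self.k, self.seed, self.centroids = k, seed, []

    def fit(self, points):
        rng = random.Random(self.seed)
        self.centroids = [list(p) for p in rng.sample(list(points), self.k)]
        return self

    def predict(self, point):
        # An initialization is already a (rough) clustering, so it can predict.
        return nearest(point, self.centroids)


class KMeansPlusPlusInit(RandomInit):
    """Hypothetical scheme: kmeans++ style seeding, weighted by squared distance."""

    def fit(self, points):
        rng = random.Random(self.seed)
        pts = list(points)
        centroids = [list(rng.choice(pts))]
        while len(centroids) < self.k:
            # Assumes at least k distinct points; otherwise all weights are zero.
            weights = [min(sq_dist(p, c) for c in centroids) for p in pts]
            centroids.append(list(rng.choices(pts, weights=weights, k=1)[0]))
        self.centroids = centroids
        return self


# Hypothetical registry mapping the `init` parameter to a scheme.
INIT_SCHEMES = {"random": RandomInit, "kmeans++": KMeansPlusPlusInit}


class KMeans:
    """Plain Lloyd iterations; the initial centroids come from one of two places."""

    def __init__(self, k, initial_centroids=None, init="random", iterations=10, seed=0):
        self.k, self.initial_centroids, self.init = k, initial_centroids, init
        self.iterations, self.seed, self.centroids = iterations, seed, []

    def fit(self, points):
        pts = list(points)
        if self.initial_centroids is not None:
            # Option 1: the user supplies centroids, e.g. from a scheme's .centroids.
            centroids = [list(c) for c in self.initial_centroids]
        else:
            # Option 2: fit() calls the scheme selected by the `init` parameter.
            centroids = INIT_SCHEMES[self.init](self.k, self.seed).fit(pts).centroids
        for _ in range(self.iterations):
            clusters = [[] for _ in centroids]
            for p in pts:
                clusters[nearest(p, centroids)].append(p)
            # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
            centroids = [
                [sum(col) / len(members) for col in zip(*members)] if members else old
                for members, old in zip(clusters, centroids)
            ]
        self.centroids = centroids
        return self

    def predict(self, point):
        return nearest(point, self.centroids)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Option 1: the user runs an initialization scheme and passes its centroids on.
seeds = KMeansPlusPlusInit(k=2, seed=1).fit(points)
model_a = KMeans(k=2, initial_centroids=seeds.centroids).fit(points)

# Option 2: KMeans picks the scheme itself, based on a parameter.
model_b = KMeans(k=2, init="kmeans++", seed=1).fit(points)
```

With the same seed, both routes start from the same centroids, so model_a 
and model_b end up identical. The second route just hides the wiring from 
the user.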

> Add kMeans clustering algorithm to machine learning library
> -----------------------------------------------------------
>
>                 Key: FLINK-1731
>                 URL: https://issues.apache.org/jira/browse/FLINK-1731
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Peter Schrott
>              Labels: ML
>
> The Flink repository already contains a kMeans implementation but it is not 
> yet ported to the machine learning library. I assume that only the used data 
> types have to be adapted and then it can be more or less directly moved to 
> flink-ml.
> The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
> implementation because they improve the initial seeding phase to achieve 
> near-optimal clustering. It might be worthwhile to implement kMeans||.
> Resources:
> [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
> [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
