Hi all,

MLlib currently has one clustering algorithm implementation, KMeans. It would benefit from having implementations of other clustering algorithms such as MiniBatch KMeans, Fuzzy C-Means, Hierarchical Clustering, and Affinity Propagation.

I recently submitted a PR [1] for a MiniBatch KMeans implementation, and I saw an email on this list about interest in implementing Fuzzy C-Means. Based on Sean Owen's review of my MiniBatch KMeans code, it became apparent that before I implement more clustering algorithms, it would be useful to hammer out a framework to reduce code duplication and implement a consistent API.

I'd like to gauge the interest and goals of the MLlib community:

1. Are you interested in having more clustering algorithms available?
2. Is the community interested in specifying a common framework?

Thanks!
RJ

[1] https://github.com/apache/spark/pull/1248

--
em rnowl...@gmail.com
c 954.496.2314