Having a common framework for clustering makes sense to me. While we should be careful about which algorithms we include, solid implementations of minibatch clustering and hierarchical clustering seem like a worthwhile goal, and we should reuse as much code and as many APIs as is reasonable.

On Tue, Jul 8, 2014 at 1:19 PM, RJ Nowling <rnowl...@gmail.com> wrote:

> Thanks, Hector! Your feedback is useful.
>
> On Tuesday, July 8, 2014, Hector Yee <hector....@gmail.com> wrote:
>
> > I would say for bigdata applications the most useful would be hierarchical
> > k-means with back tracking and the ability to support k nearest centroids.
> >
> > On Tue, Jul 8, 2014 at 10:54 AM, RJ Nowling <rnowl...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > MLlib currently has one clustering algorithm implementation, KMeans.
> > > It would benefit from having implementations of other clustering
> > > algorithms such as MiniBatch KMeans, Fuzzy C-Means, Hierarchical
> > > Clustering, and Affinity Propagation.
> > >
> > > I recently submitted a PR [1] for a MiniBatch KMeans implementation,
> > > and I saw an email on this list about interest in implementing Fuzzy
> > > C-Means.
> > >
> > > Based on Sean Owen's review of my MiniBatch KMeans code, it became
> > > apparent that before I implement more clustering algorithms, it would
> > > be useful to hammer out a framework to reduce code duplication and
> > > implement a consistent API.
> > >
> > > I'd like to gauge the interest and goals of the MLlib community:
> > >
> > > 1. Are you interested in having more clustering algorithms available?
> > >
> > > 2. Is the community interested in specifying a common framework?
> > >
> > > Thanks!
> > > RJ
> > >
> > > [1] - https://github.com/apache/spark/pull/1248
> > >
> > > --
> > > em rnowl...@gmail.com
> > > c 954.496.2314
> >
> > --
> > Yee Yang Li Hector <http://google.com/+HectorYee>
> > *google.com/+HectorYee <http://google.com/+HectorYee>*
>
> --
> em rnowl...@gmail.com
> c 954.496.2314