[Math] kmeans++: decouple EM Lloyd's iterations and initial seeding of clustering centers.

Artem Barger Tue, 31 May 2016 06:15:02 -0700

Hi,

The current implementation of kmeans within the CM framework is hard-wired
to the seeding algorithm published by Arthur, David, and Sergei Vassilvitskii.
"k-means++: The advantages of careful seeding." *Proceedings of the
eighteenth annual ACM-SIAM symposium on Discrete algorithms*. Society for
Industrial and Applied Mathematics, 2007. However, other algorithms for
initial seeding are available, for instance:


1. Random initialization (each center picked uniformly at random).
2. Canopy https://en.wikipedia.org/wiki/Canopy_clustering_algorithm
3. Bicriteria: Feldman, Dan, et al. "Bi-criteria linear-time approximations
for generalized k-mean/median/center." *Proceedings of the twenty-third
annual symposium on Computational geometry*. ACM, 2007.

While I understand that kmeans++ is the preferable option, the others could
also be used for testing, trials and evaluations.

I'd like to propose separating the seeding logic from the clustering logic
to increase the flexibility of kmeans clustering. I would be glad to hear
your comments, pros/cons or objections...
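
To make the idea concrete, here is a rough sketch of how such a separation
might look in Java. Every name below (CenterSeeder, RandomSeeder,
KMeansPlusPlusSeeder, LloydClusterer, Geometry) is hypothetical and not taken
from the existing CM code. Points are plain double[] arrays and the distance
is squared Euclidean, both of which are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical strategy interface: picks k initial centers from the data. */
interface CenterSeeder {
    List<double[]> seed(List<double[]> points, int k, Random rng);
}

/** Random initialization: k distinct input points chosen uniformly at random. */
class RandomSeeder implements CenterSeeder {
    public List<double[]> seed(List<double[]> points, int k, Random rng) {
        List<double[]> pool = new ArrayList<>(points);
        Collections.shuffle(pool, rng);
        List<double[]> centers = new ArrayList<>();
        for (double[] p : pool.subList(0, k)) centers.add(p.clone());
        return centers;
    }
}

/** kmeans++ seeding: each new center is drawn with probability proportional to D(x)^2. */
class KMeansPlusPlusSeeder implements CenterSeeder {
    public List<double[]> seed(List<double[]> points, int k, Random rng) {
        List<double[]> centers = new ArrayList<>();
        centers.add(points.get(rng.nextInt(points.size())).clone());
        double[] d2 = new double[points.size()];
        while (centers.size() < k) {
            double total = 0;
            for (int i = 0; i < d2.length; i++) {
                double[] p = points.get(i);
                d2[i] = Geometry.sqDist(p, centers.get(Geometry.nearestIndex(p, centers)));
                total += d2[i];
            }
            // Walk the cumulative sum until it passes a uniform draw in [0, total).
            double r = rng.nextDouble() * total;
            double acc = 0;
            int chosen = d2.length - 1;
            for (int i = 0; i < d2.length; i++) {
                acc += d2[i];
                if (acc > r) { chosen = i; break; }
            }
            centers.add(points.get(chosen).clone());
        }
        return centers;
    }
}

/** Small helper with the distance computations shared by seeding and clustering. */
final class Geometry {
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    static int nearestIndex(double[] p, List<double[]> centers) {
        int best = 0;
        for (int c = 1; c < centers.size(); c++) {
            if (sqDist(p, centers.get(c)) < sqDist(p, centers.get(best))) best = c;
        }
        return best;
    }
}

/** Lloyd's iterations only; the seeding strategy is injected by the caller. */
class LloydClusterer {
    private final CenterSeeder seeder;

    LloydClusterer(CenterSeeder seeder) { this.seeder = seeder; }

    double[][] cluster(List<double[]> points, int k, int maxIter, Random rng) {
        List<double[]> centers = seeder.seed(points, k, rng);
        int dim = points.get(0).length;
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: accumulate each point into its nearest center.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int best = Geometry.nearestIndex(p, centers);
                counts[best]++;
                for (int j = 0; j < dim; j++) sums[best][j] += p[j];
            }
            // Update step: move each non-empty cluster's center to its mean.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its center
                for (int j = 0; j < dim; j++) {
                    double updated = sums[c][j] / counts[c];
                    if (updated != centers.get(c)[j]) changed = true;
                    centers.get(c)[j] = updated;
                }
            }
            if (!changed) break; // converged
        }
        return centers.toArray(new double[0][]);
    }
}
```

With this shape a caller would choose the strategy at construction time,
e.g. new LloydClusterer(new RandomSeeder()) for a baseline run or
new LloydClusterer(new KMeansPlusPlusSeeder()) for kmeans++, and Canopy or
bicriteria seeding could be added later as further CenterSeeder
implementations without touching the iteration code.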

Thanks,
                      Artem Barger.
