Hi,

The current implementation of k-means within the CM framework always uses the seeding algorithm published by Arthur, David, and Sergei Vassilvitskii. "k-means++: The advantages of careful seeding." *Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms*. Society for Industrial and Applied Mathematics, 2007. However, other algorithms for initial seeding are available, for instance:
1. Random initialization (each center picked uniformly at random).
2. Canopy: https://en.wikipedia.org/wiki/Canopy_clustering_algorithm
3. Bicriteria: Feldman, Dan, et al. "Bi-criteria linear-time approximations for generalized k-mean/median/center." *Proceedings of the twenty-third annual symposium on Computational geometry*. ACM, 2007.

While I understand that k-means++ is the preferable option, the others could also be used for testing, trials, and evaluations. I'd like to propose separating the seeding logic from the clustering logic, to increase the flexibility of k-means clustering.

I would be glad to hear your comments, pros/cons, or objections.

Thanks,
Artem Barger.
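
P.S. To make the proposal a bit more concrete, here is a minimal sketch of what separating seeding from clustering could look like. It is written in Python purely for brevity and is not CM code: every name in it (`Seeder`, `random_seeding`, `kmeans_pp_seeding`, `kmeans`) is hypothetical. The random seeder assumes the variant that picks k distinct data points, and the clustering loop is plain Lloyd iteration, which may differ in details from what CM does.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, ...]
# Hypothetical seeding strategy: (points, k, rng) -> k initial centers.
Seeder = Callable[[List[Point], int, random.Random], List[Point]]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_seeding(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Pick k distinct data points uniformly at random."""
    return [tuple(p) for p in rng.sample(points, k)]


def kmeans_pp_seeding(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ style seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [tuple(rng.choice(points))]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with an existing center; fall back to uniform.
            centers.append(tuple(rng.choice(points)))
        else:
            centers.append(tuple(rng.choices(points, weights=weights, k=1)[0]))
    return centers


def kmeans(points: List[Point], k: int, seeder: Seeder,
           max_iter: int = 100, rng: random.Random = None) -> List[Point]:
    """Lloyd iterations; the seeding strategy is supplied by the caller."""
    rng = rng or random.Random()
    centers = seeder(points, k, rng)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

With this shape, switching strategies is a one-argument change, e.g. `kmeans(data, 3, random_seeding)` versus `kmeans(data, 3, kmeans_pp_seeding)`, and Canopy or Bicriteria seeding could be added later without touching the clustering loop.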