Hi,

As part of a project, we are trying to create a parallel implementation of the BIRCH clustering algorithm [1]. Our approach is mostly based on this paper, which used CUDA to parallelize BIRCH [2]. ([2] is a short paper; only Section 4 is relevant.)
We would like to implement BIRCH on Spark. Would this be an interesting contribution to MLlib? Has anyone already tried to implement BIRCH on Spark? Any suggestions for the implementation itself would be very much appreciated!

[1] http://www.cs.sfu.ca/CourseCentral/459/han/papers/zhang96.pdf
[2] http://boyuan.global-optimization.com/Mypaper/IDEAL2013-88.pdf

Best,
Dzeno