Need suggestions on monitoring Spark progress

2015-11-29 Thread Yuhao Yang
Hi all, I have a simple processing job for 20000 accounts on 8 partitions, which is roughly 2500 accounts per partition. Each account takes about 1s to compute, so each partition takes about 2500 seconds to finish the batch. My question is how can I get the detaile
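
The arithmetic in the post can be written out as a small sketch. The function name `partition_eta` and its parameters are hypothetical, not part of any Spark API, and how the processed count would be collected from a running task is left open here. It only shows how a per-partition progress fraction and remaining time follow from the figures given (about 2500 accounts per partition, about 1 s per account).

```python
def partition_eta(total_accounts, processed_accounts, seconds_per_account=1.0):
    """Return (fraction_done, seconds_remaining) for a single partition.

    Hypothetical helper for illustration only; it assumes every account
    costs roughly the same amount of time.
    """
    if total_accounts <= 0:
        raise ValueError("total_accounts must be positive")
    # Clamp the counter so a stale or overshooting value cannot break the estimate.
    processed = min(max(processed_accounts, 0), total_accounts)
    fraction_done = processed / total_accounts
    seconds_remaining = (total_accounts - processed) * seconds_per_account
    return fraction_done, seconds_remaining


# Figures from the post: about 2500 accounts per partition, about 1 s each.
full_batch = partition_eta(2500, 0)    # nothing processed yet
halfway = partition_eta(2500, 1250)    # half of the partition processed
```

With these inputs, a partition that has not started has the full 2500 seconds remaining, and one that has processed 1250 accounts is half done with 1250 seconds to go.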

FuzzyCMeans Implementation

2015-11-29 Thread salexln
Hi guys, I'm working on an implementation of FuzzyCMeans (https://issues.apache.org/jira/browse/SPARK-2344), and wanted your thoughts on whether the FuzzyCMeans class should inherit from KMeans. On the one hand, they have a lot in common, but on the other hand, other algorithms based on KMeans (Bisecti
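
One way to picture the design question is a sketch in which both algorithms share a common base instead of one inheriting from the other. This is not the MLlib API or the approach taken in SPARK-2344: the class names `CentroidModel`, `HardAssignment`, and `FuzzyAssignment` are hypothetical, and the membership rule used is the standard fuzzy c-means formula with fuzzifier `m`, assumed here for illustration.

```python
import math


class CentroidModel:
    """Hypothetical shared base: stores the centers and computes distances."""

    def __init__(self, centers):
        self.centers = [tuple(c) for c in centers]

    def distances(self, point):
        # Euclidean distance from the point to every center.
        return [math.dist(point, c) for c in self.centers]


class HardAssignment(CentroidModel):
    """K-means style: each point belongs to exactly one center."""

    def predict(self, point):
        d = self.distances(point)
        return d.index(min(d))


class FuzzyAssignment(CentroidModel):
    """Fuzzy c-means style: each point gets a membership degree per center."""

    def __init__(self, centers, m=2.0):
        super().__init__(centers)
        if m <= 1.0:
            raise ValueError("fuzzifier m must be greater than 1")
        self.m = m

    def memberships(self, point):
        d = self.distances(point)
        # A point lying exactly on one or more centers belongs only to those.
        zeros = [i for i, x in enumerate(d) if x == 0.0]
        if zeros:
            return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
        exponent = 2.0 / (self.m - 1.0)
        # u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1))
        return [1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d]


centers = [(0.0, 0.0), (4.0, 0.0)]
hard_label = HardAssignment(centers).predict((1.0, 0.0))
soft_weights = FuzzyAssignment(centers, m=2.0).memberships((1.0, 0.0))
```

In this sketch the only shared piece is the distance computation. The two subclasses differ in what they return (a single index versus a list of weights that sums to 1), which is one reason a common base may fit better than direct inheritance.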