Hi Sean, thanks for your reply. How would you get more partitions? I ran broadcastVector.value.repartition(5), but broadcastVector.value.partitions.size is still 1, and I see no change in behavior.
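One thing worth checking: RDDs are immutable, so repartition does not change the RDD it is called on. It returns a new RDD. If the result of the call is discarded, partitions.size on the original RDD stays the same. Below is a minimal sketch. It assumes a local SparkContext and uses a hypothetical single-partition RDD in place of broadcastVector.value. The names RepartitionSketch and partitionCounts are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RepartitionSketch {
  // Hypothetical helper: returns (partitions before, partitions after) for a toy RDD.
  def partitionCounts(sc: SparkContext): (Int, Int) = {
    // Stand-in for the real data: an RDD created with exactly one partition.
    val data = sc.parallelize(1 to 1000, numSlices = 1)

    // This call builds a new RDD and throws it away; `data` itself is unchanged.
    data.repartition(5)
    val before = data.partitions.size

    // Keep the returned RDD and use it from here on (e.g. for training).
    val repartitioned = data.repartition(5)
    val after = repartitioned.partitions.size

    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for this sketch.
    val conf = new SparkConf().setAppName("repartition-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (before, after) = partitionCounts(sc)
      println(s"original RDD: $before partition(s), repartitioned RDD: $after partition(s)")
    } finally {
      sc.stop()
    }
  }
}
```

In other words, the repartitioned RDD has to be assigned to a value (or passed directly to the next step) for the extra partitions to have any effect. The sketch needs spark-core on the classpath.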
Also, I noticed this:

First of all, there is a gap of almost two minutes between the third-to-last and second-to-last lines, during which the WebUI shows no activity. Is that the GC at work? If so, how could I improve this?

Also, "Local KMeans++ reached the max number of iterations: 30" surprises me. I ran the training with the number of iterations set to 3. Is it possible that 30 iterations are somehow still executed, despite the 3 I set?

Best regards,
Simon

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-for-large-training-data-tp9407p9431.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
