Hi Sean, thanks for your reply.

How would you get more partitions?
I ran broadcastVector.value.repartition(5), but
broadcastVector.value.partitions.size is still 1, and I see no change in
behavior.

Also, I noticed this:


First of all, there is a gap of almost two minutes between the third-to-last
and second-to-last lines, during which no activity is shown in the WebUI. Is
that the GC at work? If so, how could I reduce it?

Also, "Local KMeans++ reached the max number of iterations: 30" surprises
me. I ran the training using 

Is it possible that 30 iterations are somehow still being executed, despite
the 3 I set?


Best regards,
Simon



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-for-large-training-data-tp9407p9431.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
