On Fri, Jul 11, 2014 at 7:32 PM, durin <m...@simon-schaefer.net> wrote:
> How would you get more partitions?

You can specify this as the second argument to the methods that originally
read your data, for example:

  sc.textFile("...", 20)

> I ran broadcastVector.value.repartition(5), but
> broadcastVector.value.partitions.size is still 1 and no change to the
> behavior is visible.

RDDs are immutable, so repartition returns a new RDD rather than changing
the existing one. To have an effect, you have to use the result, something
like:

  val repartitioned = broadcastVector.value.repartition(5)

> First of all, there is a gap of almost two minutes between the third to last
> and second to last line, where no activity is shown in the WebUI. Is that
> the GC at work? If yes, how would I improve this?

You mean there are a few minutes where no job is running? I assume that's
time when the driver is busy doing something. Is it thrashing?

> Also, "Local KMeans++ reached the max number of iterations: 30" surprises
> me. I have run training using
>
> is it possible that somehow, there are still 30 iterations executed, despite
> the 3 I set?

Are you sure you set 3 iterations?
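
To make the repartition point concrete, here is a minimal sketch rather than a drop-in fix. It assumes Spark is on the classpath and uses a local master; the object name RepartitionSketch, the helper partitionCounts, and the sc.parallelize stand-in for your real data are all made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RepartitionSketch {
  // Returns (partition count of the original RDD,
  //          partition count of the repartitioned RDD).
  def partitionCounts(): (Int, Int) = {
    // Assumption: local master with 2 threads, for illustration only.
    val conf = new SparkConf()
      .setAppName("repartition-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for data that was read into one partition.
      // The second argument sets the number of partitions, similar in
      // spirit to the second argument of sc.textFile.
      val data = sc.parallelize(1 to 100, 1)

      // The returned RDD is discarded here, so data keeps its partitioning.
      data.repartition(5)

      // Keeping the returned RDD is what makes the new partitioning usable.
      val repartitioned = data.repartition(5)

      (data.partitions.size, repartitioned.partitions.size)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = partitionCounts()
    println(s"original: $before, repartitioned: $after")
  }
}
```

Run this way, the original RDD should still report 1 partition and the new one 5, which matches what you saw with broadcastVector.value.partitions.size.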