On Fri, Jul 11, 2014 at 7:32 PM, durin <m...@simon-schaefer.net> wrote:
> How would you get more partitions?

You can specify the number of partitions as the second argument to the
methods that originally read your data, for example:
sc.textFile("...", 20)
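
A slightly fuller sketch of the same idea, in case it helps. It assumes a
local master and uses a small temp file as a stand-in for your real input;
the object and method names are made up for illustration. Note that the
second argument is only a suggested minimum, so Spark may give you more
partitions than you asked for.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object PartitionsOnRead {
  // Reads a text file with a suggested minimum number of partitions
  // and returns how many partitions Spark actually created.
  def countPartitions(sc: SparkContext, path: String, minPartitions: Int): Int =
    sc.textFile(path, minPartitions).partitions.length

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partitions-on-read").setMaster("local[2]"))
    try {
      // Hypothetical input: a temp file standing in for the real data set.
      val path = Files.createTempFile("sample", ".txt")
      Files.write(path, (1 to 1000).mkString("\n").getBytes("UTF-8"))

      println(s"partitions: ${countPartitions(sc, path.toString, 20)}")
    } finally {
      sc.stop()
    }
  }
}
```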

> I ran broadcastVector.value.repartition(5), but
> broadcastVector.value.partitions.size is still 1 and no change to the
> behavior is visible.

RDDs are immutable, so repartition() returns a new RDD rather than
changing the existing one. To have any effect you have to use the
result, something like:
val repartitioned = broadcastVector.value.repartition(5)
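
To make the difference concrete, here is a minimal sketch. It assumes a
local master and uses sc.parallelize with made-up data in place of your
broadcast value; the object and method names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RepartitionSketch {
  // Returns (partitions of the original RDD, partitions of the RDD
  // returned by repartition).
  def partitionCounts(sc: SparkContext): (Int, Int) = {
    val rdd = sc.parallelize(1 to 100, 1)   // stand-in data, 1 partition

    rdd.repartition(5)                      // result discarded: rdd is unchanged
    val repartitioned = rdd.repartition(5)  // keep the RDD that is returned

    (rdd.partitions.length, repartitioned.partitions.length)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("repartition-sketch").setMaster("local[2]"))
    try {
      println(partitionCounts(sc))  // prints (1,5)
    } finally {
      sc.stop()
    }
  }
}
```

Any later work has to be done on repartitioned, not on the original RDD,
for the new partitioning to matter.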


> First of all, there is a gap of almost two minutes between the third to last
> and second to last line, where no activity is shown in the WebUI. Is that
> the GC at work? If yes, how would I improve this?

Do you mean there are a few minutes where no job is running? I assume
that's a period when the driver itself is busy doing something. Is it
thrashing?


> Also, "Local KMeans++ reached the max number of iterations: 30" surprises
> me. I ran training using
>
> Is it possible that 30 iterations are somehow still executed, despite
> the 3 I set?

Are you sure you set 3 iterations?
