The "netlib.BLAS: Failed to load implementation" warning only means that
the BLAS implementation may be slower than using a native one. The reason
why it only shows up at the end is that the library is only used for the
finalization step of the KMeans algorithm, so your job should've been
wrapping
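If you want to double-check which backend netlib-java actually picked up, something along these lines should work from the spark-shell (just a sketch, assuming the com.github.fommil.netlib classes that MLlib's dependencies pull in are on the classpath):

  import com.github.fommil.netlib.BLAS

  // Prints the loaded backend, e.g. com.github.fommil.netlib.F2jBLAS for the
  // pure-JVM fallback, or NativeSystemBLAS if a native library was found.
  println(BLAS.getInstance().getClass.getName)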
Your latest response doesn't show up here yet; I only got the email. I'll
still answer here in the hope that it appears later:
Which memory setting do you mean? I can go up with spark.executor.memory a
bit; it's currently set to 12G. But that's already way more than the whole
SchemaRDD of Vectors takes up.
Thanks, setting the number of partitions to the number of executors helped a
lot, and training with 20k entries got much faster.
However, when I tried training with 1M entries, I got this after about 45
minutes of calculations:
It's stuck at this point. The CPU load on the master is at 100%.
On Fri, Jul 11, 2014 at 7:32 PM, durin wrote:
> How would you get more partitions?
You can specify this as the second arg to the methods that originally read
your data, like:
sc.textFile("...", 20)
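For example, a rough sketch of both routes (the path and the partition counts are only placeholders):

  val raw = sc.textFile("hdfs:///some/path", 20)  // ask for at least 20 partitions
  println(raw.partitions.size)

  // Note that repartition() returns a new RDD rather than changing the one
  // it's called on, so keep the result instead of calling it for side effects.
  val wider = raw.repartition(40)
  println(wider.partitions.size)  // 40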
> I ran broadcastVector.value.repartition(5), but
> broadcastVector.value.partitions.size is still 1 and no change to the
> behavior is visible.
Hi Sean, thanks for your reply.
How would you get more partitions?
I ran broadcastVector.value.repartition(5), but
broadcastVector.value.partitions.size is still 1 and no change to the
behavior is visible.
Also, I noticed this:
First of all, there is a gap of almost two minutes between the third
How many partitions do you use for your data? If the default is 1, you
probably need to manually ask for more partitions.
Also, I'd check that your executors aren't thrashing close to the GC
limit; that can make things start to get very slow.
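One way to make that visible, if you build the context yourself, is to turn on GC logging for the executors (just a sketch with example values; GC activity also shows up as "GC Time" in the web UI's task table):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("kmeans-test")                      // example name
    .set("spark.executor.memory", "12g")            // example value
    .set("spark.executor.extraJavaOptions",
      "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
  val sc = new SparkContext(conf)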
On Fri, Jul 11, 2014 at 9:53 AM, durin wrote:
> Hi,
>