Hi,

This is on a 4-node cluster, each node with 32 cores and 256GB RAM.

Spark 0.9.0 is deployed in standalone mode.

Each worker is configured with 192GB of memory, and spark.executor.memory is also set to 192GB.
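
For reference, here's roughly how those settings map onto the SparkConf API introduced in 0.9.0 (a minimal sketch; the master URL and app name are placeholders, not my actual values):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch of the configuration described above; the master URL and
    // app name are placeholders.
    val conf = new SparkConf()
      .setMaster("spark://master:7077")
      .setAppName("KMeansTest")
      .set("spark.executor.memory", "192g") // matches the 192GB per worker
    val sc = new SparkContext(conf)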

This happens on the first iteration, with k=500000. Here's the code I use:
http://pastebin.com/2yXL3y8i , which is a copy-and-paste of the MLlib KMeans example.
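
For reference, the 0.9.0 MLlib guide's KMeans snippet looks roughly like this (a sketch, assuming spark-shell's predefined sc; the input path and iteration count are placeholders, with my k substituted):

    import org.apache.spark.mllib.clustering.KMeans

    // Load and parse the data; each line is a space-separated numeric vector.
    // "kmeans_data.txt" is a placeholder for the actual input path.
    val data = sc.textFile("kmeans_data.txt")
    val parsedData = data.map(_.split(' ').map(_.toDouble)).cache()

    // Cluster the points; the iteration count here is a placeholder.
    val numClusters = 500000
    val numIterations = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)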

Thanks!



On 24 Mar, 2014, at 2:46 pm, Xiangrui Meng <men...@gmail.com> wrote:

> Hi Tsai,
> 
> Could you share more information about the machine you used and the
> training parameters (runs, k, and iterations)? That would help in
> diagnosing the issue. Thanks!
> 
> Best,
> Xiangrui
> 
> On Sun, Mar 23, 2014 at 3:15 AM, Tsai Li Ming <mailingl...@ltsai.com> wrote:
>> Hi,
>> 
>> At the reduceByKey stage, it takes a few minutes before the tasks start 
>> working.
>> 
>> I have set -Dspark.default.parallelism=127, i.e. the total number of 
>> cores (4 x 32 = 128) minus one.
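>> 
>> For reference, the equivalent SparkConf setting would be roughly (a 
>> one-line sketch, assuming a SparkConf instance named conf):
>> 
>>     conf.set("spark.default.parallelism", "127") // 128 cores total, minus one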
>> 
>> CPU, network, and I/O are all idle across the nodes while this is happening.
>> 
>> And there is nothing unusual in the master log file. From the spark-shell:
>> 
>> 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:124 as TID 538 on executor 2: XXX (PROCESS_LOCAL)
>> 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:124 as 38765155 bytes in 193 ms
>> 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:125 as TID 539 on executor 1: XXX (PROCESS_LOCAL)
>> 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:125 as 38765155 bytes in 96 ms
>> 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:126 as TID 540 on executor 0: XXX (PROCESS_LOCAL)
>> 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:126 as 38765155 bytes in 100 ms
>> 
>> But it stalls there for a significant amount of time before anything moves.
>> 
>> In the stage detail page of the UI, I can see that there are 127 tasks 
>> running, but each task takes at least a few minutes.
>> 
>> I'm working off local storage (not HDFS), and the KMeans input data is 
>> about 6.5GB (50M rows).
>> 
>> Is this normal behaviour?
>> 
>> Thanks!
