Hi,

This is on a 4-node cluster, each node with 32 cores and 256GB of RAM.
Spark (0.9.0) is deployed in standalone mode. Each worker is configured
with 192GB, and the Spark executor memory is also 192GB.
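For reference, those settings map onto Spark 0.9.0's SparkConf API roughly
as below, if the job were submitted as a standalone driver rather than from
spark-shell. This is only a sketch; the master URL and app name are
placeholders, not taken from this thread:

import org.apache.spark.{SparkConf, SparkContext}

// Mirror of the deployment described above: 192g per executor,
// default parallelism of 127 (total cores minus one).
val conf = new SparkConf()
  .setMaster("spark://master:7077")        // placeholder master URL
  .setAppName("KMeansTest")                // placeholder app name
  .set("spark.executor.memory", "192g")
  .set("spark.default.parallelism", "127")
val sc = new SparkContext(conf)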
This happens on the first iteration, with K=500000. Here's the code I use:
http://pastebin.com/2yXL3y8i , which is a copy-and-paste of the example.

Thanks!

On 24 Mar, 2014, at 2:46 pm, Xiangrui Meng <men...@gmail.com> wrote:

> Hi Tsai,
>
> Could you share more information about the machine you used and the
> training parameters (runs, k, and iterations)? That will help in
> tracking down your issue. Thanks!
>
> Best,
> Xiangrui
>
> On Sun, Mar 23, 2014 at 3:15 AM, Tsai Li Ming <mailingl...@ltsai.com> wrote:
>> Hi,
>>
>> At the reduceByKey stage, it takes a few minutes before the tasks
>> start working.
>>
>> I have set -Dspark.default.parallelism=127 (total cores minus one).
>>
>> CPU/network/IO are idle across all nodes while this is happening,
>> and there is nothing unusual in the master log file. From the spark-shell:
>>
>> 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:124 as TID 538 on executor 2: XXX (PROCESS_LOCAL)
>> 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:124 as 38765155 bytes in 193 ms
>> 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:125 as TID 539 on executor 1: XXX (PROCESS_LOCAL)
>> 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:125 as 38765155 bytes in 96 ms
>> 14/03/23 18:13:50 INFO TaskSetManager: Starting task 3.0:126 as TID 540 on executor 0: XXX (PROCESS_LOCAL)
>> 14/03/23 18:13:50 INFO TaskSetManager: Serialized task 3.0:126 as 38765155 bytes in 100 ms
>>
>> It then stalls there for a significant amount of time before any movement.
>>
>> In the stage detail page of the UI, I can see that there are 127 tasks
>> running, but each takes at least a few minutes.
>>
>> I'm working off local storage (not HDFS), and the k-means input data
>> is about 6.5GB (50M rows).
>>
>> Is this normal behaviour?
>>
>> Thanks!
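Since the pastebin is described above as a copy-and-paste of the example,
it is presumably close to the k-means example from the Spark 0.9.0 MLlib
guide. A sketch of that example with the parameters mentioned in this
thread (the input path and the iteration count are assumptions):

import org.apache.spark.mllib.clustering.KMeans

// Load and parse the data: one space-separated vector of doubles per line.
// In 0.9.0, KMeans.train takes an RDD[Array[Double]].
val data = sc.textFile("/data/kmeans_data.txt")   // placeholder local path
val parsedData = data.map(_.split(' ').map(_.toDouble)).cache()

// K = 500000 as reported above; 20 iterations is a guess.
val clusters = KMeans.train(parsedData, 500000, 20)

// Evaluate the clustering by its within-set sum of squared errors.
val WSSSE = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)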