Re: Dividing tasks among Spark workers

2014-07-18 Thread Yanbo Liang
Clusters will not be fully utilized unless you set the level of parallelism for each operation high enough. Spark automatically sets the number of “map” tasks to run on each file according to its size. You can pass the level of parallelism as a second argument or set the config property spark.default.parallelism to change the default.
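A minimal sketch (Scala, Spark 1.x API) of the two options mentioned above: passing the number of partitions as a second argument and setting spark.default.parallelism. The file path, application name, and partition counts are placeholders, not values from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelismExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("ParallelismExample")
      // Default parallelism used by shuffle operations when none is given.
      .set("spark.default.parallelism", "16")

    val sc = new SparkContext(conf)

    // Ask for at least 16 partitions when reading the input file.
    val lines = sc.textFile("hdfs:///path/to/input.txt", 16)

    // Or pass the number of partitions directly to a shuffle operation.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _, 16)

    println(counts.count())
    sc.stop()
  }
}
```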

Re: Dividing tasks among Spark workers

2014-07-18 Thread Shannon Quinn
The default # of partitions is the # of cores, correct? On 7/18/14, 10:53 AM, Yanbo Liang wrote: Check how many partitions are in your program. If there is only one, changing it to more partitions will make the execution parallel.

Re: Dividing tasks among Spark workers

2014-07-18 Thread Yanbo Liang
Check how many partitions are in your program. If there is only one, changing it to more partitions will make the execution parallel. 2014-07-18 20:57 GMT+08:00 Madhura: > I am running my program on a spark cluster, but when I look into my UI while the job is running I see that only one worker does most of the work.
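A minimal sketch (Scala) of how to check the partition count mentioned above and increase it with repartition() so the work spreads across the cluster. The input path and target partition count are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PartitionCheck"))

    val data = sc.textFile("hdfs:///path/to/input.txt")
    println(s"partitions before: ${data.partitions.length}")

    // If there is only one partition, repartition so that tasks can run
    // on more than one worker in parallel.
    val spread = data.repartition(16)
    println(s"partitions after: ${spread.partitions.length}")

    sc.stop()
  }
}
```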