At 2014-08-01 02:12:08 -0700, shijiaxin <shijiaxin...@gmail.com> wrote:
> When I use fewer partitions (like 6), it seems that all the tasks are
> assigned to the same machine, because that machine has more than 6 cores.
> But this runs out of memory. How can I set a smaller number of partitions
> and still use all the machines at the same time?
Yes, I've encountered this problem myself. I haven't tried this, but one idea is to reduce the number of cores that Spark is allowed to use on each worker, either by setting SPARK_WORKER_CORES in spark-env.sh or by passing --total-executor-cores to spark-submit to cap how many cores your application takes in total.
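I haven't tested this, but here's a rough sketch of both approaches on a standalone cluster (the master URL, core counts, and application name below are made up for illustration):

    # Option 1: cap the cores each worker offers to Spark.
    # Add this to conf/spark-env.sh on every worker, then restart the workers.
    export SPARK_WORKER_CORES=2

    # Option 2: cap the total cores a single application may use across the cluster.
    ./bin/spark-submit \
      --master spark://master:7077 \
      --total-executor-cores 12 \
      --class com.example.MyApp myapp.jar

Either way, the standalone master spreads an application's executors across workers by default (spark.deploy.spreadOut is true), so with fewer cores available per machine, your 6 partitions should end up on different nodes instead of piling up on one.

Ankur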