On Wed, May 7, 2014 at 4:44 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> Spark can only run as many tasks as there are partitions, so if you don't
> have enough partitions, your cluster will be underutilized.

 This is a very important point.

kamatsuoka, how many partitions does your RDD have when you try to save it?
You can check this with myrdd._jrdd.splits().size() in PySpark. If it’s
less than the number of cores in your cluster, try repartition()-ing the
RDD as Aaron suggested.
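
For example, something along these lines (a rough sketch; myrdd, the
partition count of 16, and the output path are just placeholders, pick a
count at least as large as your total core count):

    # How many partitions does the RDD have right now?
    num_partitions = myrdd._jrdd.splits().size()
    print(num_partitions)

    # If that's less than the number of cores in your cluster,
    # spread the data over more partitions before saving.
    myrdd = myrdd.repartition(16)
    myrdd.saveAsTextFile("hdfs:///path/to/output")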

Nick
