Suppose I have an RDD with 10 tuples and a cluster with 100 cores (standalone mode). How will the partitioning be done by default? I do not understand how Spark will, by default, divide a 10-tuple RDD across 100 cores.
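To make the question concrete, here is a minimal sketch of how one could check the default behaviour; it assumes a standalone cluster, and the master URL and app name are placeholders, not values from my setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DefaultPartitioningCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("default-partitioning-check")
      .setMaster("spark://master:7077") // hypothetical standalone master URL

    val sc = new SparkContext(conf)

    // With the standalone manager, defaultParallelism is
    // max(total executor cores, 2) unless spark.default.parallelism is set.
    println(s"defaultParallelism = ${sc.defaultParallelism}")

    // parallelize with no explicit numSlices uses defaultParallelism slices,
    // so with 100 cores a 10-element RDD is spread over 100 partitions,
    // most of which end up empty.
    val rdd = sc.parallelize(1 to 10)
    println(s"partitions = ${rdd.getNumPartitions}")

    // Count the elements in each partition to see which ones are empty.
    rdd.mapPartitionsWithIndex { (idx, it) =>
      Iterator((idx, it.size))
    }.collect().foreach { case (idx, n) =>
      println(s"partition $idx -> $n elements")
    }

    sc.stop()
  }
}
```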
Mentioned in the documentation for *spark.default.parallelism*: "For distributed shuffle operations like reduceByKey and join, the largest number of partitions in a parent RDD. For operations like parallelize with no parent RDDs, it depends on the cluster manager: *Others: total number of cores on all executor nodes or 2, whichever is larger*."

Thanks & Regards,
Anshu Shukla