Suppose I have an RDD with 10 tuples and a cluster with 100 cores (standalone mode). How will the partitioning be done by default? I do not understand how Spark will, by default, divide a 10-tuple RDD across 100 cores.
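To make the question concrete, here is a minimal sketch of how one could check the default behaviour; it assumes a standalone cluster, and the master URL and app name are placeholders, not values from my setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DefaultPartitioningCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("default-partitioning-check")
      .setMaster("spark://master:7077") // hypothetical standalone master URL

    val sc = new SparkContext(conf)

    // With the standalone manager, defaultParallelism is
    // max(total executor cores, 2) unless spark.default.parallelism is set.
    println(s"defaultParallelism = ${sc.defaultParallelism}")

    // parallelize with no explicit numSlices uses defaultParallelism slices,
    // so with 100 cores a 10-element RDD is spread over 100 partitions,
    // most of which end up empty.
    val rdd = sc.parallelize(1 to 10)
    println(s"partitions = ${rdd.getNumPartitions}")

    // Count the elements in each partition to see which ones are empty.
    rdd.mapPartitionsWithIndex { (idx, it) =>
      Iterator((idx, it.size))
    }.collect().foreach { case (idx, n) =>
      println(s"partition $idx -> $n elements")
    }

    sc.stop()
  }
}
```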
Mentioned in the documentation for *spark.default.parallelism*: "For distributed shuffle operations like reduceByKey and join, the largest number of partitions in a parent RDD. For operations like parallelize with no parent RDDs, it depends on the cluster manager: *Others: total number of cores on all executor nodes or 2, whichever is larger*."

Thanks & Regards,
Anshu Shukla