Hi Anny,
Having many more partitions is not recommended in general, as that creates a
lot of small tasks. Every task has to be scheduled and shipped to a worker
node for execution, so too many partitions increases task scheduling overhead.
Spark also uses a synchronous execution model for stages, which means that
all tasks in a stage need to complete before the next stage can start.
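If you do end up with far more partitions than you need, coalesce() can
shrink the count without a full shuffle. A rough sketch (this assumes a
spark-shell session where "spark" is already defined; the path and the
target of 64 partitions are just placeholders):

    // hypothetical input path; check how many partitions we actually got
    val df = spark.read.parquet("/data/events")
    println(df.rdd.getNumPartitions)

    // merge small partitions without shuffling the data
    val fewer = df.coalesce(64)
    println(fewer.rdd.getNumPartitions)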
A "task" is the work to be done on a partition for a given stage - you
should expect the number of tasks to be equal to the number of partitions
in each stage, though a task might need to be rerun (due to failure or need
to recompute some data).
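You can see that 1:1 mapping directly from the shell. A minimal sketch
(assumes the usual spark-shell "sc"):

    // ask for 8 partitions explicitly
    val rdd = sc.parallelize(1 to 1000000, 8)
    println(rdd.getNumPartitions)   // 8

    // the stage for this action will show up as 8 tasks in the Spark UI
    rdd.map(_ * 2).count()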
2-4 times the number of cores in your cluster should be a good starting
point for the total number of partitions.
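In code that just means deriving the target from the parallelism Spark
reports, e.g. (the factor of 3 below is only an example within that 2-4
range, and "rdd" is whatever RDD or DataFrame you are working with):

    // total cores Spark knows about across the cluster
    val targetPartitions = 3 * sc.defaultParallelism

    // full shuffle to the new partition count
    val resized = rdd.repartition(targetPartitions)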