Hi All,

We have recently begun performance testing our Spark application and have
found that changing the default parallelism has a much larger effect on
performance than expected; there seems to be an elusive sweet spot that
depends on the input size.

Does anyone have a good rule of thumb for choosing a starting value for
the parallelism, given the cluster spec and data size?
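
For context, this is roughly how we are deriving it at the moment (a
minimal sketch; the executor and core figures are placeholders for our
cluster, and the tasks-per-core multiplier is just the 2-3x rule of
thumb from the Spark tuning guide):

    import org.apache.spark.{SparkConf, SparkContext}

    // Derive spark.default.parallelism from the cluster spec.
    val numExecutors = 10      // placeholder: executors in our cluster
    val coresPerExecutor = 4   // placeholder: cores per executor
    val tasksPerCore = 2       // tuning guide suggests 2-3 tasks per core

    val parallelism = numExecutors * coresPerExecutor * tasksPerCore

    val conf = new SparkConf()
      .setAppName("parallelism-test")
      .set("spark.default.parallelism", parallelism.toString)

    val sc = new SparkContext(conf)

The multiplier is the part we keep having to retune as the input size
changes, which is what prompted the question.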

Thanks

Jem
