Hi All,

We have recently begun performance testing our Spark application and have found that changing the default parallelism has a much larger effect on performance than expected; there seems to be an elusive sweet spot that depends on the input size.
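To make the question concrete, the kind of starting point I mean is something like the sketch below, which derives a value from the cluster spec using the "2-3 tasks per CPU core" heuristic from the Spark tuning guide. The executor and core counts here are placeholders, not our actual cluster:

    import org.apache.spark.sql.SparkSession

    // Placeholder cluster spec -- substitute real values.
    val numExecutors     = 10  // hypothetical
    val coresPerExecutor = 4   // hypothetical
    val tasksPerCore     = 3   // tuning guide suggests 2-3 tasks per core

    // Starting parallelism: total cores times tasks per core.
    val parallelism = numExecutors * coresPerExecutor * tasksPerCore

    val spark = SparkSession.builder()
      .appName("perf-test")
      .config("spark.default.parallelism", parallelism.toString)
      .getOrCreate()

That heuristic accounts for cluster size but not input size, which is where we are seeing the sweet spot move.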
Does anyone have a good rule of thumb for where to set the parallelism initially, given the cluster spec and data size?

Thanks,
Jem