One more detail: even when forcing partitions (via /repartition/), Spark still
concentrates some tasks; if I increase the load on the system (by raising
/spark.streaming.receiver.maxRate/), then even though all workers are used, the one
hosting the receiver gets twice as many tasks as the other workers.
Total del
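
For illustration, a minimal sketch of the receiver-rate knob mentioned above; the app name and the rate value are assumptions, not the original job's settings:

    import org.apache.spark.SparkConf

    // Sketch only: cap how many records/sec the single receiver ingests.
    // Raising this value is what increases the load and exposes the
    // imbalance described above (the receiver's worker gets more tasks).
    val conf = new SparkConf()
      .setAppName("streaming-receiver-skew")            // assumed name
      .set("spark.streaming.receiver.maxRate", "10000") // assumed records/sec
    // The input DStream would then be spread with /repartition/ before
    // any processing, as described in the message above.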
Thanks, Akhil Das-2: I actually tried setting spark.default.parallelism, but it
had no effect :-/
I am running in standalone mode and performing a mix of map/filter/foreachRDD.
I had to force parallelism with repartition to get both workers to process
tasks, but I do not think this should be required (and I am n
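
For context, a rough sketch of the kind of job being described; the source, batch interval, and the partition count of 16 are assumptions, not the original code:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("standalone-streaming")        // assumed name
      .set("spark.default.parallelism", "16")    // tried, but no effect here

    val ssc = new StreamingContext(conf, Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999) // single receiver (assumed source)

    lines
      .repartition(16)        // forcing parallelism so both workers get tasks
      .map(_.toUpperCase)     // placeholder map
      .filter(_.nonEmpty)     // placeholder filter
      .foreachRDD { rdd =>
        println(s"partitions: ${rdd.partitions.length}, records: ${rdd.count()}")
      }

    ssc.start()
    ssc.awaitTermination()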
What operation are you performing? And what is your cluster configuration?
If you are doing an operation like groupBy, reduceBy, join, etc., then you could
try providing the level of parallelism explicitly. If you give 16, then each of
your workers will most likely get 8 tasks to execute.
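
A minimal sketch of what providing the level of parallelism could look like; the data and the value 16 are assumptions:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("parallelism-example"))

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    // Pass the number of partitions to the shuffle operation directly.
    // With 2 workers and 16 partitions, each worker gets roughly 8 reduce tasks.
    val counts = pairs.reduceByKey(_ + _, 16)

    println(counts.partitions.length) // 16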
Thanks
Best Regards