Hi,

I have a streaming job in which executors die quite often during processing (memory errors, "unable to find location for shuffle", etc.). I started digging and found that some of the tasks are concentrated on a single executor, as shown below:

[image: image.png]
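For illustration, the kind of concentration I suspect can be reproduced outside the job. This is only a minimal sketch in plain Python, not the actual job code: the key names, record counts, partition count, and the CRC32-based partitioner are all made-up assumptions, chosen just to show how grouping by a key sends every record with the same key to the same partition, and how appending a salt to the key would spread a dominant key out.

```python
import zlib
from collections import Counter

# Hypothetical numbers, chosen only for illustration.
NUM_PARTITIONS = 8
NUM_SALTS = 16


def partition_of(key: str, num_partitions: int) -> int:
    """Stand-in for a hash partitioner: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def records_per_partition(keys, num_partitions):
    """Count how many records land in each partition when partitioning by key."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Made-up skewed input: one dominant key plus many rare keys.
keys = ["hot_key"] * 9000 + [f"key_{i}" for i in range(1000)]

# Grouping by the raw key: all 9000 "hot_key" records share one partition.
plain = records_per_partition(keys, NUM_PARTITIONS)

# Grouping by (key, salt): the dominant key is split into NUM_SALTS sub-keys.
salted_keys = [f"{k}#{i % NUM_SALTS}" for i, k in enumerate(keys)]
salted = records_per_partition(salted_keys, NUM_PARTITIONS)

print("records per partition, raw key:   ", plain)
print("records per partition, salted key:", salted)
```

If the real data looks like this, repartitioning by the same key before the groupby would presumably leave the distribution unchanged, because the grouping shuffles by key again anyway; a salted key would need a second aggregation step to merge the partial results per original key.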
Can this be the reason? Should I repartition the underlying data before I run a groupby on top of it?

Any advice is welcome.

Thanks,
Andras