Re: Weird performance on custom Hashjoin w.r.t. parallelism

2017-11-09 Thread Piotr Nowojski
Hi, Yes, as you correctly analysed, parallelism 1 was causing the problem, because it meant that all of the records had to be gathered over the network from all of the task managers. Keep in mind that even if you increase the parallelism to “p”, every change in parallelism can slow down your application

Re: Weird performance on custom Hashjoin w.r.t. parallelism

2017-11-08 Thread m@xi
Hello! I found out that the cause of the problem was the map that runs after the parallel join, which had parallelism 1. When I changed it to .map(new MyMapMeter).setParallelism(p), the completion time decreases as I increase the parallelism p, which is reasonable. Somehow it was a bot
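
To make the effect described above concrete, here is a rough back-of-the-envelope model in plain Java (no Flink involved). It assumes the join and the map run as pipelined stages, so the job takes about as long as its slowest stage. The class name PipelineModel, the method completionTimeMicros and the per-record costs are all made up for illustration; they are not measurements from this thread, and the model ignores network and serialization overhead.

```java
// Hypothetical toy model, not Flink code: estimates the completion time of a
// two-stage pipeline (join -> map) as the time of its slowest stage.
final class PipelineModel {

    // records:            number of records flowing through both stages
    // joinCostMicros:     assumed CPU cost per record in the join
    // mapCostMicros:      assumed CPU cost per record in the map
    // joinParallelism:    number of parallel join subtasks
    // mapParallelism:     number of parallel map subtasks
    static double completionTimeMicros(long records,
                                       double joinCostMicros,
                                       double mapCostMicros,
                                       int joinParallelism,
                                       int mapParallelism) {
        if (joinParallelism < 1 || mapParallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        double joinTime = records * joinCostMicros / joinParallelism;
        double mapTime = records * mapCostMicros / mapParallelism;
        // Pipelined stages: the slower stage bounds the overall time.
        return Math.max(joinTime, mapTime);
    }

    public static void main(String[] args) {
        long records = 1_000_000L;   // illustrative value
        double joinCost = 4.0;       // illustrative value, microseconds per record
        double mapCost = 1.0;        // illustrative value, microseconds per record

        for (int p : new int[] {1, 2, 4, 8, 16}) {
            double mapFixedAtOne = completionTimeMicros(records, joinCost, mapCost, p, 1);
            double mapScaled = completionTimeMicros(records, joinCost, mapCost, p, p);
            System.out.printf("p=%d  map parallelism 1: %.0f us   map parallelism p: %.0f us%n",
                    p, mapFixedAtOne, mapScaled);
        }
    }
}
```

In this model, once the join is parallel enough, the map with parallelism 1 becomes the slowest stage and further increases of p no longer help, while with the map at parallelism p the estimate keeps shrinking as p grows.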