Hi everyone,

I have deployed a Flink cluster using the Flink Kubernetes Operator and then
submitted an Apache Beam pipeline using the FlinkRunner.

I submitted two jobs, one with *parallelism=20* and another with
*parallelism=1*, but both took almost the same time to complete (a
difference of a few seconds), which is very surprising. In the Flink UI, I
can see that the parallelism is set to 20 and each task has 20 subtasks.
However, 19 of the subtasks finish within a few seconds to minutes, and
only one subtask runs for the majority of the time.

I have attached a screenshot below, in which one subtask took nearly 1 hour
14 minutes and the remaining 19 subtasks took less than 2 minutes to
complete. So I am not getting any benefit from parallelism here. The task is
simple and does not rely on any state, yet it is not using the available
resources to parallelize the work. *Is there any way to force parallelism
here?*

[image: image.png]
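
To put rough numbers on the screenshot, here is a small sketch in plain Python (not Beam or Flink code). It assumes that each of the 19 fast subtasks took the full 2 minutes, which is only an upper bound, and that the work could in principle be split evenly; the variable names are illustrative only.

```python
# Durations taken from the post: one subtask ran about 1 h 14 min,
# the other 19 finished in under 2 min.
PARALLELISM = 20
slow_subtask_min = 74
fast_subtask_min = 2  # assumption: upper bound for each of the 19 fast subtasks

durations = [slow_subtask_min] + [fast_subtask_min] * (PARALLELISM - 1)

# A parallel stage finishes only when its slowest subtask finishes.
wall_clock_min = max(durations)

# Total work actually performed, summed over all subtasks.
total_work_min = sum(durations)

# Hypothetical ideal: the same work spread evenly over all subtasks.
ideal_wall_clock_min = total_work_min / PARALLELISM

# Share of the total work handled by the single slow subtask.
slow_share = slow_subtask_min / total_work_min

print(f"observed wall clock: {wall_clock_min} min")
print(f"total work: {total_work_min} min")
print(f"ideal with even split: {ideal_wall_clock_min:.1f} min")
print(f"share done by one subtask: {slow_share:.0%}")
```

Under these assumptions, the stage time is dictated entirely by the one slow subtask, which would explain why the run with parallelism=20 is barely faster than the run with parallelism=1.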

The task has multiple steps, and the final step writes the output to the
bucket. The output is written to multiple files, so the task could be
parallelized, but only one subtask is doing the actual work.


[image: image.png]


Can someone help me figure out the right configuration and setup needed to
parallelize the work?

Regards
Dipak
