Dataflow runner and IntrinsicMapTaskExecutor instantiation

Michel Davit Fri, 20 May 2022 07:51:15 -0700

Hi there 👋

I have a question about the Dataflow runner, more precisely about how it
instantiates IntrinsicMapTaskExecutor.


When I run the same job with different SDK versions and analyze the heap
dumps, the number of IntrinsicMapTaskExecutor instances differs:
- Beam Java SDK 2.29.0: *11* instances
- Beam Java SDK 2.35.0: *45* instances
- Beam Java SDK 2.35.0 with runner v2: *47* instances

These test jobs run on a single worker of type n1-standard-1. In all
cases, only 4 task executors are started and assigned to a thread.

Creating an IntrinsicMapTaskExecutor is not 'free': it involves
duplicating the whole coder stack. We see a significant increase in memory
consumption as well as a longer startup time, due to the
CloudObjects.coderFromCloudObject operation.
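
To make the concern concrete, here is a toy model of the cost, not actual
Beam or Dataflow code. It assumes each executor rebuilds its own copy of
the coder tree from a serialized description, which is how I read the heap
dump. CoderSpec, CoderInstance, build, sampleSpec and coderObjectsFor are
hypothetical names made up for this sketch; the coder names are only
labels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CoderDuplicationSketch {

    /** Hypothetical stand-in for a serialized coder description (not a Beam class). */
    static final class CoderSpec {
        final String name;
        final List<CoderSpec> components;

        CoderSpec(String name, CoderSpec... components) {
            this.name = name;
            this.components = Arrays.asList(components);
        }
    }

    /** Hypothetical stand-in for a materialized coder object. */
    static final class CoderInstance {
        final String name;
        final List<CoderInstance> components = new ArrayList<>();

        CoderInstance(String name) {
            this.name = name;
        }
    }

    /** Counts every CoderInstance allocated by build(). */
    static int instancesBuilt = 0;

    /** Rebuilds a full coder tree from its description, one new object per node. */
    static CoderInstance build(CoderSpec spec) {
        CoderInstance coder = new CoderInstance(spec.name);
        instancesBuilt++;
        for (CoderSpec component : spec.components) {
            coder.components.add(build(component));
        }
        return coder;
    }

    /** A small nested coder description with 4 nodes, used only for illustration. */
    static CoderSpec sampleSpec() {
        return new CoderSpec("KvCoder",
                new CoderSpec("StringUtf8Coder"),
                new CoderSpec("IterableCoder", new CoderSpec("VarLongCoder")));
    }

    /** Number of coder objects allocated when every executor builds its own tree. */
    static int coderObjectsFor(int executors, CoderSpec spec) {
        instancesBuilt = 0;
        for (int i = 0; i < executors; i++) {
            build(spec);
        }
        return instancesBuilt;
    }

    public static void main(String[] args) {
        CoderSpec spec = sampleSpec();
        // Executor counts taken from the observations above (4 active; 11, 45, 47 in the heap).
        for (int executors : new int[] {4, 11, 45, 47}) {
            System.out.println(executors + " executors -> "
                    + coderObjectsFor(executors, spec) + " coder objects");
        }
    }
}
```

Under that assumption the number of coder objects grows linearly with the
number of executors, so executors that never get a thread still pay the
full deserialization and memory cost.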

Is there a reason why Dataflow creates so many unused executors?

Cheers
-- 



Michel Davit
Data Engineer
Spotify France | 54 Rue de Londres | 75008 Paris, France
