Hi there 👋

I have a question regarding the Dataflow runner, more precisely about how it
instantiates IntrinsicMapTaskExecutor.

Running the same job with different SDK versions and analyzing the heap
dumps, I've noticed that the number of instances differs:
Beam Java SDK 2.29.0: *11* instances
Beam Java SDK 2.35.0: *45* instances
Beam Java SDK 2.35.0 with runner v2: *47* instances

These test jobs run on a single worker of type n1-standard-1. In all
cases, only 4 task executors are actually started and assigned to a thread.

Creating an IntrinsicMapTaskExecutor is not 'free': it involves
duplicating the whole coder stack. As a result, we see a significant
increase in memory consumption as well as a longer startup time, due to the
CloudObjects.coderFromCloudObject operation.
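
To make the concern concrete, here is a minimal stand-alone sketch. It is not
Beam or Dataflow code: FakeCoderStack, FakeExecutor and coderStacksBuiltFor
are hypothetical names, and the only assumption it models is the one stated
above, namely that each executor instance rebuilds its own copy of the coder
stack. It compares the instance counts from the heap dumps with the 4
executors that actually run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorCostSketch {
    // Counts how many times the (hypothetical) coder stack was rebuilt.
    static final AtomicInteger coderBuilds = new AtomicInteger();

    // Stand-in for a deserialized coder stack; NOT a Beam class.
    static final class FakeCoderStack {
        FakeCoderStack() {
            coderBuilds.incrementAndGet();
        }
    }

    // Stand-in for a task executor that owns a private coder stack.
    static final class FakeExecutor {
        final FakeCoderStack coders = new FakeCoderStack();
    }

    // Builds `count` executors and returns how many coder stacks that required.
    static int coderStacksBuiltFor(int count) {
        coderBuilds.set(0);
        List<FakeExecutor> executors = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            executors.add(new FakeExecutor());
        }
        return coderBuilds.get();
    }

    public static void main(String[] args) {
        int used = 4; // executors actually assigned to a thread in my test jobs
        for (int created : new int[] {11, 45, 47}) {
            int built = coderStacksBuiltFor(created);
            System.out.println(created + " executors: " + built
                + " coder stacks built, " + (built - used) + " never used");
        }
    }
}
```

Under that assumption, the cost grows linearly with the number of executors
created, regardless of how many are ever used.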

Is there a reason why Dataflow creates so many unused executors?

Cheers
-- 



Michel Davit
Data Engineer
Spotify France | 54 Rue de Londres | 75008 Paris, France
