Hi Krishnan,

Thanks for discussing this interesting scenario! 

This reminds me of a previously discussed, still-pending improvement: adaptive load 
balancing for the rebalance partitioner. 
Since rebalance mode may emit data to any node without correctness concerns, records 
could instead be routed adaptively based on the current backlog of each partition, 
which roughly reflects how loaded the downstream consumers are.
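
Just to illustrate the idea in plain Java (this is NOT Flink's actual rebalance 
implementation; the BacklogAwareSelector class and the updateBacklog feedback are 
purely hypothetical):

// Hypothetical sketch of backlog-aware channel selection, only to illustrate
// the idea above; not how Flink's rebalance partitioner works today.
public class BacklogAwareSelector {

    // backlogs[i] = current number of unconsumed records buffered for channel i,
    // assumed to be reported back from the consumer side somehow.
    private final int[] backlogs;

    public BacklogAwareSelector(int numberOfChannels) {
        this.backlogs = new int[numberOfChannels];
    }

    // Consumer-side feedback updates the backlog of one channel.
    public void updateBacklog(int channel, int backlog) {
        backlogs[channel] = backlog;
    }

    // Instead of plain round-robin, emit to the channel with the smallest backlog.
    public int selectChannel() {
        int best = 0;
        for (int i = 1; i < backlogs.length; i++) {
            if (backlogs[i] < backlogs[best]) {
                best = i;
            }
        }
        return best;
    }
}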

For your keyBy case, I guess the requirement is not only load balancing of the 
processing, but also consistency with the preloaded cache.
Do you think it would be possible to implement a custom partitioner that controls 
the key distribution based on the pre-defined cache distribution across nodes? 
A rough sketch of what I mean is below. 
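
For example, something along these lines with the DataStream#partitionCustom API; 
the cacheGroupFor(...) mapping and the Event type are placeholders for however you 
encode which subtask has preloaded which keys:

import org.apache.flink.api.common.functions.Partitioner;

// Sketch: route each record to the parallel instance that already holds the
// cache entries for its key, instead of relying on keyBy's hash distribution.
public class CacheAwarePartitioner implements Partitioner<String> {

    @Override
    public int partition(String key, int numPartitions) {
        // Hypothetical lookup into your pre-defined cache distribution,
        // e.g. a static map from key group -> subtask index loaded at startup.
        return Math.floorMod(cacheGroupFor(key), numPartitions);
    }

    private int cacheGroupFor(String key) {
        // Placeholder for your own key -> cache-group assignment.
        return key.hashCode();
    }
}

// Usage sketch (Event and getKey() are placeholders for your record type):
// DataStream<Event> routed =
//     events.partitionCustom(new CacheAwarePartitioner(), event -> event.getKey());

One caveat: partitionCustom only controls the physical routing and returns a plain 
DataStream rather than a KeyedStream, so downstream operators would not get keyed 
state the way keyBy provides; whether that trade-off works depends on how much you 
rely on keyed state versus the external cache.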

Best,
Zhijiang
------------------------------------------------------------------
From: Navneeth Krishnan <reachnavnee...@gmail.com>
Send Time: Wednesday, September 23, 2020, 02:21
To: user <user@flink.apache.org>
Subject: Adaptive load balancing

Hi All,

We are currently using Flink in production and use keyBy for a CPU-intensive 
computation. There is a cache lookup for a set of keys, and since keyBy cannot 
guarantee that a given set of keys is sent to a single node, we are essentially 
replicating the cache on all nodes. This is causing memory problems for us, and we 
would like to explore some options to mitigate the current limitation.

Is there a way to group a set of keys and send them to a specific set of nodes, so 
that we don't have to replicate the cache data on all nodes?

Has anyone tried implementing hashing with adaptive load balancing, so that if a 
node is busy processing, the data can be routed to other nodes that are free?

Any suggestions are greatly appreciated.

Thanks
