Hi,

using keyBy, Flink ensures that all records with the same key are sent to the same operator instance; otherwise it would not be possible to process them as a whole. Whether it is acceptable for another operator instance to process part of such a set of records depends on your use case. You can implement your own partitioning strategy to split your data more evenly (https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html#physical-partitioning). However, this depends on your knowledge of the key space and the load, which Flink cannot know in advance.
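
As a rough illustration of such a strategy, here is a minimal sketch in plain Java, assuming you already know which keys are bursty. The class name HotKeyPartitioner and the hotKeys set are hypothetical. The partition(key, numPartitions) method mirrors the shape of Flink's Partitioner interface, so the same logic could be placed in a Partitioner<Integer> and passed to partitionCustom. It reserves partition 0 for the known hot keys and hashes all other keys over the remaining partitions, so each key still stays on a single partition.

```java
import java.util.Collections;
import java.util.Set;

/**
 * Sketch of a load-aware key-to-partition mapping (not Flink code).
 * Assumption: the set of hot keys is known up front.
 */
public class HotKeyPartitioner {

    // Hypothetical: keys that are known to receive bursts of data.
    private final Set<Integer> hotKeys;

    public HotKeyPartitioner(Set<Integer> hotKeys) {
        this.hotKeys = hotKeys;
    }

    /** Same shape as Flink's Partitioner#partition(key, numPartitions). */
    public int partition(Integer key, int numPartitions) {
        if (numPartitions < 2) {
            return 0; // nothing to balance with a single partition
        }
        if (hotKeys.contains(key)) {
            return 0; // partition 0 is reserved for hot keys
        }
        // All other keys are hashed over partitions 1..numPartitions-1.
        return 1 + Math.floorMod(key.hashCode(), numPartitions - 1);
    }

    public static void main(String[] args) {
        // Scenario from the question: key 1 is bursty, two operator instances.
        HotKeyPartitioner p = new HotKeyPartitioner(Collections.singleton(1));
        for (int key = 1; key <= 4; key++) {
            System.out.println("key " + key + " -> partition " + p.partition(key, 2));
        }
    }
}
```

With two partitions, key 1 ends up alone on partition 0 and keys 2, 3 and 4 share partition 1. Keep in mind that partitionCustom returns a plain DataStream rather than a KeyedStream, so whether this is usable depends on what the downstream operator needs.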

I hope that helps.

Regards,
Timo


On 20/03/17 at 15:29, Sonex wrote:
I am using a simple streaming job where I use keyBy on the stream to process
events per key. The number of keys may vary (from a few keys to thousands). I
have noticed a behavior of Flink and I need clarification on it. When we use
keyBy on the stream, Flink assigns keys to parallel operators so that each
operator can handle events per key independently. Once a key is assigned to
an operator, can the operator to which it is assigned change? From
what I've seen, the answer is no.

For example, let's assume that keys 1 and 2 are assigned to operator A and
keys 3 and 4 are assigned to operator B. If there is a burst of data for key
1 at some later point in time, but keys 2, 3 and 4 have only a little data, will
key 2 be reassigned to operator B to balance the load? If not, is there a way to
do that? And if there is not, why does Flink not do that?



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/load-balancing-of-keys-to-operators-tp12303.html