Hi Navneeth,
The keyBy semantics require that all records with the same key go to the
same task, so this data skew is basically caused by your data distribution.
As far as I know, Flink cannot handle data skew very well. There is a
proposal about local aggregation which is still under discussion in
Hi Navneeth,
Would it be possible for you to first keyBy something other than the user id
(for example, the message id), then aggregate the messages of each user
within each keyed stream, and finally aggregate across all the keyed streams
to get a per-user result?
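
To make the idea concrete, here is a minimal sketch of the two-stage aggregation in plain Python. It is only a simulation, not Flink code: the function name two_stage_count, the integer message ids, and the modulo partitioning are assumptions made for illustration, and counting messages stands in for whatever per-user aggregate you actually need.

```python
from collections import Counter, defaultdict


def two_stage_count(messages, parallelism):
    """Count messages per user in two stages.

    messages: iterable of (user_id, message_id) pairs, where message_id
    is assumed to be an integer (hypothetical schema for this sketch).
    """
    # Stage 1: partition by message id instead of user id (a stand-in for
    # keyBy(messageId)), and pre-aggregate per user inside each partition.
    partials = defaultdict(Counter)
    for user_id, message_id in messages:
        partition = message_id % parallelism
        partials[partition][user_id] += 1

    # Stage 2: merge the partial counts by user id (a stand-in for a second
    # keyBy(userId) that only receives the pre-aggregated records).
    totals = Counter()
    for partial in partials.values():
        totals.update(partial)

    # Number of raw messages each stage-1 partition had to process.
    load = {p: sum(c.values()) for p, c in partials.items()}
    return dict(totals), load


# One heavy user with 90 messages and ten light users with 1 message each.
messages = [("hot", i) for i in range(90)]
messages += [(f"user{i}", 90 + i) for i in range(10)]

totals, load = two_stage_count(messages, parallelism=4)
```

The heavy user's raw messages are spread over all first-stage partitions, and the second stage receives at most one partial result per user from each partition. In a real streaming job the first stage would also need a window or timer to decide when to emit its partial results, and this only works for aggregates that can be merged (counts, sums, min/max and similar).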
On Mon, Jul 15, 2019 at 2:38 PM, Navneeth Krishnan wrote:
Hi All,
Currently I keyBy on user, and I see uneven load distribution because some
users have a very high load while others have very few messages. Is there a
recommended way to achieve an even distribution of workload? Has someone
else encountered this problem, and what was the
wor