Hi Navneeth, The "keyby" semantics must keep the data under same key into same task. So basically this data skew issue is caused by your data distribution. As far as I known, Flink could not handle data skew very well. There is a proposal about local aggregation which is still under discussion in dev mailing list. It can alleviate the data skew. But I guess it still need some time.
As Caizhi mentioned, it's better to do something in user codes as a workaround solution. For example, redistribute the skew data. Navneeth Krishnan <reachnavnee...@gmail.com> 于2019年7月15日周一 下午2:38写道: > Hi All, > > Currently I have a keyBy user and I see uneven load distribution since > some of the users would have very high load versus some users having very > few messages. Is there a recommended way to achieve even distribution of > workload? Has someone else encountered this problem and what was the > workaround? > > Thanks >