Hi Navneeth,

The keyBy semantics guarantee that all records with the same key go to the
same task, so this data skew is basically caused by your data distribution.
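
To make that consequence concrete, here is a minimal sketch in plain Java
(no Flink dependency) that models keyed partitioning as
`hash(key) mod parallelism`. That formula is a simplification assumed for
illustration: Flink's real assignment goes through key groups, but it shares
the property that matters here, namely that one key always maps to one
subtask. The class and method names are hypothetical.

```java
import java.util.List;

// Simplified model of keyed partitioning, assumed for illustration only:
// subtask = hash(key) mod parallelism. Flink's real assignment goes through
// key groups, but it shares the property shown here: one key -> one subtask.
class KeyedPartitioningSketch {

    // Hypothetical helper: deterministically map a key to a subtask index.
    static int subtaskFor(String key, int parallelism) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Count how many records each subtask receives for a stream of keys.
    static int[] loadPerSubtask(List<String> keys, int parallelism) {
        int[] load = new int[parallelism];
        for (String key : keys) {
            load[subtaskFor(key, parallelism)]++;
        }
        return load;
    }
}
```

With 1,000 records for a key like `heavyUser` and only a handful for other
users, `loadPerSubtask(keys, 4)` puts at least 1,000 records on a single
subtask, and raising the parallelism does not change that.
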
As far as I know, Flink cannot handle data skew very well. There is a
proposal about local aggregation that is still under discussion on the dev
mailing list. It could alleviate the data skew, but I guess it will still
take some time.
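
Roughly, local aggregation means pre-combining records per key inside each
upstream task before the shuffle, so the hot key's subtask receives one
partial result per upstream task instead of every raw record. The plain-Java
sketch below only illustrates that idea; it does not reflect the API of the
proposal on the dev list (which is not final), and all names are
hypothetical. It also only works for aggregations whose partial results can
be merged, such as counts or sums.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration of local (pre-)aggregation followed by a global merge.
// Hypothetical names; not the API of the local aggregation proposal.
class LocalAggregationSketch {

    // Phase 1: pre-aggregate counts within one upstream partition,
    // before anything is sent across the network.
    static Map<String, Long> localCombine(List<String> partitionKeys) {
        Map<String, Long> partial = new HashMap<>();
        for (String key : partitionKeys) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    // Phase 2: the keyed (downstream) operator merges the partial counts.
    static Map<String, Long> globalMerge(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Long::sum));
        }
        return total;
    }

    // Records that cross the shuffle for a key: at most one per partition.
    static long recordsShuffledFor(String key, List<Map<String, Long>> partials) {
        long n = 0;
        for (Map<String, Long> partial : partials) {
            if (partial.containsKey(key)) {
                n++;
            }
        }
        return n;
    }
}
```

For example, with two upstream partitions that each hold 500 records for
`hot`, `globalMerge` still reports 1,000, but only two records for `hot`
reach the downstream subtask.
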

As Caizhi mentioned, it's better to handle this in user code as a
workaround, for example by redistributing the skewed data.
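
One common way to redistribute skewed data is key salting with a two-phase
aggregation: append a random salt to each key so that one hot key is spread
over several subtasks, aggregate per salted key, then strip the salt and
merge the partial results per original key. The plain-Java sketch below
shows only the data flow, not Flink API calls; the `#` separator, the bucket
count and all names are my assumptions. In a real job the two phases would
typically be two keyBy stages (usually windowed, so that phase 2 receives
finished partial results), and like local aggregation this only applies to
aggregations whose partial results can be merged.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Key salting with two-phase aggregation (hypothetical names).
// Assumption: '#' separates the original key from the salt.
class KeySaltingSketch {

    // Phase 1 key: original key plus a salt in [0, saltBuckets), e.g. "alice#3".
    static String salt(String key, int saltBuckets, Random random) {
        return key + "#" + random.nextInt(saltBuckets);
    }

    // Recover the original key; lastIndexOf tolerates '#' inside the key itself.
    static String unsalt(String saltedKey) {
        return saltedKey.substring(0, saltedKey.lastIndexOf('#'));
    }

    // Phase 1: partial counts per salted key. Different salts of the same hot
    // key would hash to different subtasks, spreading its load.
    static Map<String, Long> countBySaltedKey(List<String> keys, int saltBuckets, Random random) {
        Map<String, Long> partial = new HashMap<>();
        for (String key : keys) {
            partial.merge(salt(key, saltBuckets, random), 1L, Long::sum);
        }
        return partial;
    }

    // Phase 2: strip the salt and merge the partial counts per original key.
    static Map<String, Long> mergeByOriginalKey(Map<String, Long> partial) {
        Map<String, Long> total = new HashMap<>();
        partial.forEach((salted, count) -> total.merge(unsalt(salted), count, Long::sum));
        return total;
    }
}
```

The number of salt buckets is a trade-off: more buckets spread a hot key
further in phase 1, but every key then produces up to that many partial
results for phase 2 to merge.
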


Navneeth Krishnan <reachnavnee...@gmail.com> wrote on Monday, July 15, 2019 at 2:38 PM:

> Hi All,
>
> Currently I have a keyBy on user, and I see an uneven load distribution
> since some users have a very high load while others have very few
> messages. Is there a recommended way to achieve an even distribution of
> the workload? Has anyone else encountered this problem, and if so, what
> was the workaround?
>
> Thanks
>
