Re: Flink performance with multiple operators reshuffling data

Caizhi Weng Mon, 30 Aug 2021 19:55:28 -0700

Hi!

keyBy operations scale with parallelism. Flink shuffles each record to a
subtask chosen by the hash of its key (roughly, the hash modulo the
parallelism), so the more parallelism you have, the faster Flink can
process data, unless the data is skewed toward a few keys.
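
To make the effect of skew concrete, here is a minimal sketch of hash-based
routing in plain Python. It is a simplified model, not Flink's actual
implementation: it uses a plain "hash modulo parallelism" rule with CRC32 as
a stand-in hash, and the function names, key names, and record counts are
all made up for illustration.

```python
import zlib
from collections import Counter


def subtask_for(key: str, parallelism: int) -> int:
    """Pick a subtask index for a key (simplified model, not Flink's code).

    CRC32 is used as a stand-in hash because it is stable across runs,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % parallelism


def load_per_subtask(keys, parallelism: int):
    """Count how many records each subtask would receive."""
    counts = Counter(subtask_for(k, parallelism) for k in keys)
    return [counts.get(i, 0) for i in range(parallelism)]


PARALLELISM = 8  # hypothetical operator parallelism

# 10,000 records spread over 10,000 distinct keys.
uniform_keys = [f"user-{i}" for i in range(10_000)]

# 10,000 records where a single hot key accounts for 90% of the traffic.
skewed_keys = ["hot-key"] * 9_000 + [f"user-{i}" for i in range(1_000)]

uniform_load = load_per_subtask(uniform_keys, PARALLELISM)
skewed_load = load_per_subtask(skewed_keys, PARALLELISM)

print("uniform:", uniform_load)
print("skewed: ", skewed_load)
```

With many distinct keys the records spread across all subtasks, so adding
parallelism helps. With one hot key, every record for that key lands on the
same subtask no matter how high the parallelism is, and that subtask becomes
the bottleneck.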


Jason Liu <jasonli...@ucla.edu> wrote on Tue, Aug 31, 2021 at 8:12 AM:

> Hi there,
>
>     We have a use case where we need multiple keyBy operators, each
> with its own MapState and each keyed differently, in a single Flink app.
> This obviously means we'll be reshuffling our data a lot.
>     Our TPS is around 1-2k, with ~2 KB per event, and we use Kinesis Data
> Analytics as the infrastructure (running on roughly 128 KPUs of hardware).
> I'm currently in the design phase of this system and am wondering whether
> we can put the data through 4-5 keyed process functions, each with a
> different keyBy, and whether that can scale with a large enough Flink
> cluster. I don't think we can get around this requirement much (other than
> by replicating data). Alternatively, we could run multiple small Flink
> clusters, each with its own unique keyBy, but I'm not sure if or how much
> that would help.
>     Thanks for any potential insights!
>
> -Jason
>
