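Hi! keyBy operations scale with parallelism. Flink shuffles each record to a downstream subtask based on the hash of its key: the key's hash is mapped to one of maxParallelism key groups, and the key groups are split into near-equal ranges, one range per parallel subtask. So the more parallelism you have, the faster Flink can process the data, unless there is data skew (all records with the same key always go to the same subtask, so a single hot key cannot be spread out).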
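To make the key-to-subtask routing concrete, here is a minimal Python sketch that simulates the two-step assignment (key to key group, key group to subtask). Assumptions: CRC32 stands in for the murmur hash Flink applies to `key.hashCode()`, `max_parallelism=128` mirrors the smallest default max parallelism Flink picks, and the helper names are hypothetical, not Flink API. It only simulates routing; it does not run Flink.

```python
import zlib
from collections import Counter


def stand_in_hash(key: str) -> int:
    # Assumption: CRC32 replaces Flink's murmur hash of key.hashCode().
    # zlib.crc32 is deterministic and non-negative in Python 3.
    return zlib.crc32(key.encode("utf-8"))


def subtask_for_key(key: str, parallelism: int, max_parallelism: int = 128) -> int:
    # Step 1: the key's hash modulo max parallelism selects a key group.
    key_group = stand_in_hash(key) % max_parallelism
    # Step 2: key groups are divided into contiguous, near-equal ranges,
    # one range per parallel subtask.
    return key_group * parallelism // max_parallelism


def load_per_subtask(keys, parallelism: int, max_parallelism: int = 128):
    # Count how many records each subtask would receive.
    counts = Counter(subtask_for_key(k, parallelism, max_parallelism) for k in keys)
    return [counts.get(i, 0) for i in range(parallelism)]


if __name__ == "__main__":
    # Hypothetical workloads: 10,000 distinct keys vs. one hot key.
    uniform = [f"user-{i}" for i in range(10_000)]
    skewed = ["hot-user"] * 9_000 + [f"user-{i}" for i in range(1_000)]
    for p in (2, 4, 8):
        print(p, "uniform:", load_per_subtask(uniform, p))
        print(p, "skewed: ", load_per_subtask(skewed, p))
```

With the uniform keys, raising the parallelism spreads roughly equal load over more subtasks; with the skewed list, the 9,000 `hot-user` records always land on one subtask no matter how high the parallelism goes, which is the data-skew caveat.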
On Tue, Aug 31, 2021 at 8:12 AM, Jason Liu <jasonli...@ucla.edu> wrote:
> Hi there,
>
> We have a use case where we need multiple keyBy operators, each with its
> own MapState and all with different keys, in a single Flink app. This
> obviously means we'll be reshuffling our data a lot.
> Our TPS is around 1-2k, with ~2 KB per event, and we use Kinesis Data
> Analytics as the infrastructure (running on roughly 128 KPUs of hardware).
> I'm currently in the design phase of this system and wondering whether we
> can put the data through 4-5 keyed process functions, all with different
> keyBys, and whether that can scale with a large enough Flink cluster. I don't
> think we can get around this requirement much (other than by replicating
> data). Alternatively, we could run multiple small Flink clusters, each
> with its own unique keyBys, but I'm not sure if or how much that would help.
> Thanks for any potential insights!
>
> -Jason