Flink performance with multiple operators reshuffling data

Jason Liu Mon, 30 Aug 2021 17:12:50 -0700

Hi there,

    We have a use case that requires multiple keyBy operators, each with
its own MapState and a different key, in a single Flink app. This
obviously means we'll be reshuffling our data a lot.
    Our throughput is around 1-2k events per second at ~2 KB per event,
and we use Kinesis Data Analytics as the infrastructure (running on
roughly 128 KPUs). I'm currently in the design phase of this system and
am wondering whether we can put the data through 4-5 keyed process
functions, each with a different keyBy, and whether that will scale given
a large enough Flink cluster. I don't think we can get around this
requirement (other than by replicating the data). Alternatively, we could
run multiple small Flink clusters, each with its own keyBy, but I'm not
sure whether, or how much, that would help.
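
    For a rough sense of scale, here is a back-of-envelope sketch of the
shuffle volume. It is only an estimate under stated assumptions: the upper
bounds of the figures above (2,000 events per second, 5 keyBys), ~2kb read
as 2,048 bytes, every event forwarded exactly once per keyBy with no
fan-out, no serialization overhead, and traffic spread evenly across the
128 KPUs. The class and method names are made up for illustration.

```java
// Back-of-envelope estimate of keyBy shuffle traffic.
// All inputs are assumptions taken from the upper bounds in this message.
final class ShuffleEstimate {

    // Total bytes per second crossing keyBy boundaries, assuming each event
    // is forwarded exactly once per keyBy and serialization overhead is ignored.
    static long shuffleBytesPerSecond(long eventsPerSecond, long bytesPerEvent, int keyByCount) {
        return eventsPerSecond * bytesPerEvent * keyByCount;
    }

    // Average share per KPU, assuming the traffic is spread evenly (no key skew).
    static long bytesPerSecondPerKpu(long totalBytesPerSecond, int kpus) {
        return totalBytesPerSecond / kpus;
    }

    public static void main(String[] args) {
        long total = shuffleBytesPerSecond(2_000, 2_048, 5);
        System.out.println("total shuffle bytes/s: " + total);
        System.out.println("per-KPU bytes/s: " + bytesPerSecondPerKpu(total, 128));
    }
}
```

    Under those assumptions the total comes to about 20 MB/s across all
five shuffles. Key skew and state access cost are not captured here and
may matter more than the raw volume.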
     Thanks for any potential insights!


-Jason
