Hello team,

I'm currently working on a Flink use case where I need to count the
occurrences of each "customer_id" within a 10-minute window and send the
results to Kafka, pairing each "customer_id" with its corresponding count
(e.g., 101:5).
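
For reference, here is a minimal sketch of the pipeline I have in mind.
Please treat it as an assumption of mine rather than working code: the file
path, Kafka broker address, and topic name are placeholders, and I'm using a
processing-time tumbling window plus (customer_id, 1) tuples just to keep the
example simple:

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class CustomerIdCountJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // File source: assumes one customer_id per line (placeholder path).
        DataStream<String> lines = env.readTextFile("/path/to/customer_ids.txt");

        DataStream<String> counts = lines
                // (customer_id, 1) so the window can simply sum the second field
                .map(line -> Tuple2.of(Long.parseLong(line.trim()), 1L))
                .returns(Types.TUPLE(Types.LONG, Types.LONG))
                // key directly by customer_id
                .keyBy(t -> t.f0)
                // 10-minute tumbling window (processing time for simplicity)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(10)))
                .sum(1)
                // format as "customer_id:count", e.g. "101:5"
                .map(t -> t.f0 + ":" + t.f1);

        // Kafka sink; broker address and topic name are placeholders.
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("customer-counts")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        counts.sinkTo(sink);
        env.execute("customer-id-count");
    }
}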

In this scenario, my data source is a file, and I'm creating a keyed data
stream. With approximately one million entries in the file, I'm uncertain
about the optimal keying strategy. Specifically, I'm trying to decide
whether to key by each customer_id directly or by customer_id modulo 10
(customer_id % 10). Could you please advise which approach would yield
better performance?
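
In terms of the sketch above (assuming "tuples" is the (customer_id, 1)
stream after the map/returns step), the two alternatives I'm weighing look
roughly like this. Again, just a sketch; with the modulo key the
per-customer counting would have to happen inside the window function,
since the key no longer identifies a single customer, and
PerCustomerCountFunction below is a hypothetical name:

// Option A: key directly by customer_id -> one key per distinct customer
tuples.keyBy(t -> t.f0)
      .window(TumblingProcessingTimeWindows.of(Time.minutes(10)))
      .sum(1);

// Option B: key by customer_id % 10 -> only 10 distinct keys
tuples.keyBy(t -> t.f0 % 10)
      .window(TumblingProcessingTimeWindows.of(Time.minutes(10)))
      // per-customer counts would then be computed inside a window function
      .process(new PerCustomerCountFunction()); // hypothetical ProcessWindowFunction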

Thanks and regards,
Arjun
