Hello team,

I'm currently working on a Flink use case where I need to count the
occurrences of each "customer_id" within a 10-minute window and send the
results to Kafka, pairing each "customer_id" with its corresponding count
(e.g., 101:5).
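
For reference, here is a minimal sketch of the pipeline I have in mind.
Please treat it as an assumption of mine rather than working code: the file
path, Kafka broker address, and topic name are placeholders, and I'm using a
processing-time tumbling window plus (customer_id, 1) tuples just to keep the
example simple:

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class CustomerIdCountJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // File source: assumes one customer_id per line (placeholder path).
        DataStream<String> lines = env.readTextFile("/path/to/customer_ids.txt");

        DataStream<String> counts = lines
                // (customer_id, 1) so the window can simply sum the second field
                .map(line -> Tuple2.of(Long.parseLong(line.trim()), 1L))
                .returns(Types.TUPLE(Types.LONG, Types.LONG))
                // key directly by customer_id
                .keyBy(t -> t.f0)
                // 10-minute tumbling window (processing time for simplicity)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(10)))
                .sum(1)
                // format as "customer_id:count", e.g. "101:5"
                .map(t -> t.f0 + ":" + t.f1);

        // Kafka sink; broker address and topic name are placeholders.
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("customer-counts")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        counts.sinkTo(sink);
        env.execute("customer-id-count");
    }
}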

In this scenario, my data source is a file, and I'm creating a keyed data
stream. With approximately one million entries in the file, I'm uncertain
about the optimal keying strategy. Specifically, I'm trying to decide
whether to key by each customer_id directly or by customer_id modulo 10
(customer_id % 10). Could you please advise which approach would yield
better performance?
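
In terms of the sketch above (assuming "tuples" is the (customer_id, 1)
stream after the map/returns step), the two alternatives I'm weighing look
roughly like this. Again, just a sketch; with the modulo key the
per-customer counting would have to happen inside the window function,
since the key no longer identifies a single customer, and
PerCustomerCountFunction below is a hypothetical name:

// Option A: key directly by customer_id -> one key per distinct customer
tuples.keyBy(t -> t.f0)
      .window(TumblingProcessingTimeWindows.of(Time.minutes(10)))
      .sum(1);

// Option B: key by customer_id % 10 -> only 10 distinct keys
tuples.keyBy(t -> t.f0 % 10)
      .window(TumblingProcessingTimeWindows.of(Time.minutes(10)))
      // per-customer counts would then be computed inside a window function
      .process(new PerCustomerCountFunction()); // hypothetical ProcessWindowFunction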

Thanks and regards,
Arjun
