Hello team, I'm currently working on a Flink use case where I need to count the occurrences of each "customer_id" over a 10-minute window and publish the results to Kafka, pairing each "customer_id" with its count (e.g., 101:5).
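
For reference, here is a rough sketch of the kind of pipeline I have in mind. It is only a simplified outline, not my final code: the file path, Kafka broker address, and topic name are placeholders, I assume each line of the input file is just a customer_id, I use processing-time windows for simplicity, and I assume the newer FileSource/KafkaSink connectors (Flink 1.15+).

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class CustomerCountJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // File source: each line is assumed to contain a single customer_id (placeholder path).
        FileSource<String> source = FileSource
                .forRecordStreamFormat(new TextLineInputFormat(), new Path("/data/customers.txt"))
                .build();

        DataStream<String> lines =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "customer-file");

        // Key by customer_id, count occurrences per 10-minute tumbling window,
        // then format each result as "customer_id:count" (e.g. "101:5").
        DataStream<String> counts = lines
                .map(line -> Tuple2.of(line.trim(), 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG))
                .keyBy(t -> t.f0)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(10)))
                .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
                .map(t -> t.f0 + ":" + t.f1);

        // Kafka sink (broker address and topic name are placeholders).
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("customer-counts")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        counts.sinkTo(sink);
        env.execute("customer-id-count");
    }
}
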
My data source is a file with approximately one million entries, and I'm creating a keyed data stream from it. I'm unsure about the optimal keying strategy: should I use each customer_id directly as the key, or use customer_id modulo 10 as the key? Could you please advise which approach would yield better performance?

Thanks and regards,
Arjun