Can someone help me figure out how to implement this in Flink? I need to build a Kafka -> Flink -> Elasticsearch pipeline with high-throughput data coming into Kafka. Every message in Kafka has a key called 'id' whose value is an integer ranging from 1 to N, where N is dynamic with a maximum value of 100. The message volume differs drastically across ids: for example, the number of incoming messages with id 10 can be 500 times the number of incoming messages with id 11. One requirement is that messages with a particular id have to be written to a corresponding Elasticsearch index: messages with id 1 are written to Elasticsearch index 1, messages with id 2 are written to Elasticsearch index 2, and so on. In other words, there will be at most 100 Elasticsearch indices.
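The per-id index routing itself seems doable with a dynamic index name in the sink function. Here is a minimal sketch of what I have in mind, assuming the flink-connector-elasticsearch7 connector's `ElasticsearchSinkFunction` API and JSON string messages; `extractId` is a hypothetical helper (the real id would be parsed from the payload or taken from the Kafka key), and the `"index-" + id` naming is a placeholder:

```java
import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Requests;
import org.elasticsearch.common.xcontent.XContentType;

public class IdRoutingSinkFunction implements ElasticsearchSinkFunction<String> {

    @Override
    public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
        // Route each message to the index that matches its id, e.g. "index-7" for id 7.
        IndexRequest request = Requests.indexRequest()
                .index("index-" + extractId(element))
                .source(element, XContentType.JSON);
        indexer.add(request);
    }

    // Hypothetical extractor: in reality the id would come from a field in the
    // message payload (e.g. JSON) or be carried over from the Kafka record key.
    public static int extractId(String element) {
        return Integer.parseInt(element.split(",", 2)[0]);
    }
}
```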
I have control over Kafka: I can make sure that all messages are written to a single topic, or that messages are written to different topics based on their ids. The only hard requirement is that messages end up in the indices corresponding to their ids.

1. What are the possible ways to achieve this in Flink?
2. If I use a single Kafka topic and a single Flink job, what is the best way to group the ids, and how should I set the parallelism according to the distribution of the data? The parallelism required to write into ES will differ per id since, as I said earlier, the distribution of data across ids is drastically skewed. (A sketch of this approach follows below.)
3. A Kafka topic per id, with a topology per id, looks ugly and too resource-intensive, since some ids carry very little data. If we were to choose this option anyway, what is the best way to do it?
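For question 2, here is a rough sketch of the single-topic, single-job variant I am considering: key by id so that all messages for one id land in the same subtask, then hand them to the sink function sketched above. I am assuming the universal `FlinkKafkaConsumer` and the `ElasticsearchSink.Builder` API; the broker address, host, topic name, bulk-flush size, and parallelism value are all placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.http.HttpHost;

public class KafkaToEsByIdJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder broker
        props.setProperty("group.id", "es-router");           // placeholder consumer group

        ElasticsearchSink.Builder<String> sinkBuilder = new ElasticsearchSink.Builder<>(
                Collections.singletonList(new HttpHost("es-host", 9200)), // placeholder host
                new IdRoutingSinkFunction());
        sinkBuilder.setBulkFlushMaxActions(500); // batch writes; worth tuning for throughput

        env.addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
           .keyBy(IdRoutingSinkFunction::extractId) // all messages of one id go to one subtask
           .addSink(sinkBuilder.build())
           .setParallelism(8); // placeholder; this is the part I don't know how to size per id

        env.execute("kafka-to-es-by-id");
    }
}
```

My concern with this sketch is the skew: keying by id pins each id to one subtask, so a hot id like 10 cannot be spread across more sink slots, which is exactly what question 2 is about.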