Heya, I need to send a group of related messages and then process them only once all of them have arrived.
Here is how I'm planning to do this. Is this the right way, and can any improvements be made to it?

1) Send a message to a topic called batch_start, with a batch id (which will be a UUID).
2) Post the messages to a topic called batch_msgs_<batch_id>, where batch_id is the batch id sent in batch_start. The producer records the number of messages it sends.
3) Send a message to batch_end with the batch id and the number of messages sent.
4) On the consumer side, using Kafka Streams, listen to batch_end.
5) When a message arrives there, start another Kafka Streams instance, which will process the messages in batch_msgs_<batch_id>.
6) Perhaps, to be extra safe, whenever a batch_end message arrives, start a throwaway consumer that just counts the messages in batch_msgs_<batch_id>. If the count doesn't match the number of messages specified in the batch_end message, assume the batch hasn't finished arriving yet and wait for some time before retrying. Only once the correct number of messages has arrived does it trigger step 5 above.

Will the above method work, or should I make any changes to it? Is step 6 necessary?
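
To make the plan concrete, here is a rough Python sketch of the protocol. It makes no real Kafka calls: an in-memory dict stands in for the topics, and the function names (`send_batch`, `batch_is_complete`, `process_ready_batches`) and the `count` field are hypothetical names picked for illustration. It only shows the bookkeeping, not delivery, ordering, or retries.

```python
import uuid
from collections import defaultdict

# In-memory stand-in for Kafka topics: topic name -> list of messages.
# Assumption: this only mimics append-only topics; it is not a Kafka client.
topics = defaultdict(list)


def send_batch(payloads):
    """Producer side (steps 1-3): announce the batch, send it, then close it."""
    batch_id = str(uuid.uuid4())
    topics["batch_start"].append({"batch_id": batch_id})
    sent = 0
    for payload in payloads:
        topics[f"batch_msgs_{batch_id}"].append(payload)
        sent += 1
    # The end marker carries the count so the consumer can verify completeness.
    topics["batch_end"].append({"batch_id": batch_id, "count": sent})
    return batch_id


def batch_is_complete(end_msg):
    """Step 6: compare the number of arrived messages with the announced count."""
    arrived = len(topics[f"batch_msgs_{end_msg['batch_id']}"])
    return arrived == end_msg["count"]


def process_ready_batches(handler):
    """Consumer side (steps 4-5): handle every batch that has fully arrived.

    Returns the batch_end messages whose batches are still incomplete,
    so the caller can retry them later.
    """
    pending = []
    for end_msg in topics["batch_end"]:
        if batch_is_complete(end_msg):
            handler(topics[f"batch_msgs_{end_msg['batch_id']}"])
        else:
            pending.append(end_msg)
    return pending
```

For example, after `send_batch(["a", "b", "c"])`, calling `process_ready_batches(print)` would pass the three messages to `print` as one list and return an empty pending list. A batch_end message whose count is higher than what has arrived so far would be returned as pending instead.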