I don't. It would require fetching all the messages and iterating over them just to count them, which is expensive. I only know the count after they have been sent.
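
For illustration, here is a minimal sketch of counting on the producer side while the messages are being sent, then publishing the count in an end-of-batch control message. It does not use a real Kafka client: `send` is a hypothetical stand-in for whatever producer call is in use, and the topic names `batch_msgs` and `batch_control` and the message fields are assumptions made up for this example.

```python
import uuid
from typing import Callable, Dict, Iterable, Tuple


def send_batch(
    send: Callable[[str, Dict], None],
    payloads: Iterable[str],
) -> Tuple[str, int]:
    """Send every payload tagged with one batch_id, then a batch_end marker.

    `send(topic, message)` is a hypothetical stand-in for a real producer.
    The payloads may be a lazy iterable, so the total is unknown up front.
    """
    batch_id = str(uuid.uuid4())
    count = 0
    for payload in payloads:
        # Each data message carries the batch_id, so one topic serves all batches.
        send("batch_msgs", {"batch_id": batch_id, "payload": payload})
        count += 1
    # The count only becomes known here, after the last message has been sent.
    send("batch_control", {"batch_id": batch_id, "type": "batch_end", "count": count})
    return batch_id, count
```

Because the count travels in the final control message, nothing has to be fetched or iterated a second time just to learn the batch size.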

On Sun, Dec 4, 2016 at 9:34 PM, Marko Bonaći <marko.bon...@sematext.com> wrote:

> Do you know in advance (when sending the first message) how many messages
> that batch is going to have?
>
> Marko Bonaći
> Monitoring | Alerting | Anomaly Detection | Centralized Log Management
> Solr & Elasticsearch Support
> Sematext <http://sematext.com/> | Contact
> <http://sematext.com/about/contact.html>
>
> On Sat, Dec 3, 2016 at 1:01 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> > Hey Apurva,
> >
> > I am including the batch_id inside the messages.
> >
> > Could you give me an example of what you mean by custom control messages
> > with a control topic please?
> >
> > On Sat, Dec 3, 2016 at 12:35 AM, Apurva Mehta <apu...@confluent.io> wrote:
> >
> > > That should work, though it sounds like you may be interested in :
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
> > >
> > > If you can include the 'batch_id' inside your messages, and define custom
> > > control messages with a control topic, then you would not need one topic
> > > per batch, and you would be very close to the essence of the above
> > > proposal.
> > >
> > > Thanks,
> > > Apurva
> > >
> > > On Fri, Dec 2, 2016 at 5:02 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
> > >
> > > > Heya,
> > > >
> > > > I need to send a group of messages, which are all related, and then
> > > > process those messages, only when all of them have arrived.
> > > >
> > > > Here is how I'm planning to do this. Is this the right way, and can any
> > > > improvements be made to this?
> > > >
> > > > 1) Send a message to a topic called batch_start, with a batch id (which
> > > > will be a UUID)
> > > >
> > > > 2) Post the messages to a topic called batch_msgs_<batch_id>. Here
> > > > batch_id will be the batch id sent in batch_start.
> > > >
> > > > The number of messages sent will be recorded by the producer.
> > > >
> > > > 3) Send a message to batch_end with the batch id and the number of sent
> > > > messages.
> > > >
> > > > 4) On the consumer side, using Kafka Streaming, I would listen to
> > > > batch_end.
> > > >
> > > > 5) When the message there arrives, I will start another instance of
> > > > Kafka Streaming, which will process the messages in
> > > > batch_msgs_<batch_id>
> > > >
> > > > 6) Perhaps to be extra safe, whenever batch_end arrives, I will start a
> > > > throwaway consumer which will just count the number of messages in
> > > > batch_msgs_<batch_id>. If these don't match the # of messages specified
> > > > in the batch_end message, then it will assume that the batch hasn't yet
> > > > finished arriving, and it will wait for some time before retrying. Once
> > > > the correct # of messages have arrived, THEN it will trigger step 5
> > > > above.
> > > >
> > > > Will the above method work, or should I make any changes to it?
> > > >
> > > > Is step 6 necessary?
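
As a rough illustration of the approach discussed in the quoted thread (batch_id inside each message plus a control message carrying the count), the consumer side could buffer messages per batch and release a batch only once the count from batch_end has been reached. This is a sketch under stated assumptions, not a Kafka Streams implementation: `BatchCollector` and its method names are hypothetical, state is kept in memory only, and duplicate deliveries are not handled.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class BatchCollector:
    """Buffers messages per batch_id and releases a batch once it is complete.

    Hypothetical in-memory helper; a real consumer would feed it from the
    data topic and the control topic.
    """

    def __init__(self) -> None:
        self._buffered: Dict[str, List[str]] = defaultdict(list)
        self._expected: Dict[str, int] = {}

    def on_data(self, batch_id: str, payload: str) -> Optional[List[str]]:
        # Called for each data message; returns the full batch if now complete.
        self._buffered[batch_id].append(payload)
        return self._release_if_complete(batch_id)

    def on_batch_end(self, batch_id: str, count: int) -> Optional[List[str]]:
        # Called for the batch_end control message, which carries the count.
        self._expected[batch_id] = count
        return self._release_if_complete(batch_id)

    def _release_if_complete(self, batch_id: str) -> Optional[List[str]]:
        expected = self._expected.get(batch_id)
        if expected is None or len(self._buffered[batch_id]) < expected:
            return None  # batch_end not seen yet, or messages still missing
        self._expected.pop(batch_id)
        return self._buffered.pop(batch_id)
```

The check works in either arrival order: if batch_end overtakes some data messages, the batch is simply released when the last of them arrives, which replaces the wait-and-retry counting of step 6 with a comparison on each message.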