Kafka version: 0.10.2.1 (can upgrade if needed). I wish to revive the discussion around micro-batching in Kafka Streams. Some past discussions are here <http://markmail.org/thread/zdxkvwt6ppq2xhv2> and here <http://markmail.org/thread/un7dmn7pyk7eibxz>.
I am exploring ways to do at-least-once processing of events that are handled in small batches, as opposed to one at a time. A specific example: buffer mutation ops destined for a non-Kafka sink and align the flushing of batched ops with the offset commits. The suggestions and workarounds I have noticed on the mailing lists are:

*a] Don't do it in Kafka Streams; use Kafka Connect.* For the sake of this discussion, let's assume using kafka-connect isn't an option.

*b] In Kafka Streams, use a key-value state store to micro-batch and perform a flush in the punctuate method.* The overhead of this approach seems nontrivial: a persistent key-value store is backed by a compacted topic, but the keys a micro-batch needs are not compaction friendly. For instance, if you use a timestamp plus a unique id as the key and perform a range scan to find ops buffered since the last call to punctuate, the state store and its backing Kafka topic will grow unbounded. Applying retention to the state store or its topic would mean leaking implementation details, which makes this approach inelegant.

My question is: since the last time this use case was mentioned, has a better pattern emerged?

-- Shrijeet
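For concreteness, here is a minimal sketch of workaround b], with the Kafka Streams pieces stubbed out so the buffering/flush logic stands alone. The class name and methods are hypothetical; in a real Processor the `TreeMap` would be a persistent `KeyValueStore`, `add` would run in `process()`, and `flush` would run in `punctuate()` followed by `context.commit()`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of the state-store micro-batch pattern.
// A TreeMap stands in for a RocksDB-backed KeyValueStore that
// supports ordered range scans.
public class MicroBatchBuffer {
    private final TreeMap<String, String> store = new TreeMap<>();

    // process(): buffer each op under a "timestamp|id" key.
    // Zero-padding the timestamp makes lexicographic key order
    // match numeric time order for range scans.
    public void add(long timestampMs, String id, String op) {
        store.put(String.format("%020d", timestampMs) + "|" + id, op);
    }

    // punctuate(): scan everything buffered since the last flush,
    // write the batch to the external sink, then delete the scanned
    // keys. The deletes are what keep the store (and its changelog
    // topic) from growing unbounded -- but each delete is itself a
    // tombstone write, which is part of the overhead described above.
    public List<String> flush() {
        List<String> batch = new ArrayList<>(store.values());
        store.clear(); // in Kafka Streams: store.delete(key) per scanned key
        // in a real Processor: sink.write(batch); context.commit();
        return batch;
    }

    public int buffered() {
        return store.size();
    }
}
```

Note that at-least-once here comes from only committing offsets after the sink write succeeds; if the flush fails, the buffered ops are replayed on restart from the changelog and the last committed offset.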