Kafka version: 0.10.2.1 (can upgrade if needed)

I wish to revive the discussion around micro batching in Kafka Streams.
Some past discussions are here <http://markmail.org/thread/zdxkvwt6ppq2xhv2>
& here <http://markmail.org/thread/un7dmn7pyk7eibxz>.

I am exploring ways to do at-least-once processing of events that are
handled in small batches as opposed to one at a time. A specific example is
to buffer mutation ops to a non-Kafka sink and align the flushing of
batched ops with the offset commits.
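To make the desired pattern concrete, here is a minimal sketch of the buffering side, independent of any Kafka Streams API. The `MicroBatcher` class and its `sinkFlush` callback are hypothetical names; the point is only that the flush to the external sink happens before the offset commit is allowed to proceed, so a crash before flushing merely replays uncommitted records (at-least-once).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical micro-batcher: buffers ops in memory and flushes them to the
// external sink. In a streams app, flush() would be called just before the
// offset commit, so the sink never lags behind committed offsets.
class MicroBatcher<T> {
    private final List<T> buffer = new ArrayList<>();
    private final int maxBatchSize;
    private final Consumer<List<T>> sinkFlush; // writes a batch to the non-Kafka sink

    MicroBatcher(int maxBatchSize, Consumer<List<T>> sinkFlush) {
        this.maxBatchSize = maxBatchSize;
        this.sinkFlush = sinkFlush;
    }

    void add(T op) {
        buffer.add(op);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    // Flush any buffered ops; also invoked right before offsets are committed.
    void flush() {
        if (!buffer.isEmpty()) {
            sinkFlush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

Under at-least-once semantics the only requirement is that the commit never runs ahead of the flush; duplicates on replay are acceptable.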

The suggestions and workarounds that I have noticed in mailing lists are:

*a] Don't do it in Kafka Streams; use Kafka Connect.*

For the sake of this discussion, let's assume using Kafka Connect isn't an
option.

*b] In Kafka Streams, use a key-value state store to micro batch and
perform a flush in the punctuate method.*

The overhead of this approach seems nontrivial: a persistent key-value
store is backed by a compacted changelog topic, but the keys needed for
micro batching are not compaction friendly. For instance, if you use a
timestamp plus some unique id as the key and perform a range scan to find
ops buffered since the last call to punctuate, every key is unique, so the
state store and the backing Kafka topic will grow unbounded. Applying
retention to the state store or topic would mean leaking implementation
details, which makes this approach inelegant.
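For concreteness, the pattern in b] can be sketched with a sorted map standing in for the persistent state store (a real implementation would use a Kafka Streams `KeyValueStore` with `range()` inside `punctuate()`; the class and key layout below are illustrative assumptions). It shows both the range scan and why explicit deletes are needed to keep the store from growing unbounded:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of workaround b], with a TreeMap standing in for the persistent
// key-value store. Keys combine a zero-padded timestamp and a unique id so
// a range scan can pick up everything buffered since the last punctuate.
class PunctuateBuffer {
    private final TreeMap<String, String> store = new TreeMap<>();
    private long lastFlushTs = 0L;

    // Hypothetical key layout: "<20-digit timestamp>-<unique id>".
    void buffer(long ts, String id, String op) {
        store.put(String.format("%020d-%s", ts, id), op);
    }

    // Simulates punctuate(now): range-scan the ops buffered since the last
    // flush. Because every key is unique, compaction never reclaims space;
    // without the explicit deletes below, the store and its changelog topic
    // grow without bound -- the overhead described above.
    Map<String, String> flush(long now) {
        String from = String.format("%020d", lastFlushTs);
        String to = String.format("%020d", now + 1);
        Map<String, String> batch = new TreeMap<>(store.subMap(from, to));
        store.keySet().removeAll(batch.keySet()); // explicit per-key deletes
        lastFlushTs = now;
        return batch;
    }
}
```

Note that even with the deletes, each delete becomes a tombstone in the changelog topic, so the write amplification per buffered op remains significant.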

My question is: since the last time this use case was discussed, has a
better pattern emerged?

--
Shrijeet
