I'm building a prototype with Kafka Streams that will be consuming from the
same topic twice: once with no delay, just like any normal consumer, and
once with a 60-minute delay, using the new per-message timestamp field.  It
will also store state coming from other topics that are being read
simultaneously.

The reason I'm consuming twice from the same topic, with one of the
consumers delayed, is that for any particular record, our processor needs
to know whether any more related records will arrive within the next 60
minutes, and to update some of the fields accordingly before sending the
record downstream as the final version of that event.
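To make the lookahead concrete, here is a self-contained sketch of the question the delayed second pass is meant to answer: for each record, does another record with the same key arrive within the next 60 minutes?  The `Event` record and `hasFollowUp` helper are hypothetical illustration, not Streams API code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LookaheadCheck {
    // A hypothetical record: a key plus its event timestamp in ms.
    record Event(String key, long timestampMs) {}

    static final long WINDOW_MS = 60 * 60 * 1000L; // 60 minutes

    // For each event, report whether another event with the same key
    // arrives within the next 60 minutes of event time.
    static Map<Event, Boolean> hasFollowUp(List<Event> events) {
        Map<Event, Boolean> out = new LinkedHashMap<>();
        for (Event e : events) {
            boolean followUp = events.stream().anyMatch(o ->
                o != e
                && o.key().equals(e.key())
                && o.timestampMs() > e.timestampMs()
                && o.timestampMs() - e.timestampMs() <= WINDOW_MS);
            out.put(e, followUp);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("a", 0L),
            new Event("a", 30 * 60 * 1000L),  // 30 min later: inside window
            new Event("b", 0L),
            new Event("b", 90 * 60 * 1000L)); // 90 min later: outside window
        System.out.println(hasFollowUp(events).get(events.get(0))); // true
        System.out.println(hasFollowUp(events).get(events.get(2))); // false
    }
}
```

The batch version above sees the whole list at once; the streaming version can only answer it for a record once stream time has advanced 60 minutes past it, which is exactly why the second, lagging consumer is needed.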

Everything seems to be supported by the normal Streams DSL, except for a
way to specify a consumption delay from a particular source topic.
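For what it's worth, here is how I picture the missing piece: records are buffered until stream time has advanced 60 minutes past their own timestamp, then released in arrival order.  In Streams terms this would presumably live in a custom processor with a state store and a punctuator rather than in the DSL; the `DelayBuffer` class below is just a hypothetical in-memory sketch of the idea, assuming records arrive roughly in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DelayBuffer {
    // One buffered record: the event timestamp plus its payload.
    record Entry(long timestampMs, String value) {}

    private final long delayMs;
    // Assuming records arrive in timestamp order, a FIFO queue keeps
    // them sorted by release time.
    private final Deque<Entry> pending = new ArrayDeque<>();

    public DelayBuffer(long delayMs) {
        this.delayMs = delayMs;
    }

    // Called for every incoming record.
    public void add(long timestampMs, String value) {
        pending.addLast(new Entry(timestampMs, value));
    }

    // Called as stream time advances (a punctuator, in Streams terms):
    // release everything that is now at least delayMs old.
    public List<String> release(long streamTimeMs) {
        List<String> ready = new ArrayList<>();
        while (!pending.isEmpty()
               && streamTimeMs - pending.peekFirst().timestampMs() >= delayMs) {
            ready.add(pending.pollFirst().value());
        }
        return ready;
    }
}
```

Of course, at our volume an in-memory buffer like this is a non-starter: 2B records/day is roughly 83M records per 60-minute window, so the buffered records would have to sit in a persistent, fault-tolerant state store.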

I know I could just create a delayed copy of the topic myself and consume
from that, but the topic's volume rules this out: I don't want to
duplicate the data and the load on the cluster.  This topic currently
handles ~2B records per day, and will potentially grow to 10B records per
day in the future.

Any ideas on how I could handle this with Kafka Streams?  Or, if it's
better to use the lower-level Streams API for this, can you point me to a
starting point?  I've been reading the docs and javadocs for a while now,
and I'm not sure where I would add or configure this.

Thanks!

Marcos Juarez
