It may not be ideal, but there is a way to prioritize particular topics:
set the timestamps of their records to zero by using a custom
TimestampExtractor. Kafka Streams tries to synchronize multiple streams
using the extracted timestamps, so records with timestamp 0 have a
greater chance of being processed earlier than others.
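
To illustrate the idea, here is a rough, self-contained toy model. It is
not Kafka Streams code and uses no Kafka classes. The names Rec, extract,
processingOrder and demo, and the topic names "events" and
"users-changelog", are all made up for this sketch. It assumes a
simplified synchronizer that always picks the buffered record with the
lowest extracted timestamp. A real extractor would implement
org.apache.kafka.streams.processor.TimestampExtractor, whose exact method
signature depends on the Kafka version.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Kafka Streams code. All names here are hypothetical.
final class TimestampPrioritySketch {

    // Hypothetical record: source topic, payload, and the timestamp embedded in it.
    static final class Rec {
        final String topic;
        final String value;
        final long embeddedTs;

        Rec(String topic, String value, long embeddedTs) {
            this.topic = topic;
            this.value = value;
            this.embeddedTs = embeddedTs;
        }
    }

    // Stand-in for a custom TimestampExtractor: records from the
    // prioritized topic get timestamp 0, all others keep their own.
    static long extract(Rec r, String prioritizedTopic) {
        return r.topic.equals(prioritizedTopic) ? 0L : r.embeddedTs;
    }

    // Simplified synchronization (an assumption of this sketch): repeatedly
    // take the head record with the lowest extracted timestamp among all
    // non-empty buffers. On a tie, the buffer listed first wins.
    static List<String> processingOrder(Map<String, Deque<Rec>> buffers,
                                        String prioritizedTopic) {
        List<String> order = new ArrayList<>();
        while (true) {
            Deque<Rec> best = null;
            long bestTs = Long.MAX_VALUE;
            for (Deque<Rec> q : buffers.values()) {
                if (q.isEmpty()) {
                    continue; // nothing buffered, so this topic cannot be chosen
                }
                long ts = extract(q.peekFirst(), prioritizedTopic);
                if (best == null || ts < bestTs) {
                    best = q;
                    bestTs = ts;
                }
            }
            if (best == null) {
                return order; // every buffer is drained
            }
            order.add(best.pollFirst().value);
        }
    }

    // Builds two small buffers and returns the order in which their
    // records are processed for the given prioritized topic.
    static List<String> demo(String prioritizedTopic) {
        Map<String, Deque<Rec>> buffers = new LinkedHashMap<>();
        buffers.put("events", new ArrayDeque<>(Arrays.asList(
                new Rec("events", "e1", 100L),
                new Rec("events", "e2", 200L))));
        buffers.put("users-changelog", new ArrayDeque<>(Arrays.asList(
                new Rec("users-changelog", "u1", 150L),
                new Rec("users-changelog", "u2", 250L))));
        return processingOrder(buffers, prioritizedTopic);
    }
}
```

In this toy, demo("users-changelog") yields u1, u2, e1, e2, while a topic
name that matches nothing yields plain timestamp order: e1, u1, e2, u2.
Note that an empty buffer is simply skipped, which is one reason the trick
gives a greater chance of early processing rather than a guarantee.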

On Thu, Mar 24, 2016 at 6:57 PM, Greg Fodor <gfo...@gmail.com> wrote:

> Really digging Kafka Streams so far, nice work all. I'm interested in
> being able to materialize one or more KTables in full before the rest
> of the topology begins processing messages. This seems fundamentally
> useful since it allows you to get your database tables replicated up
> off the change stream topics from Connect before the stream processing
> workload starts.
>
> In Samza we have bootstrap streams and stream prioritization to help
> facilitate this. What seems desirable for Kafka Streams is:
>
> - Per-source prioritization (by defaulting to >0, setting the stream
> priority to 0 effectively bootstraps it.)
> - Per-source initial offset settings (earliest or latest, default to
> latest)
>
> To solve the KTable materialization problem, you'd set priority to 0
> for its source and the source offset setting to earliest.
>
> Right now it appears the only control you have for re-processing is
> AUTO_OFFSET_RESET_CONFIG, but I believe this is a global setting for
> the consumers, and hence, the entire job. Beyond that, I don't see any
> way to prioritize stream consumption at all, so your KTables will be
> getting materialized while the general stream processing work is
> running concurrently.
>
> I wanted to see if this case is actually supported already and I'm
> missing something, or if not, if these options make sense. If this
> seems reasonable and it's not too complicated, I could possibly try to
> get together a patch. If so, any tips on implementing this would be
> helpful as well. Thanks!
>
> -Greg
>
