It may not be ideal, but there is a way to prioritize particular topics: set their record timestamps to zero with a custom TimestampExtractor. Kafka Streams tries to synchronize multiple streams using the extracted timestamps, so records with timestamp 0 have a greater chance of being processed earlier than the others.
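
To illustrate why this works, here is a rough, self-contained toy model in plain Java. It is not Kafka Streams code: it only assumes that the next record is taken from whichever input has the smallest head timestamp, which is a simplification of the synchronization described above. The class name, the method names, and the "table"/"events" topic names are all made up for the example. In a real application the zeroing logic would go in a class implementing org.apache.kafka.streams.processor.TimestampExtractor, whose exact method signature depends on the Kafka version.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical names, not the Kafka Streams API).
class TimestampPrioritySketch {

    // Stand-in for a custom timestamp extractor: records from the
    // prioritized topic get timestamp 0, all others keep their own.
    static long extractTimestamp(String topic, long recordTimestamp, String priorityTopic) {
        return topic.equals(priorityTopic) ? 0L : recordTimestamp;
    }

    // Returns the order in which topics are consumed when the record with
    // the smallest extracted head timestamp is always processed next.
    static List<String> processingOrder(Map<String, List<Long>> topics, String priorityTopic) {
        Map<String, Deque<Long>> queues = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> e : topics.entrySet()) {
            Deque<Long> queue = new ArrayDeque<>();
            for (long ts : e.getValue()) {
                queue.add(extractTimestamp(e.getKey(), ts, priorityTopic));
            }
            queues.put(e.getKey(), queue);
        }

        List<String> order = new ArrayList<>();
        while (true) {
            String next = null;
            long best = 0L;
            // Find the non-empty queue whose head has the lowest timestamp.
            for (Map.Entry<String, Deque<Long>> e : queues.entrySet()) {
                Long head = e.getValue().peek();
                if (head != null && (next == null || head < best)) {
                    best = head;
                    next = e.getKey();
                }
            }
            if (next == null) {
                break; // every queue is drained
            }
            queues.get(next).poll();
            order.add(next);
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> topics = new LinkedHashMap<>();
        topics.put("table", Arrays.asList(100L, 200L));
        topics.put("events", Arrays.asList(50L, 150L));

        // With "table" prioritized, its records are drained first.
        System.out.println(processingOrder(topics, "table"));
        // With no prioritized topic, records interleave by timestamp.
        System.out.println(processingOrder(topics, null));
    }
}
```

In this model the prioritized topic is fully drained before the other one is touched. The real behavior is only best effort, since it also depends on which records have actually been fetched and buffered at any given moment.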

On Thu, Mar 24, 2016 at 6:57 PM, Greg Fodor <gfo...@gmail.com> wrote:
> Really digging Kafka Streams so far, nice work all. I'm interested in
> being able to materialize one or more KTables in full before the rest
> of the topology begins processing messages. This seems fundamentally
> useful since it allows you to get your database tables replicated up
> off the change stream topics from Connect before the stream processing
> workload starts.
>
> In Samza we have bootstrap streams and stream prioritization to help
> facilitate this. What seems desirable for Kafka Streams is:
>
> - Per-source prioritization (by defaulting to >0, setting the stream
> priority to 0 effectively bootstraps it.)
> - Per-source initial offset settings (earliest or latest, default to
> latest)
>
> To solve the KTable materialization problem, you'd set priority to 0
> for its source and the source offset setting to earliest.
>
> Right now it appears the only control you have for re-processing is
> AUTO_OFFSET_RESET_CONFIG, but I believe this is a global setting for
> the consumers, and hence, the entire job. Beyond that, I don't see any
> way to prioritize stream consumption at all, so your KTables will be
> getting materialized while the general stream processing work is
> running concurrently.
>
> I wanted to see if this case is actually supported already and I'm
> missing something, or if not, if these options make sense. If this
> seems reasonable and it's not too complicated, I could possibly try to
> get together a patch. If so, any tips on implementing this would be
> helpful as well. Thanks!
>
> -Greg
>