Hi Murad,

On Wed, 26 Apr 2017 at 13:37 Murad Mamedov <m...@muradm.net> wrote:

> Is there any global time synchronization between streams in the Kafka
> Streams API, so that it would not consume more events from one stream while
> the other is still behind in time? Or, to rephrase: is there global event
> ordering based on the timestamp of each event?
>

Yes. When streams are joined, the corresponding partitions from the joined
streams are grouped together into a single Task. Each Task maintains a record
buffer for each of the topic partitions it is consuming from. When it is time
to process a record, it will choose the record from the partition with the
smallest timestamp. In this way it makes a best effort to keep the streams in
sync.
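
The timestamp used for this is whatever the configured TimestampExtractor
returns, so for a replay of historical data you want it to return the event
time rather than the wall-clock time. As a rough sketch of my own (the class
name and the config wiring are illustrative, against the 0.10.2-era API):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class EventTimeExtractor implements TimestampExtractor {
    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long previousTimestamp) {
        // Use the timestamp stored in the Kafka record (producer/event time);
        // you could instead parse an event time out of the payload.
        final long ts = record.timestamp();
        return ts >= 0 ? ts : previousTimestamp;
    }
}

// registered via, e.g.:
// props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, EventTimeExtractor.class);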


>
> The other option could be to join the streams in a window; however, the same
> question arises: if one stream is days behind the other, will a join window
> of 15 minutes ever work?
>
>
If the data is arriving much later, you can use
JoinWindows.until(SOME_TIME_PERIOD) to keep the windowed data around for
longer. In this case the streams will still join. Once SOME_TIME_PERIOD has
expired, the streams will no longer be able to join.
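
For example, a rough sketch (topic names, serdes, and the retention period are
illustrative; using the 0.10.2-era DSL) of a 15-minute join whose window state
is kept for two days so that late or replayed records can still find their
join partner:

import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> small = builder.stream(Serdes.String(), Serdes.String(), "topic_small");
KStream<String, String> large = builder.stream(Serdes.String(), Serdes.String(), "topic_large");

KStream<String, String> joined = small.join(
    large,
    (v, vt) -> v + "," + vt,                        // ValueJoiner
    JoinWindows.of(TimeUnit.MINUTES.toMillis(15))   // 15-minute join window
               .until(TimeUnit.DAYS.toMillis(2)),   // keep window state for 2 days
    Serdes.String(), Serdes.String(), Serdes.String());

joined.to(Serdes.String(), Serdes.String(), "joined-output");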


> I'm trying to work out how to design a replay of long periods of time for an
> application with multiple topics/streams, especially when combining with
> low-level API processors and transformers which rely on each other via
> GlobalKTable or KTable stores on these streams. For instance, the smaller
> topic could have the following sequence of events:
>
> T1 - (k1, v1)
> T1 + 10 minutes - (k1, null)
> T1 + 20 minutes - (k1, v2)
>
> While the larger topic could have:
>
> T1 - (k1, vt1)
> T1 + 5 minutes - (k1, null)
> T1 + 15 minutes - (k1, vt2)
>
> If one were to join or look up these streams in real time (the timestamp of
> an event is approximately equal to the wall-clock time), the result would be:
>
> T1 - topic_small (k1, v1) - topic_large (k1, vt1)
> T1 + 5 minutes - topic_small (k1, v1) - topic_large (k1, null)
> T1 + 10 minutes - topic_small (k1, null) - topic_large (k1, null)
> T1 + 15 minutes - topic_small (k1, null) - topic_large (k1, vt2)
> T1 + 20 minutes - topic_small (k1, v2) - topic_large (k1, vt2)
>
> However, when replaying the streams from the beginning, from the perspective
> of the larger topic, it would see the smaller topic as (k1, v2), completely
> missing the v1 and null states in the case of a GlobalKTable/KTable
> representation, or the corresponding events in the case of a KStream-KStream
> windowed join.
>
>
I don't really follow here. In the case of a GlobalKTable it will be
initialized with all of the existing data before the rest of the streams
start processing.
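
For illustration, a rough sketch of a KStream-GlobalKTable lookup join (names
are mine; the exact overloads depend on the Kafka Streams version). The global
table is bootstrapped from topic_small before the stream starts processing,
and a lookup against it returns the current value for the key:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

KStreamBuilder builder = new KStreamBuilder();

// Bootstrapped with all existing data from topic_small before processing starts.
GlobalKTable<String, String> small =
    builder.globalTable(Serdes.String(), Serdes.String(), "topic_small", "topic-small-store");

KStream<String, String> large = builder.stream(Serdes.String(), Serdes.String(), "topic_large");

// For each record from topic_large, look up the latest value for its key in
// the global table.
KStream<String, String> enriched = large.join(
    small,
    (key, value) -> key,        // KeyValueMapper: which key to look up
    (vt, v) -> vt + "," + v);   // ValueJoiner: combine the two values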


> Am I missing something here? Should the application be responsible for
> global synchronization between topics, or does/can Kafka Streams do that? If
> the application should, what could be an approach to solving it?
>
> I hope I have explained myself clearly.
>
> Thanks in advance
>
