Hi,

We have a KStream and a KTable that we are left-joining. The KTable has a
"backlog" of records that we want to consume before any of the entries in
the KStream is processed. To guarantee that, we have played with the
timestamp extraction, setting the time for those records in the "distant"
past to guarantee they will be consumed before any of the records in the
KStream are processed.

This is working as expected, forcing the KTable to be ingested before the
KStream. However, an unexpected side effect that we have noticed is this
"delayed" is only applied once when the application starts. Going through
the code in StreamTask method (
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L333-L351)
it seems that the problem is the fact that, once all records in all
partitions are consumed, idleStartTime is not set back to UNKNOWN. That
means that the first record that arrives through any of the two partitions,
will be immediately processed.

Is this by design?

Thanks.

Reply via email to