Hi Alexandre, thanks for your reply. I updated my example to produce a
message to the compacted topic ("table") at a lower timestamp, but I'm
still not getting the expected result unfortunately. The scenario is now:

1. Message published to A ("input") with timestamp t_0
2. Three seconds of wall clock time elapses
3. Message published to B ("table") with timestamp (t_0 - 1)



On Tue, Aug 30, 2022 at 12:15 AM Alexandre Brasil <
alexandre.bra...@gmail.com> wrote:

> Hi Derek,
>
> What max.task.idle.ms does is set a wait time for the stream application
> to
> wait for new messages when one or more
> input topics have no messages after a poll. In your case, the application
> polls for the first time and finds a message on
> topic A ("input") and no messages on topic B ("table"). Since you have
> max.task.idle.ms set to 5000, it waits up to five
> seconds for messages to arrive on topic B. When you produce your second
> message three seconds later, the app will
> process the messages.
>
> Kafka Streams will process the messages from both topics in timestamp
> order, but since both of your messages have
> the same timestamp, my guess is that it's processing the message from the
> stream ("input") first and it finds nothing to
> join to on "table". My guess is that if you tweak the second message
> timestamp to be lower than the timestamp of the
> second message you'll get the result you want.
>
> Regards,
> Alexandre
>
> On Mon, Aug 29, 2022 at 11:55 AM Derek Mok <derek.mok9...@gmail.com>
> wrote:
>
> > Hi, I'd like some help with understanding how exactly max.task.idle.ms
> > works. I have a topology that consumes from an input topic A, and a join
> > operator that enriches the topic A messages with a KTable from a
> compacted
> > topic B. The enriched messages are output to topic C.
> >
> > If I set max.task.idle.ms to 5000, what should be the expected behaviour
> > in
> > the following scenario (assuming all messages have the same key and
> topics
> > have same partition count):
> >
> >    1. Message published to A with timestamp t_0
> >    2. Three seconds of wall clock time elapses
> >    3. Message published to B with timestamp t_0
> >
> > My understanding is that a message should be output to topic C containing
> > the enriched result after max.task.idle.ms elapses since it should
> account
> > for the late producer of topic B. The join operator should only be
> invoked
> > after max.task.idle.ms has elapsed. However, what actually happens is
> that
> > nothing is published in topic C. I suspect my understanding of
> > max.task.idle.ms is not entirely correct so would appreciate any insight
> > here!
> >
> > Here is a small project demonstrating the above
> > https://github.com/ThousandEyes-Derek/max-task-idle-ms
> >
> > MaxTaskIdleStreamsApp contains the streams application that describes the
> > topology.
> > MessageProducer produces messages into the input and compacted topics
> based
> > on the scenario I described above.
> >
> > Thanks!
> > Derek
> >
>

Reply via email to