John Roesler created KAFKA-10091:
------------------------------------

             Summary: Improve task idling
                 Key: KAFKA-10091
                 URL: https://issues.apache.org/jira/browse/KAFKA-10091
             Project: Kafka
          Issue Type: Task
          Components: streams
            Reporter: John Roesler


When Streams is processing a task with multiple inputs, each time it is ready 
to process a record, it has to choose which input to process next. It always 
takes from the input for which the next record has the least timestamp. The 
result of this is that Streams processes data in timestamp order. However, if 
the buffer for one of the inputs is empty, Streams doesn't know what timestamp 
the next record for that input will be.

Streams introduced a configuration "max.task.idle.ms" in KIP-353 to address 
this issue.

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization]

The config allows Streams to wait some amount of time for data to arrive on the 
empty input, so that it can make a timestamp-ordered decision about which input 
to pull from next.

However, this config can be hard to use reliably and efficiently, since what 
we're really waiting for is the next poll that _would_ return data from the 
empty input's partition, and this guarantee is a function of the poll interval, 
the max poll interval, and the internal logic that governs when Streams will 
poll again.

The ideal case is you'd be able to guarantee at a minimum that _any_ amount of 
idling would guarantee you poll data from the empty partition if there's data 
to fetch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to