John Roesler created KAFKA-8478:
-----------------------------------
Summary: Poll for more records before forced processing
Key: KAFKA-8478
URL: https://issues.apache.org/jira/browse/KAFKA-8478
Project: Kafka
Issue Type: Improvement
Components: streams
Reporter: John Roesler
While analyzing the algorithm of Streams's poll/process loop, I noticed the
following:
The algorithm of runOnce is:
{code}
loop0:
long poll for records (100ms)
loop1:
loop2: for BATCH_SIZE iterations:
process one record in each task that has data enqueued
adjust BATCH_SIZE
if loop2 processed any records, repeat loop 1
else, break loop1 and repeat loop0
{code}
There's potentially an unwanted interaction between "keep processing as long as
any record is processed" and forcing processing after `max.task.idle.ms`.
If there are two tasks, A and B, and A runs out of records on one input before
B, then B could keep the processing loop running, and hence prevent A from
getting any new records, until max.task.idle.ms expires, at which point A will
force processing on its other input partition. The intent of idling is to at
least give A a chance of getting more records on the empty input, but under
this situation, we'd never even check for more records before forcing
processing.
I'm thinking we should only enforce processing if there was a completed poll
since we noticed the task was missing inputs (otherwise, we may as well not
bother idling at all).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)