Chris Egerton created KAFKA-12226:
-------------------------------------

             Summary: High-throughput source tasks fail to commit offsets
                 Key: KAFKA-12226
                 URL: https://issues.apache.org/jira/browse/KAFKA-12226
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect
            Reporter: Chris Egerton
            Assignee: Chris Egerton
The current source task thread has the following workflow:
# Poll messages from the source task.
# Queue these messages to the producer, which sends them to Kafka asynchronously.
# Add each message to {{outstandingMessages}}, or, if a flush is currently active, to {{outstandingMessagesBacklog}}.
# When the producer completes the send of a record, remove it from {{outstandingMessages}}.

The commit offsets thread has the following workflow:
# Wait a flat timeout for {{outstandingMessages}} to flush completely.
# If this times out, add all of {{outstandingMessagesBacklog}} to {{outstandingMessages}} and reset the backlog.
# If it succeeds, commit the source task offsets to the backing store.
# Retry the above on a fixed schedule.

(A sketch of how these two threads interact appears at the end of this description.)

If the source task is producing records quickly (faster than the producer can send them), the producer throttles the task thread by blocking in its {{send}} method, waiting at most {{max.block.ms}} for space in {{buffer.memory}} to become available. The number of records in {{outstandingMessages}} + {{outstandingMessagesBacklog}} is therefore proportional to the size of the producer memory buffer. That much data can take longer than {{offset.flush.timeout.ms}} to flush, so the flush will never succeed while the source task is rate-limited by the producer memory. This means we may write multiple hours of data to Kafka without ever committing source offsets for the connector. When the task is then lost due to a worker failure, hours of data that were already successfully written to Kafka will be re-processed.
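Below is a minimal, hypothetical sketch of the two-thread interaction described above. The class and method names ({{recordSent}}, {{recordAcked}}, {{tryFlush}}) are illustrative only, not Connect's actual API (the real logic lives in {{WorkerSourceTask}}), and offset bookkeeping is elided to focus on why the flush wait can time out indefinitely:

{code:java}
import java.util.IdentityHashMap;

// Hypothetical sketch of the handoff described in this issue.
// Not Connect's actual implementation; names and types are simplified.
class SourceTaskFlushSketch {
    private final Object lock = new Object();
    private IdentityHashMap<Object, Object> outstandingMessages = new IdentityHashMap<>();
    private IdentityHashMap<Object, Object> outstandingMessagesBacklog = new IdentityHashMap<>();
    private boolean flushing = false;

    // Task thread: called right after producer.send(record, callback).
    // During an active flush, new records are parked in the backlog.
    void recordSent(Object record) {
        synchronized (lock) {
            if (flushing)
                outstandingMessagesBacklog.put(record, record);
            else
                outstandingMessages.put(record, record);
        }
    }

    // Producer callback thread: the record has been acked by Kafka,
    // so it no longer blocks an in-progress flush.
    void recordAcked(Object record) {
        synchronized (lock) {
            // A record may sit in either map depending on when it was sent.
            if (outstandingMessages.remove(record) == null)
                outstandingMessagesBacklog.remove(record);
            if (outstandingMessages.isEmpty())
                lock.notifyAll();
        }
    }

    // Offset-commit thread: wait a flat timeout for every outstanding record
    // to be acked. If the producer is holding ~buffer.memory of unacked data,
    // draining it can exceed timeoutMs on every attempt, so this returns
    // false forever and source offsets are never committed.
    boolean tryFlush(long timeoutMs) throws InterruptedException {
        synchronized (lock) {
            flushing = true;
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!outstandingMessages.isEmpty()) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    // Timed out: fold the backlog back in and give up this round.
                    outstandingMessages.putAll(outstandingMessagesBacklog);
                    outstandingMessagesBacklog.clear();
                    flushing = false;
                    return false;
                }
                lock.wait(remaining);
            }
            // Everything in flight is acked: safe to commit source offsets.
            // Records sent during the flush become the new outstanding set.
            IdentityHashMap<Object, Object> swap = outstandingMessages;
            outstandingMessages = outstandingMessagesBacklog;
            outstandingMessagesBacklog = swap;
            flushing = false;
            return true;
        }
    }
}
{code}

The sketch makes the failure mode visible: {{tryFlush}} can only succeed if every record the producer has buffered is acked within the flat timeout, and nothing bounds the buffered backlog below that.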
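To make the arithmetic concrete, consider the stock defaults for the settings involved (these are the shipped defaults, quoted here purely for illustration):

{code}
# Producer settings (defaults)
buffer.memory=33554432       # up to 32 MB of records buffered, unacked
max.block.ms=60000           # send() blocks up to 60 s waiting for buffer space

# Connect worker setting (default)
offset.flush.timeout.ms=5000 # all unacked records must drain within 5 s
{code}

A full 32 MB buffer must drain within 5 seconds for the flush to succeed, i.e. the producer must sustain roughly 6.4 MB/s of acked throughput. A task that is rate-limited by the producer keeps the buffer full by definition, so any sustained delivery rate below that threshold causes every flush attempt to time out.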