Justine Olshan created KAFKA-14920:
--------------------------------------

             Summary: Address timeouts and out of order sequences
                 Key: KAFKA-14920
                 URL: https://issues.apache.org/jira/browse/KAFKA-14920
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Justine Olshan
            Assignee: Justine Olshan

KAFKA-14844 showed how destructive a timeout on the first produce request for a topic partition can be (i.e., a partition that has no state yet in the ProducerStateManager, psm). Since we currently don't validate the first sequence (we will in part 2 of KIP-890), any transient error on the first produce request can lead to out-of-order sequence errors that never recover.

Originally, KAFKA-14561 relied on the producer's retry mechanism to handle these transient issues, but until that is fixed, we may need to retry from the AddPartitionsManager instead. We addressed the concurrent transactions error, but other errors, such as coordinator loading, could still occur and lead to more out-of-order sequence issues.
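
To illustrate the kind of broker-side retry being discussed, here is a minimal sketch, not Kafka's actual code. The class `VerificationRetrySketch`, the `VerifyError` enum, and `verifyWithRetry` are all hypothetical names made up for this example; the enum only mirrors the two transient conditions mentioned above (concurrent transactions and coordinator loading) plus one arbitrary non-retriable error. It assumes a bounded number of attempts with a fixed backoff, which may not match what the real fix ends up doing.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

public class VerificationRetrySketch {

    /** Hypothetical stand-in for the error codes a verification attempt can return. */
    public enum VerifyError {
        NONE,
        CONCURRENT_TRANSACTIONS,
        COORDINATOR_LOAD_IN_PROGRESS,
        INVALID_TXN_STATE
    }

    /** Errors treated as transient in this sketch (an assumption, not an exhaustive list). */
    static final Set<VerifyError> TRANSIENT = EnumSet.of(
            VerifyError.CONCURRENT_TRANSACTIONS,
            VerifyError.COORDINATOR_LOAD_IN_PROGRESS);

    /**
     * Runs the attempt until it returns a non-transient result or maxAttempts is reached.
     * Returns the last result seen, so the caller can still surface a transient error
     * to the producer if every attempt failed.
     */
    public static VerifyError verifyWithRetry(Supplier<VerifyError> attempt,
                                              int maxAttempts,
                                              long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        VerifyError last = VerifyError.NONE;
        for (int i = 1; i <= maxAttempts; i++) {
            last = attempt.get();
            if (!TRANSIENT.contains(last)) {
                return last; // success or a non-retriable error: stop retrying
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(backoffMs); // fixed backoff between attempts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return last; // give up early but preserve the interrupt flag
                }
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // Simulated coordinator that is still loading on the first two attempts.
        int[] calls = {0};
        VerifyError result = verifyWithRetry(
                () -> ++calls[0] < 3 ? VerifyError.COORDINATOR_LOAD_IN_PROGRESS : VerifyError.NONE,
                5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The point of retrying here rather than returning the transient error is that the first produce request for the partition never fails back to the producer, so a later batch cannot be appended ahead of it while the first sequence is still unvalidated.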