chickenchickenlove opened a new pull request, #19631: URL: https://github.com/apache/kafka/pull/19631
### Background

Events have been reported as skipped during group rebalancing (https://github.com/spring-projects/spring-kafka/issues/3703). At first, I thought the problem was caused by `spring-kafka`. However, after debugging it, I concluded that it is a race condition in Kafka.

The race is between the main thread and the consumer coordinator's heartbeat thread. It occurs when the main thread attempts to commit via `commitSync(...)` while the coordinator thread is handling a consumer group rebalance. The main thread reads the state twice: `generationIfStable()` returns `null` because the state is still `COMPLETING_REBALANCE`, but by the time `rebalanceInProgress()` is called the coordinator thread has already moved the state to `STABLE`, so the commit fails with `CommitFailedException`. The sequence diagram below shows the interleaving.

```mermaid
sequenceDiagram
    actor m as main_thread
    participant s as State
    actor h as kafka_coordinator_thread
    actor b as broker
    b ->> h: Response of JoinGroup
    h ->> s: Update State to COMPLETING_REBALANCE
    s ->> s: STATE: COMPLETING_REBALANCE
    b ->> h: Response of SyncGroup
    m ->> m: Call commitSync()
    m ->> s: Call generationIfStable()
    s -->> m: generationId=null because State = COMPLETING_REBALANCE
    h ->> s: Update State to STABLE
    s ->> s: STATE: STABLE
    m ->> s: Call rebalanceInProgress()
    s -->> m: False, because State = STABLE
    m ->> m : Throw CommitFailedException
```
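The interleaving in the sequence diagram can be reproduced deterministically with a small model. This is only a sketch under stated assumptions, not the Kafka client code: `CommitRaceSketch`, `commitOutcome`, `onSyncGroupResponse`, and the generation id `5` are hypothetical names and values chosen for illustration. Only the method names `generationIfStable()` and `rebalanceInProgress()` and the state names come from the description above. The `RebalanceInProgressException` outcome for the non-racy case reflects my understanding of the commit path and should be checked against the actual source.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the coordinator state. Not the real Kafka classes.
class CommitRaceSketch {
    enum MemberState { UNJOINED, PREPARING_REBALANCE, COMPLETING_REBALANCE, STABLE }

    // The JoinGroup response has already been handled, so we start in COMPLETING_REBALANCE.
    private final AtomicReference<MemberState> state =
            new AtomicReference<>(MemberState.COMPLETING_REBALANCE);
    private final int generationId = 5; // arbitrary illustrative value

    // First read of the state: the generation is only visible when the group is stable.
    Integer generationIfStable() {
        return state.get() == MemberState.STABLE ? generationId : null;
    }

    // Second, independent read of the state.
    boolean rebalanceInProgress() {
        MemberState s = state.get();
        return s == MemberState.PREPARING_REBALANCE || s == MemberState.COMPLETING_REBALANCE;
    }

    // What the coordinator thread does when the SyncGroup response arrives.
    void onSyncGroupResponse() {
        state.set(MemberState.STABLE);
    }

    // Models the commit path. The hook stands in for whatever the coordinator
    // thread happens to do between the two reads of the state.
    String commitOutcome(Runnable betweenChecks) {
        Integer generation = generationIfStable();
        betweenChecks.run();
        if (generation != null) {
            return "COMMIT_SENT";
        }
        return rebalanceInProgress() ? "RebalanceInProgressException" : "CommitFailedException";
    }

    // The interleaving from the diagram: the state flips to STABLE between the two reads.
    static String racyInterleaving() {
        CommitRaceSketch c = new CommitRaceSketch();
        return c.commitOutcome(c::onSyncGroupResponse);
    }

    // No state change between the two reads.
    static String noInterleaving() {
        CommitRaceSketch c = new CommitRaceSketch();
        return c.commitOutcome(() -> { });
    }

    public static void main(String[] args) {
        System.out.println(racyInterleaving());
        System.out.println(noInterleaving());
    }
}
```

In this model, the two checks observe different states only when the transition to `STABLE` lands between them, which is the window the diagram describes. Reading the state once and deriving both answers from that single snapshot would close the window in the model. Whether that matches the fix in this pull request is not something the sketch claims.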