Re: KAFKA-8104: Help with the review

Nikolay Izhikov Mon, 14 Oct 2019 12:46:47 -0700

Hello.

I got very helpful advice from Guozhang,
and we now have a ready fix and a reproducer.


This PR fixes a very long-lived Kafka consumer bug.
Please join the review.

[1] https://issues.apache.org/jira/browse/KAFKA-8104
[2] https://github.com/apache/kafka/pull/7460

On Mon, 07/10/2019 at 21:37 +0300, Nikolay Izhikov wrote:
> Hello.
> 
> We have the issue KAFKA-8104 "Consumer cannot rejoin to the group after 
> rebalancing" [1].
> It reproduces in many production environments.
> 
> I prepared a reproducer and a fix [2] for this issue.
> However, I need assistance with a "fair" reproducer.
> 
> Please help me with the review and the "fair" reproducer.
> 
> The PR contains a fix for a race condition between the "consumer thread" and 
> the "consumer coordinator heartbeat thread". It reproduces in many production 
> environments.
> 
> Conditions for reproducing:
> 
> 1. The consumer thread initiates a rejoin to the group because of a commit 
> timeout: it calls `AbstractCoordinator#joinGroupIfNeeded`, which leads to 
> `sendJoinGroupRequest`.
> 2. `JoinGroupResponseHandler` writes the new generation data to 
> `AbstractCoordinator.this.generation` and leaves the `synchronized` 
> section.
> 3. The heartbeat thread executes `maybeLeaveGroup` and clears the generation 
> data via `resetGenerationOnLeaveGroup`.
> 4. The consumer thread executes `onJoinComplete(generation.generationId, 
> generation.memberId, generation.protocol, memberAssignment);` with the 
> cleared generation data. This leads to the corresponding exception.
> 
> The race is fixed with a condition in `maybeLeaveGroup`: if a rejoin is in 
> progress in the consumer thread, there is no reason to reset the generation 
> data and send a `LeaveGroupRequest` from the heartbeat thread.
> 
> This PR contains an "unfair" reproducer.
> It is implemented with a `CountDownLatch` that imitates the described race in 
> the `AbstractCoordinator` code.
> 
> 
> 
> [1] https://issues.apache.org/jira/browse/KAFKA-8104
> [2] https://github.com/apache/kafka/pull/7460
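
The four quoted steps and the guard in `maybeLeaveGroup` can be sketched as a toy model. This is only an illustration under assumptions, not the code from the PR: the class `RejoinRaceSketch`, the `rejoinInProgress` flag, and the `guardEnabled` switch are hypothetical names, and the latches force the interleaving in the same "unfair" way the reproducer is described as doing.

```java
import java.util.concurrent.CountDownLatch;

// Toy model of the described race. It is NOT the real AbstractCoordinator;
// every name below is hypothetical and exists only for illustration.
public class RejoinRaceSketch {
    public static final int NO_GENERATION = -1;

    private int generationId = NO_GENERATION;
    private boolean rejoinInProgress = false;
    private final boolean guardEnabled;

    public RejoinRaceSketch(boolean guardEnabled) {
        this.guardEnabled = guardEnabled;
    }

    // Step 1: the consumer thread starts a rejoin.
    public synchronized void beginRejoin() {
        rejoinInProgress = true;
    }

    // Step 2: the join response handler stores the new generation.
    public synchronized void onJoinResponse(int newGenerationId) {
        generationId = newGenerationId;
    }

    // Step 3: the heartbeat thread wants to leave the group.
    // With the guard enabled, it does nothing while a rejoin is in progress.
    public synchronized void maybeLeaveGroup() {
        if (guardEnabled && rejoinInProgress) {
            return;
        }
        generationId = NO_GENERATION;
    }

    // Step 4: the consumer thread completes the join and reads the generation.
    public synchronized int completeJoin() {
        rejoinInProgress = false;
        return generationId;
    }

    // Forces the interleaving 1 -> 2 -> 3 -> 4 with latches and returns the
    // generation id that the consumer thread observes in step 4.
    public static int runInterleaving(boolean guardEnabled) {
        RejoinRaceSketch coordinator = new RejoinRaceSketch(guardEnabled);
        CountDownLatch generationWritten = new CountDownLatch(1);
        CountDownLatch heartbeatDone = new CountDownLatch(1);

        Thread heartbeat = new Thread(() -> {
            try {
                generationWritten.await();   // wait until step 2 has happened
                coordinator.maybeLeaveGroup();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                heartbeatDone.countDown();
            }
        });
        heartbeat.start();

        try {
            coordinator.beginRejoin();
            coordinator.onJoinResponse(5);
            generationWritten.countDown();   // let the heartbeat thread run step 3
            heartbeatDone.await();
            int seen = coordinator.completeJoin();
            heartbeat.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while forcing the interleaving", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + runInterleaving(false)); // prints -1
        System.out.println("with guard: " + runInterleaving(true));     // prints 5
    }
}
```

Without the guard, the consumer thread completes the join with the cleared generation; with the guard, the generation written in step 2 survives.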
