lucasbru opened a new pull request, #19818:
URL: https://github.com/apache/kafka/pull/19818

   There is a sequence of interactions with the membership managers of KIP-848, 
KIP-932, and KIP-1071 that can put the membership manager into the JOINING state 
while the member epoch is set to -1. This can result in an invalid request being 
sent, since joining heartbeats should not have member epoch -1. This may lead 
to the member failing to join. In the case of streams, the group coordinator 
will return INVALID_REQUEST.
   
   This is the sequence triggering the bug, which seems to be relatively likely 
and is caused by two heartbeat responses being received after the next 
heartbeat has been sent.
   
   1. `membershipManager.leaveGroup();` -> transitions to LEAVING
   2. `membershipManager.onHeartbeatRequestGenerated();` -> transitions to UNSUBSCRIBED
   3. `membershipManager.onHeartbeatSuccess(... with member epoch > 0);` -> unblocks the consumer
   4. `membershipManager.onSubscriptionUpdated();`
   5. `membershipManager.onConsumerPoll();` -> transitions to JOINING
   6. `membershipManager.onHeartbeatSuccess(... with member epoch < 0);` -> updates the epoch to a negative value

   Now we are in state JOINING with memberEpoch -1, and the next heartbeat we 
send will be malformed, triggering INVALID_REQUEST.
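
The sequence above can be replayed against a toy model of the state machine. This is only a sketch: `ToyMembershipManager`, its fields, and `resubscribeAndPoll()` are hypothetical stand-ins and not the real Kafka classes, and the model assumes that joining resets the epoch to 0 and that, before the fix, any heartbeat response is applied unconditionally.

```java
// Toy model of the pre-fix behavior; NOT the actual Kafka membership manager.
final class ToyMembershipManager {
    enum State { STABLE, LEAVING, UNSUBSCRIBED, JOINING }

    State state = State.STABLE;
    int memberEpoch = 5;                  // arbitrary positive epoch of a stable member
    boolean unsubscribeCompleted = false;

    void leaveGroup() {
        state = State.LEAVING;
    }

    void onHeartbeatRequestGenerated() {
        // The leave heartbeat has been built, so the member counts as unsubscribed.
        if (state == State.LEAVING) {
            state = State.UNSUBSCRIBED;
        }
    }

    // Assumed pre-fix behavior: every response updates the epoch, and any
    // response seen while UNSUBSCRIBED unblocks the pending unsubscribe call.
    void onHeartbeatSuccess(int responseEpoch) {
        memberEpoch = responseEpoch;
        if (state == State.UNSUBSCRIBED) {
            unsubscribeCompleted = true;
        }
    }

    // Stands in for onSubscriptionUpdated() followed by onConsumerPoll().
    void resubscribeAndPoll() {
        if (state == State.UNSUBSCRIBED && unsubscribeCompleted) {
            state = State.JOINING;
            memberEpoch = 0;              // assumption: a join heartbeat carries epoch 0
        }
    }

    // Replays the sequence from the description and returns the final manager.
    static ToyMembershipManager reproduce() {
        ToyMembershipManager m = new ToyMembershipManager();
        m.leaveGroup();                   // -> LEAVING
        m.onHeartbeatRequestGenerated();  // -> UNSUBSCRIBED
        m.onHeartbeatSuccess(5);          // late regular response unblocks the consumer
        m.resubscribeAndPoll();           // -> JOINING
        m.onHeartbeatSuccess(-1);         // late leave response overwrites the epoch
        return m;
    }

    public static void main(String[] args) {
        ToyMembershipManager m = reproduce();
        System.out.println(m.state + " with memberEpoch " + m.memberEpoch);
    }
}
```

In this model the final state is JOINING with a negative epoch, which is the combination the description identifies as producing a malformed heartbeat.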
   
   The bug may also be triggered if the `unsubscribe` call times out, but this 
seems to be more of a corner case.
   
   To prevent the bug, we are taking two measures. First, the likely path to 
triggering the bug is prevented by not unblocking an `unsubscribe` call in 
the consumer when a non-leave heartbeat response is received. Once we have sent 
out the leave group heartbeat, we will ignore all heartbeat responses except 
for those containing memberEpoch < 0.
   
   For good measure, we also prevent the second case (`unsubscribe` timing 
out). Here, the consumer gets unblocked before we have received the leave 
group heartbeat response, and may resubscribe to the group. If that happens, 
we simply ignore the heartbeat response that contains a member epoch < 0 when 
it arrives after we have already left the `UNSUBSCRIBED` state.
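
The two measures can be sketched as guards in a heartbeat response handler. This is a simplified illustration rather than the code in this PR: `GuardedMembershipManager`, its fields, and `resubscribeAndPoll()` are hypothetical names, and the model assumes that joining resets the epoch to 0.

```java
// Toy model of the two guards; NOT the actual Kafka membership manager.
final class GuardedMembershipManager {
    enum State { STABLE, LEAVING, UNSUBSCRIBED, JOINING }

    State state = State.STABLE;
    int memberEpoch = 5;                  // arbitrary positive epoch of a stable member
    boolean unsubscribeCompleted = false;

    void leaveGroup() {
        state = State.LEAVING;
    }

    void onHeartbeatRequestGenerated() {
        if (state == State.LEAVING) {
            state = State.UNSUBSCRIBED;   // leave heartbeat has been sent
        }
    }

    void onHeartbeatSuccess(int responseEpoch) {
        if (responseEpoch < 0) {
            // Measure 2: a leave response that arrives after we have already
            // left UNSUBSCRIBED (e.g. unsubscribe timed out and we rejoined)
            // is dropped instead of overwriting the epoch.
            if (state != State.UNSUBSCRIBED) {
                return;
            }
            memberEpoch = responseEpoch;
            unsubscribeCompleted = true;  // only the leave response unblocks
            return;
        }
        // Measure 1: once the leave heartbeat is out, responses to earlier
        // heartbeats are ignored and do not unblock unsubscribe.
        if (state == State.UNSUBSCRIBED) {
            return;
        }
        memberEpoch = responseEpoch;
    }

    // Stands in for onSubscriptionUpdated() followed by onConsumerPoll().
    // Allowed even if unsubscribe timed out before the leave response arrived.
    void resubscribeAndPoll() {
        if (state == State.UNSUBSCRIBED) {
            state = State.JOINING;
            memberEpoch = 0;              // assumption: a join heartbeat carries epoch 0
        }
    }

    public static void main(String[] args) {
        GuardedMembershipManager m = new GuardedMembershipManager();
        m.leaveGroup();
        m.onHeartbeatRequestGenerated();
        m.resubscribeAndPoll();           // unsubscribe timed out, consumer resubscribes
        m.onHeartbeatSuccess(-1);         // stale leave response is ignored
        System.out.println(m.state + " with memberEpoch " + m.memberEpoch);
    }
}
```

With both guards in place, neither a late regular response nor a late leave response can leave this model in JOINING with a negative epoch.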
   

