Lianet Magrans created KAFKA-16954:
--------------------------------------

             Summary: Move consumer leave operations on close to background 
thread
                 Key: KAFKA-16954
                 URL: https://issues.apache.org/jira/browse/KAFKA-16954
             Project: Kafka
          Issue Type: Bug
            Reporter: Lianet Magrans


When a consumer unsubscribes, the app thread simply triggers an Unsubscribe 
event that will take care of it all in the background thread: release 
assignment (callbacks), clear assigned partitions, and send leave group HB.

On the contrary, when a consumer is closed, these actions happen in both 
threads:
 * release assignment -> in the app thread by directly running the callbacks
 * clear assignment -> in app thread by updating the subscriptionState
 * send leave group HB -> in the background thread via an event LeaveOnClose 

This situation could lead to race conditions, mainly because of the close 
updating the subscription state in the app thread, when other operations in the 
background could be already running based on it. Ex. 
 * unsubscribe in app thread (triggers background UnsubscribeEvent to revoke 
and leave)
 * unsubscribe fails (ex. interrupted, leaving operation running in the 
background thread to revoke partitions and leave)
 * consumer close (will revoke and clear assignment in the app thread)
 *  UnsubscribeEvent in the background may fail by trying to revoke partitions 
that it does not own anymore - _No current assignment for partition ..._

A basic check has been added to the background thread revocation to avoid the 
race condition, ensuring that we only revoke partitions we own, but still we 
should avoid the root cause, which is updating the assignment on the app 
thread. We should consider having the close operation as a single LeaveOnClose 
event handled in the background. That even already takes cares of revoking the 
partitions and clearing assignment on the background, so no need to take care 
of it in the app thread. We should only ensure that we processBackgroundEvents 
until the LeaveOnClose completes (to allow for callbacks to run in the app 
thread)

 

Trying to understand the current approach, I imagine the initial motivation to 
have the callabacks (and assignment cleared) in the app thread was to avoid the 
back-and-forth: app thread close -> background thread leave event -> app thread 
to run callback -> background thread to clear assignment and send HB. But 
updating the assignment on the app thread ends up being problematic, as it 
mainly happens in the background so it opens up the door for race conditions on 
the subscription state. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to