guozhangwang commented on pull request #9354:
URL: https://github.com/apache/kafka/pull/9354#issuecomment-701557127


   > One thing I don't yet understand is, why did this affect all three threads 
on a single instance (at about the same time)? Was it just because the instance 
in question didn't have caught-up state for any active tasks, thus it was only 
assigned stateless tasks across all three threads?
   > 
   > Also, if the thread is only assigned stateless tasks, then shouldn't it 
reach RUNNING and therefore start to call `commit` _earlier_ than a thread with 
some stateful tasks? But we observed that the threads on the problem-instance 
seem to never rejoin the group at all, right? Is that just another symptom of 
this bug?
   
   From the soak logs, what I observed is that the three threads from that 
client stopped making any log entries at different times, roughly 10 minutes 
apart, but the pattern is the same: each thread received active tasks that 
were all stateless, the hb thread reported an error right after the 
assignment, and the tasks then completed initialization but never completed 
restoration (note that since they are stateless, they should normally 
transition to running on the next iteration). Without lower-level logs I cannot 
tell for sure, but since the thread-process-rate of those threads indeed drops 
to zero, my suspicion is that the hb error did not mark the consumer as needing 
to re-join, and the poll call may be blocked in pollForHeartbeat because the 
state is set to `UNJOINED`, which causes its poll timeout to be MAX_VALUE.
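
   To make the suspected mechanism concrete, here is a minimal toy model. It 
is not Kafka's actual coordinator code: `MemberState`, `heartbeatWaitMs`, and 
`rejoinRequested` are hypothetical names, and the rule "an unjoined member has 
no heartbeat deadline" is an assumption taken from the suspicion above.

```java
public class HeartbeatWaitSketch {
    // Hypothetical, simplified view of a group member's state.
    enum MemberState { UNJOINED, STABLE }

    // Returns how long a poll would wait for heartbeat-related work.
    // Assumption: an unjoined member has no heartbeat to send, so the wait
    // is unbounded unless something has explicitly requested a rejoin.
    static long heartbeatWaitMs(MemberState state,
                                boolean rejoinRequested,
                                long msUntilNextHeartbeat) {
        if (rejoinRequested) {
            return 0L; // rejoin right away instead of waiting
        }
        if (state == MemberState.UNJOINED) {
            return Long.MAX_VALUE; // nothing scheduled, so the wait never ends
        }
        return msUntilNextHeartbeat;
    }

    public static void main(String[] args) {
        // Healthy member: waits until the next heartbeat is due.
        System.out.println(heartbeatWaitMs(MemberState.STABLE, false, 3000L));
        // Heartbeat error that also requests a rejoin: no wait.
        System.out.println(heartbeatWaitMs(MemberState.UNJOINED, true, 3000L));
        // Heartbeat error that only resets the state: unbounded wait.
        System.out.println(
            heartbeatWaitMs(MemberState.UNJOINED, false, 3000L) == Long.MAX_VALUE);
    }
}
```

   The third case corresponds to the pattern in the soak logs: the state is 
reset but no rejoin is requested, so in this model the thread would never wake 
up to rejoin the group.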
   
   That being said, I cannot comfortably say with 100 percent confidence that 
this is "the" root cause of what we observed in the soak cluster, but at 
least it is "an" issue that I could discover from the logs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

