gharris1727 commented on pull request #9040:
URL: https://github.com/apache/kafka/pull/9040#issuecomment-661296268


   @kkonstantine While investigating this fix, I noticed that the long join
was happening in the first worker to join the group, while it was creating the
`connect-offsets` topic. The broker logs indicated that the topic was created
promptly and that it was visible to the other workers that joined afterwards.
However, the first worker was still waiting for the topic creation result
after the other two workers had started, which caused the test to fail.
   
   I am not very familiar with broker logs, but I could pick out one
suspicious thing about the create topic operation on the broker. For the
successful create topic operations, these "unblocked" messages appeared:
   ```
   [2020-07-05 09:47:32,423] DEBUG Request key TopicKey(connect-status) unblocked 1 topic operations (kafka.server.DelayedOperationPurgatory)
   [2020-07-05 09:47:32,423] DEBUG [Admin Manager on Broker 1]: Request key connect-status unblocked 1 topic requests. (kafka.server.AdminManager)
   ```
   These messages did not appear for the excessively long create topic
operation. Taken literally, the log messages suggest that the operation either
never entered the DelayedOperationPurgatory or was never released from it, so
the request timed out on the client side without the worker learning that it
had been fulfilled.
   
   I think this is benign in our case: a retry should let the test recover,
because it will discover that the topic has already been created.
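A minimal sketch of why a retry recovers here, assuming the create call can be modelled as an attempt that either returns a result or times out on the client. `IdempotentCreate`, `CreateAttempt`, `Outcome` and `createWithRetry` are hypothetical names for illustration, not Kafka or Connect APIs. With the real Kafka `Admin` client, the "already exists" case would surface as a `TopicExistsException` (wrapped in an `ExecutionException` from the result future) rather than as an enum value.

```java
import java.util.concurrent.TimeoutException;

public class IdempotentCreate {
    // Hypothetical result of one create-topic attempt (not a Kafka API).
    public enum Outcome { CREATED, ALREADY_EXISTS }

    // Hypothetical stand-in for a single create-topic request; it throws
    // TimeoutException when the client gives up waiting for the broker.
    @FunctionalInterface
    public interface CreateAttempt {
        Outcome attempt() throws TimeoutException;
    }

    // Retries on client-side timeouts. ALREADY_EXISTS is returned as a success:
    // an earlier attempt may have created the topic on the broker even though
    // its response never reached this client.
    public static Outcome createWithRetry(CreateAttempt create, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TimeoutException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return create.attempt();
            } catch (TimeoutException e) {
                last = e; // outcome unknown: the topic may or may not exist yet
            }
        }
        throw new IllegalStateException("create timed out " + maxAttempts + " times", last);
    }

    public static void main(String[] args) {
        // Simulate the failure mode described above: the first request is
        // fulfilled on the broker, but its response is lost and the client times out.
        int[] calls = {0};
        Outcome result = createWithRetry(() -> {
            calls[0]++;
            if (calls[0] == 1) {
                throw new TimeoutException("simulated lost response");
            }
            return Outcome.ALREADY_EXISTS;
        }, 3);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Treating "already exists" as success is what makes the retry safe: the client cannot tell whether a timed-out create took effect, so the second attempt simply asks again and accepts either outcome.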
   
   [Logs for that run with the long 
join](http://confluent-kafka-2-6-system-test-results.s3-us-west-2.amazonaws.com/2020-07-05--001.1593942687--confluentinc--2.6--926929cad/ConnectDistributedTest/test_pause_state_persistent/connect_protocol%3Dcompatible/689.tgz)

