GeoffreyStark created KAFKA-12665:

             Summary: one of brokers which is also controller has too much 
                 Key: KAFKA-12665
             Project: Kafka
          Issue Type: Bug
          Components: clients, consumer, controller, core
    Affects Versions:
            Reporter: GeoffreyStark
         Attachments: image-2021-04-14-10-32-54-140.png, 
image-2021-04-14-10-39-02-996.png, image-2021-04-14-11-26-03-346.png

# *environment*


5 nodes

replication factor: 3

mean messages per second: 4k

monitoring: Prometheus, JMX, Grafana

consumers: Spring Boot & Doris Routine Load

producers: Spring Boot & Log 


# *what we encountered*

We encountered a broker (id: 4), which is also the controller (epoch 90), holding 
a large number of connections in CLOSE_WAIT at the same time. The controller log shows:


Controller 4 epoch 90 fails to send request (type: UpdateMetadataRequest ... Connection to 4 was disconnected before the response was 


The request is retried many times, but the same WARNING keeps being logged.


At the same time, another broker (id: 6), fetching messages from broker 4, also encountered the following warning:
[2021-04-13 16:35:06,942] WARN [ReplicaFetcherThread-0-4]: Error in fetch to 
broker 4, request (type=FetchRequest, replicaId=6, maxWait=500, minBytes=1, 
maxBytes=10485760, Connection to 4 was disconnected before the response was 


Doris Routine Load (consuming from Kafka) timed out:

2021-04-13 16:35:11,397 WARN (Routine load scheduler|42) 
[KafkaUtil.getAllKafkaPartitions():91] failed to get partitions. 
org.apache.doris.common.UserException: errCode = 2, detailMessage = failed to 
get kafka partition info: [failed to get partition meta: Local: Timed out]


broker 4 (controller, epoch 90) fs.file


Most of the CLOSE_WAIT connections come from the consumer applications.

At 16:49 the broker was restarted and returned to normal.
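
For anyone trying to reproduce the attribution of CLOSE_WAIT sockets to client hosts, here is a minimal sketch of how it could be done from `netstat -tan` output captured on the broker. It is not the method used for the attached screenshots. The function name `close_wait_by_peer`, the listener port 9092 and the IP addresses in the test data are assumptions for illustration, and the parser assumes the Linux `netstat` column layout (proto, recv-q, send-q, local address, foreign address, state).

```python
from collections import Counter


def close_wait_by_peer(netstat_output: str, broker_port: int = 9092) -> Counter:
    """Count CLOSE_WAIT sockets on the broker listener, grouped by remote host.

    Assumes Linux `netstat -tan` columns:
    proto, recv-q, send-q, local address, foreign address, state.
    broker_port=9092 is an assumed listener port; adjust to your setup.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "CLOSE_WAIT":
            continue
        # Only look at sockets accepted on the broker listener port.
        if not local.endswith(f":{broker_port}"):
            continue
        # Drop the ephemeral port so connections group by client host.
        peer_host = foreign.rsplit(":", 1)[0]
        counts[peer_host] += 1
    return counts
```

Feeding it the text of `netstat -tan` from broker 4 and calling `.most_common(10)` on the result would list the client hosts holding the most half-closed connections, which can then be matched against the consumer machines.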



# *speculation*

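The TCP connections were closed by the remote side (a passive close from the 
broker's point of view), but the controller broker process did not respond and 
never closed its own end, so the sockets stayed in CLOSE_WAIT.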

Are there any bugs in this version?




This message was sent by Atlassian Jira
