[
https://issues.apache.org/jira/browse/KAFKA-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joseph Aliase updated KAFKA-7457:
---------------------------------
Summary: AbstractCoordinator struck in Discover (was: AbstractCoordinator
struck in discover)
> AbstractCoordinator struck in Discover
> --------------------------------------
>
> Key: KAFKA-7457
> URL: https://issues.apache.org/jira/browse/KAFKA-7457
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Affects Versions: 0.10.1.1
> Environment: Linux
> Reporter: Joseph Aliase
> Priority: Minor
>
> AbstractCoordinator in kafka-client is stuck in discover and never rejoins
> the group. Post restart application is able to join the consumer group and
> consume from the topic.
> We see below logs every 10 minute. The sequence of events are:
> a) NetworkClient complains that connection is idle and closes the connection.
> b) Consumer client tries to determine co-ordinator by sending request to Node
> 2.
> c) Node 2 responds by saying Node 3 is group co-ordinator.
> d) Consumer client connects to group co-ordinator.
> e) There is radio silence for 10 minutes and the sequence gets repeated.
>
> 2018-09-28 16:35:59.771 TRACE org.apache.kafka.common.network.Selector
> [pool-4-thread-50] [active] [wc] About to close the idle connection from
> 2147483644 due to being idle for 540140 millis
> 2018-09-28 16:35:59.771 DEBUG org.apache.kafka.clients.NetworkClient
> [pool-4-thread-50] [active] [wc] Node 2147483644 disconnected.
> 2018-09-28 16:35:59.771 INFO
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator
> [pool-4-thread-50] [active] [wc] Marking the coordinator kafka03-wc.net:9092
> (id: 2147483644 rack: null) dead for group test
> 2018-09-28 16:35:59.771 DEBUG
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator
> [pool-4-thread-50] [active] [wc] Sending coordinator request for group test
> to broker kafka02-wc.net:9092 (id: 2 rack: null)
> 2018-09-28 16:35:59.771 DEBUG org.apache.kafka.clients.NetworkClient
> [pool-4-thread-50] [active] [wc] Sending metadata request \{topics=[address]}
> to node 2
> 2018-09-28 16:35:59.796 DEBUG org.apache.kafka.clients.Metadata
> [pool-4-thread-50] [active] [wc] Updated cluster metadata version 401 to
> Cluster(id = oC0BPXfOT42WqN7-v6b5Gw, nodes = [kafka02-wc.net:9092 (id: 2
> rack: null), kafka03-wc.net:9092 (id: 3 rack: null), kafka05-wc.net:9092 (id:
> 5 rack: null), kafka01-wc.net:9092 (id: 1 rack: null), kafka04-wc.net:9092
> (id: 4 rack: null)], partitions = [Partition(topic = address, partition = 2,
> leader = 1, replicas = [1,5,4,], isr = [5,1,4,]), Partition(topic = address,
> partition = 1, leader = 5, replicas = [5,3,4,], isr = [5,3,4,]),
> Partition(topic = address, partition = 0, leader = 4, replicas = [2,3,4,],
> isr = [4,3,2,]), Partition(topic = address, partition = 6, leader = 5,
> replicas = [1,5,4,], isr = [5,1,4,]), Partition(topic = address, partition =
> 5, leader = 4, replicas = [5,3,4,], isr = [5,4,3,]), Partition(topic =
> address, partition = 4, leader = 3, replicas = [1,2,3,], isr = [1,3,2,]),
> Partition(topic = address, partition = 3, leader = 2, replicas = [1,5,2,],
> isr = [5,1,2,]), Partition(topic = address, partition = 16, leader = 5,
> replicas = [5,2,3,], isr = [5,3,2,]), Partition(topic = address, partition =
> 15, leader = 4, replicas = [1,2,4,], isr = [1,4,2,]), Partition(topic =
> address, partition = 10, leader = 4, replicas = [1,5,4,], isr = [5,1,4,]),
> Partition(topic = address, partition = 9, leader = 3, replicas = [2,3,4,],
> isr = [3,4,2,]), Partition(topic = address, partition = 8, leader = 2,
> replicas = [1,2,3,], isr = [1,3,2,]), Partition(topic = address, partition =
> 7, leader = 1, replicas = [1,5,2,], isr = [5,1,2,]), Partition(topic =
> address, partition = 14, leader = 3, replicas = [5,3,4,], isr = [5,4,3,]),
> Partition(topic = address, partition = 13, leader = 2, replicas = [2,3,4,],
> isr = [3,4,2,]), Partition(topic = address, partition = 12, leader = 1,
> replicas = [1,2,3,], isr = [1,3,2,]), Partition(topic = address, partition =
> 11, leader = 5, replicas = [1,5,2,], isr = [5,1,2,])])
> 2018-09-28 16:35:59.797 DEBUG
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator
> [pool-4-thread-50] [active] [wc] Received group coordinator response
> ClientResponse(receivedTimeMs=1538152559797, disconnected=false,
> request=ClientRequest(expectResponse=true,
> callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@2531ff5f,
>
> request=RequestSend(header=\{api_key=10,api_version=0,correlation_id=8354,client_id=active-wc-1-256},
> body=\{group_id=test}), createdTimeMs=1538152559771,
> sendTimeMs=1538152559771),
> responseBody=\{error_code=0,coordinator={node_id=3,host=kafka03-wc.net,port=9092}})
> 2018-09-28 16:35:59.797 INFO
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator
> [pool-4-thread-50] [active] [wc] Discovered coordinator kafka03-wc.net:9092
> (id: 2147483644 rack: null) for group test.
> 2018-09-28 16:35:59.797 INFO
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator
> [pool-4-thread-50] [active] [wc] Marking the coordinator kafka03-wc.net:9092
> (id: 2147483644 rack: null) dead for group test
> 2018-09-28 16:36:09.797 DEBUG
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator
> [pool-4-thread-50] [active] [wc] Sending coordinator request for group test
> to broker kafka02-wc.net:9092 (id: 2 rack: null)
> 2018-09-28 16:36:09.822 DEBUG
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator
> [pool-4-thread-50] [active] [wc] Received group coordinator response
> ClientResponse(receivedTimeMs=1538152569822, disconnected=false,
> request=ClientRequest(expectResponse=true,
> callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@773b5447,
>
> request=RequestSend(header=\{api_key=10,api_version=0,correlation_id=8356,client_id=active-wc-1-256},
> body=\{group_id=test}), createdTimeMs=1538152569797,
> sendTimeMs=1538152569797),
> responseBody=\{error_code=0,coordinator={node_id=3,host=kafka03-wc.net,port=9092}})
> 2018-09-28 16:36:09.822 INFO
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator
> [pool-4-thread-50] [active] [wc] Discovered coordinator kafka03-wc.net:9092
> (id: 2147483644 rack: null) for group test.
> 2018-09-28 16:36:09.822 DEBUG org.apache.kafka.clients.NetworkClient
> [pool-4-thread-50] [active] [wc] Initiating connection to node 2147483644 at
> kafka03-wc.net:9092.
> 2018-09-28 16:36:09.846 DEBUG org.apache.kafka.common.network.Selector
> [pool-4-thread-50] [active] [wc] Created socket with SO_RCVBUF = 65536,
> SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node 2147483644
> 2018-09-28 16:36:09.846 DEBUG org.apache.kafka.clients.NetworkClient
> [pool-4-thread-50] [active] [wc] Completed connection to node 2147483644
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)