Ömer Şiar Baysal created KAFKA-16028:
----------------------------------------
Summary: AdminClient fails to describe consumer group
Key: KAFKA-16028
URL: https://issues.apache.org/jira/browse/KAFKA-16028
Project: Kafka
Issue Type: Bug
Components: admin, clients, consumer, log
Affects Versions: 3.6.1, 2.8.2
Reporter: Ömer Şiar Baysal
Dear Team,
We have been investigating some odd behavior in the AdminClient. Here is what
we found:
- Due to a bug (or a feature we are not aware of), AdminClient (both 2.8 and
3.6) fails to describe one of our consumer groups, a group with no other known
problems.
- The pure Go admin client (github.com/twmb/franz-go) does not have this
problem and is able to describe the consumer group.
While trying to find the cause, we first ran the following with the 2.8 Java
client:
kafka-consumer-groups --bootstrap-server broker:9092 --describe --group 'problematic-consumer'
Error: Executing consumer group command failed due to
org.apache.kafka.common.errors.LeaderNotAvailableException: There is no leader
for this topic-partition as we are in the middle of a leadership election.
java.util.concurrent.ExecutionException:
org.apache.kafka.common.errors.LeaderNotAvailableException: There is no leader
for this topic-partition as we are in the middle of a leadership election.
We waited to see whether this was a transient error, but it turned out not to
be; there was no leader election in progress for the given topic.
However, the error did not say which topic the admin client was referring to,
so we enabled TRACE logging, which revealed more information:
[2023-12-18 10:36:38,434] DEBUG [AdminClient clientId=adminclient-1] Sending
LIST_OFFSETS request with header RequestHeader(apiKey=LIST_OFFSETS,
apiVersion=6, clientId=adminclient-1, correlationId=30) and timeout 4997 to
node 40: ListOffsetsRequestData(replicaId=-1, isolationLevel=0,
topics=[ListOffsetsTopic(name='problematic-topic',
partitions=[ListOffsetsPartition(partitionIndex=4, currentLeaderEpoch=-1,
timestamp=-1, maxNumOffsets=1), ListOffsetsPartition(partitionIndex=5,
currentLeaderEpoch=-1, timestamp=-1, maxNumOffsets=1)])])
(org.apache.kafka.clients.NetworkClient)
[2023-12-18 10:36:38,434] TRACE [AdminClient clientId=adminclient-1] Entering
KafkaClient#poll(timeout=4997) (org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,435] TRACE [AdminClient clientId=adminclient-1]
KafkaClient#poll retrieved 0 response(s)
(org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,435] TRACE [AdminClient clientId=adminclient-1] Trying to
choose nodes for [] at 1702884998435
(org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,435] TRACE [AdminClient clientId=adminclient-1] Entering
KafkaClient#poll(timeout=4995) (org.apache.kafka.clients.admin.KafkaAdminClient)
Error: Executing consumer group command failed due to
org.apache.kafka.common.errors.LeaderNotAvailableException: There is no leader
for this topic-partition as we are in the middle of a leadership election.
[2023-12-18 10:36:38,436] DEBUG [AdminClient clientId=adminclient-1] Received
LIST_OFFSETS response from node 40 for request with header
RequestHeader(apiKey=LIST_OFFSETS, apiVersion=6, clientId=adminclient-1,
correlationId=30): ListOffsetsResponseData(throttleTimeMs=0,
topics=[ListOffsetsTopicResponse(name='problematic-topic',
partitions=[ListOffsetsPartitionResponse(partitionIndex=5, errorCode=0,
oldStyleOffsets=[], timestamp=-1, offset=822516, leaderEpoch=113,
followerRestorePointObjectId=AAAAAAAAAAAAAAAAAAAAAA,
followerRestorePointEpoch=0), ListOffsetsPartitionResponse(partitionIndex=4,
errorCode=0, oldStyleOffsets=[], timestamp=-1, offset=827297, leaderEpoch=93,
followerRestorePointObjectId=AAAAAAAAAAAAAAAAAAAAAA,
followerRestorePointEpoch=0)])]) (org.apache.kafka.clients.NetworkClient)
[2023-12-18 10:36:38,436] TRACE [AdminClient clientId=adminclient-1]
KafkaClient#poll retrieved 1 response(s)
(org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,437] TRACE [AdminClient clientId=adminclient-1]
Call(callName=listOffsets on broker 40, deadlineMs=1702885003430, tries=0,
nextAllowedTryMs=0) got response ListOffsetsResponseData(throttleTimeMs=0,
topics=[ListOffsetsTopicResponse(name='problematic-topic',
partitions=[ListOffsetsPartitionResponse(partitionIndex=5, errorCode=0,
oldStyleOffsets=[], timestamp=-1, offset=822516, leaderEpoch=113,
followerRestorePointObjectId=AAAAAAAAAAAAAAAAAAAAAA,
followerRestorePointEpoch=0), ListOffsetsPartitionResponse(partitionIndex=4,
errorCode=0, oldStyleOffsets=[], timestamp=-1, offset=827297, leaderEpoch=93,
followerRestorePointObjectId=AAAAAAAAAAAAAAAAAAAAAA,
followerRestorePointEpoch=0)])])
(org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,437] TRACE [AdminClient clientId=adminclient-1] Trying to
choose nodes for [] at 1702884998436
(org.apache.kafka.clients.admin.KafkaAdminClient)
[2023-12-18 10:36:38,437] TRACE [AdminClient clientId=adminclient-1] Entering
KafkaClient#poll(timeout=299161)
(org.apache.kafka.clients.admin.KafkaAdminClient)
java.util.concurrent.ExecutionException:
org.apache.kafka.common.errors.LeaderNotAvailableException: There is no leader
for this topic-partition as we are in the middle of a leadership election.
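In the trace, the broker answers the LIST_OFFSETS request for partitions 4 and 5 of 'problematic-topic' with errorCode=0 for both, so the LeaderNotAvailableException does not appear to come from that response. As a sanity check, the request and response lines can be cross-checked mechanically with the stdlib-only Java sketch below. The class and method names are hypothetical (they are not part of Kafka); the sketch assumes each log entry has been joined onto a single line and that the request covers a single topic, as in this trace, and it uses trimmed excerpts of the two lines (the followerRestorePoint fields are dropped) as sample input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: cross-checks a LIST_OFFSETS request/response pair taken
// from AdminClient TRACE logs. Assumes one topic per request, as in this trace.
final class ListOffsetsTraceCheck {
    // Trimmed excerpt of the DEBUG "Sending LIST_OFFSETS request" line.
    static final String REQUEST_EXCERPT =
        "topics=[ListOffsetsTopic(name='problematic-topic', "
        + "partitions=[ListOffsetsPartition(partitionIndex=4, currentLeaderEpoch=-1, "
        + "timestamp=-1, maxNumOffsets=1), ListOffsetsPartition(partitionIndex=5, "
        + "currentLeaderEpoch=-1, timestamp=-1, maxNumOffsets=1)])])";

    // Trimmed excerpt of the DEBUG "Received LIST_OFFSETS response" line.
    static final String RESPONSE_EXCERPT =
        "topics=[ListOffsetsTopicResponse(name='problematic-topic', "
        + "partitions=[ListOffsetsPartitionResponse(partitionIndex=5, errorCode=0, "
        + "oldStyleOffsets=[], timestamp=-1, offset=822516, leaderEpoch=113), "
        + "ListOffsetsPartitionResponse(partitionIndex=4, errorCode=0, "
        + "oldStyleOffsets=[], timestamp=-1, offset=827297, leaderEpoch=93)])])";

    // Request entries look like "ListOffsetsPartition(partitionIndex=N".
    private static final Pattern REQUESTED =
        Pattern.compile("ListOffsetsPartition\\(partitionIndex=(\\d+)");
    // Response entries look like "ListOffsetsPartitionResponse(partitionIndex=N, errorCode=E".
    private static final Pattern RESPONDED =
        Pattern.compile("ListOffsetsPartitionResponse\\(partitionIndex=(\\d+), errorCode=(-?\\d+)");

    // Returns the partition indexes named in the request, in ascending order.
    static Set<Integer> requestedPartitions(String requestLine) {
        Set<Integer> partitions = new TreeSet<>();
        Matcher m = REQUESTED.matcher(requestLine);
        while (m.find()) {
            partitions.add(Integer.parseInt(m.group(1)));
        }
        return partitions;
    }

    // Lists every partition answered with a non-zero errorCode, plus every
    // requested partition that is missing from the response.
    static List<String> findProblems(String requestLine, String responseLine) {
        Set<Integer> missing = requestedPartitions(requestLine);
        List<String> problems = new ArrayList<>();
        Matcher m = RESPONDED.matcher(responseLine);
        while (m.find()) {
            int partition = Integer.parseInt(m.group(1));
            int errorCode = Integer.parseInt(m.group(2));
            missing.remove(partition); // Set.remove(Object): removes by value
            if (errorCode != 0) {
                problems.add("partition " + partition + " errorCode=" + errorCode);
            }
        }
        for (int partition : missing) {
            problems.add("partition " + partition + " requested but absent from response");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("requested partitions: " + requestedPartitions(REQUEST_EXCERPT));
        List<String> problems = findProblems(REQUEST_EXCERPT, RESPONSE_EXCERPT);
        if (problems.isEmpty()) {
            System.out.println("no per-partition errors in LIST_OFFSETS response");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Run against the excerpts, it prints "requested partitions: [4, 5]" followed by "no per-partition errors in LIST_OFFSETS response". A request spanning several topics would need topic-aware parsing, since partition indexes alone are not unique across topics.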
The 3.6 AdminClient does not return this error; instead, it fails with a
timeout once its retries are exhausted.
We also looked into "problematic-topic": we reassigned its replicas to other
brokers and ran kafka-leader-election over all partitions, but neither helped.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)