Mike Mintz created KAFKA-8477:
---------------------------------

             Summary: Cannot consume and request metadata for 
__consumer_offsets topic in Kafka 2.2
                 Key: KAFKA-8477
                 URL: https://issues.apache.org/jira/browse/KAFKA-8477
             Project: Kafka
          Issue Type: Bug
          Components: clients
    Affects Versions: 2.2.0
            Reporter: Mike Mintz
         Attachments: kafka-2.2.0-consumer-offset-metadata-bug-master.zip, 
logs.txt

We have an application that consumes from the __consumer_offsets topic to 
report lag metrics. When we upgraded its kafka-clients dependency from 2.0.1 to 
2.2.0, it crashed with:
{noformat}
Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: 
Failed to get offsets by times in 30001ms
{noformat}
I created a minimal reproduction at 
[https://github.com/mikemintz/kafka-2.2.0-consumer-offset-metadata-bug] and I'm 
uploading a zip of this code for posterity.

In particular, the behavior happens when I call KafkaConsumer.assign(), then 
poll(), then endOffsets(). This behavior only happens for the 
__consumer_offsets topic. It also only happens on the Kafka cluster that we run 
in production, which runs Kafka 2.2.0. The error does not occur on a freshly 
created Kafka cluster, and I can't get it to reproduce with EmbeddedKafka.

It works fine with both Kafka 2.0.1 and Kafka 2.1.1, so something broke between 
2.1.1. and 2.2.0. Based on the 2.2.0 changelog and the client log messages 
(attached), it looks like it may have been introduced in KAFKA-7738 (cc 
[~mumrah]). It gets in a loop, repeating the following block of log messages:
{noformat}
2019-06-03 23:24:15 DEBUG NetworkClient:1073 - [Consumer 
clientId=test.mikemintz.lag-tracker-reproduce, 
groupId=test.mikemintz.lag-tracker-reproduce] Sending metadata request 
(type=MetadataRequest, topics=__consumer_offsets) to node REDACTED:9094 (id: 
2134 rack: us-west-2b)
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5862 to 
5862 for partition __consumer_offsets-0
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6040 to 
6040 for partition __consumer_offsets-10
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6008 to 
6008 for partition __consumer_offsets-20
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 6153 with 
new epoch 6152
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5652 to 
5652 for partition __consumer_offsets-30
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6081 to 
6081 for partition __consumer_offsets-39
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5629 to 
5629 for partition __consumer_offsets-9
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5983 to 
5983 for partition __consumer_offsets-11
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5896 to 
5896 for partition __consumer_offsets-31
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5278 to 
5278 for partition __consumer_offsets-13
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6026 to 
6026 for partition __consumer_offsets-18
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5608 to 
5608 for partition __consumer_offsets-22
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6025 to 
6025 for partition __consumer_offsets-32
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5685 to 
5685 for partition __consumer_offsets-8
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5730 to 
5730 for partition __consumer_offsets-43
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5957 with 
new epoch 5956
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 6047 with 
new epoch 6046
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6090 to 
6090 for partition __consumer_offsets-1
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5821 to 
5821 for partition __consumer_offsets-6
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5909 to 
5909 for partition __consumer_offsets-41
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5805 to 
5805 for partition __consumer_offsets-27
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5670 to 
5670 for partition __consumer_offsets-48
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6220 to 
6220 for partition __consumer_offsets-5
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5596 to 
5596 for partition __consumer_offsets-15
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5896 to 
5896 for partition __consumer_offsets-35
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5858 to 
5858 for partition __consumer_offsets-25
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5927 with 
new epoch 5926
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5938 with 
new epoch 5937
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5792 to 
5792 for partition __consumer_offsets-36
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5984 to 
5984 for partition __consumer_offsets-44
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5932 with 
new epoch 5931
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5979 to 
5979 for partition __consumer_offsets-16
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5804 to 
5804 for partition __consumer_offsets-17
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6179 to 
6179 for partition __consumer_offsets-3
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5953 to 
5953 for partition __consumer_offsets-45
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6276 to 
6276 for partition __consumer_offsets-24
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6119 to 
6119 for partition __consumer_offsets-38
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5784 to 
5784 for partition __consumer_offsets-33
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5665 to 
5665 for partition __consumer_offsets-23
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5812 to 
5812 for partition __consumer_offsets-28
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6014 to 
6014 for partition __consumer_offsets-2
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 6199 to 
6199 for partition __consumer_offsets-12
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5908 to 
5908 for partition __consumer_offsets-19
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5697 to 
5697 for partition __consumer_offsets-14
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5673 to 
5673 for partition __consumer_offsets-4
2019-06-03 23:24:15 DEBUG Metadata:208 - Not replacing existing epoch 5921 with 
new epoch 5920
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5940 to 
5940 for partition __consumer_offsets-49
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5836 to 
5836 for partition __consumer_offsets-42
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5827 to 
5827 for partition __consumer_offsets-7
2019-06-03 23:24:15 DEBUG Metadata:201 - Updating last seen epoch from 5670 to 
5670 for partition __consumer_offsets-21
2019-06-03 23:24:15 DEBUG Metadata:370 - Updated cluster metadata version 293 
to MetadataCache{cluster=Cluster(id = 5Cs2i3WbTzKHbrF2GzLwaQ, nodes = 
[REDACTED:9094 (id: 2132 rack: us-west-2a), REDACTED:9094 (id: 2133 rack: 
us-west-2a), REDACTED:9094 (id: 2131 rack: us-west-2c), REDACTED:9094 (id: 2136 
rack: us-west-2b), REDACTED:9094 (id: 2135 rack: us-west-2c), REDACTED:9094 
(id: 2134 rack: us-west-2b), REDACTED:9094 (id: 2137 rack: us-west-2c)], 
partitions = [Partition(topic = __consumer_offsets, partition = 13, leader = 
2136, replicas = [2136,2131,2132], isr = [2132,2136,2131], offlineReplicas = 
[]), Partition(topic = __consumer_offsets, partition = 46, leader = 2134, 
replicas = [2134,2133,2135], isr = [2135,2134,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 9, leader = 2132, replicas = 
[2132,2134,2131], isr = [2132,2134,2131], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 42, leader = 2133, replicas = 
[2133,2136,2135], isr = [2135,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 21, leader = 2136, replicas = 
[2136,2131,2133], isr = [2131,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 17, leader = 2133, replicas = 
[2133,2136,2131], isr = [2131,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 30, leader = 2136, replicas = 
[2136,2133,2131], isr = [2136,2133,2131], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 26, leader = 2132, replicas = 
[2132,2134,2135], isr = [2135,2132,2134], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 5, leader = 2136, replicas = 
[2136,2131,2132], isr = [2132,2131,2136], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 38, leader = 2134, replicas = 
[2134,2133,2135], isr = [2135,2134,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 1, leader = 2132, replicas = 
[2132,2134,2131], isr = [2132,2134,2131], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 34, leader = 2132, replicas = 
[2132,2136,2135], isr = [2135,2132,2136], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 16, leader = 2134, replicas = 
[2134,2132,2135], isr = [2132,2134,2135], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 45, leader = 2133, replicas = 
[2133,2136,2135], isr = [2135,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 12, leader = 2133, replicas = 
[2133,2134,2131], isr = [2134,2131,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 41, leader = 2136, replicas = 
[2136,2133,2135], isr = [2135,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 24, leader = 2136, replicas = 
[2136,2133,2131], isr = [2131,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 20, leader = 2132, replicas = 
[2132,2136,2135], isr = [2132,2136,2135], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 49, leader = 2136, replicas = 
[2136,2132,2135], isr = [2135,2132,2136], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 0, leader = 2134, replicas = 
[2134,2132,2135], isr = [2132,2134,2135], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 29, leader = 2134, replicas = 
[2134,2132,2135], isr = [2135,2132,2134], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 25, leader = 2134, replicas = 
[2134,2135,2133], isr = [2134,2133,2135], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 8, leader = 2133, replicas = 
[2133,2136,2131], isr = [2131,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 37, leader = 2133, replicas = 
[2133,2136,2135], isr = [2135,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 4, leader = 2133, replicas = 
[2133,2134,2131], isr = [2134,2131,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 33, leader = 2136, replicas = 
[2136,2133,2135], isr = [2135,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 15, leader = 2132, replicas = 
[2132,2136,2135], isr = [2132,2136,2135], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 48, leader = 2132, replicas = 
[2132,2134,2135], isr = [2135,2132,2134], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 11, leader = 2132, replicas = 
[2132,2135,2136], isr = [2132,2136,2135], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 44, leader = 2133, replicas = 
[2133,2134,2135], isr = [2135,2134,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 23, leader = 2132, replicas = 
[2132,2136,2135], isr = [2132,2136,2135], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 19, leader = 2136, replicas = 
[2136,2133,2131], isr = [2131,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 32, leader = 2136, replicas = 
[2136,2131,2133], isr = [2131,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 28, leader = 2136, replicas = 
[2136,2133,2135], isr = [2135,2136,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 7, leader = 2136, replicas = 
[2136,2132,2131], isr = [2132,2136,2131], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 40, leader = 2134, replicas = 
[2134,2132,2135], isr = [2135,2132,2134], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 3, leader = 2136, replicas = 
[2136,2133,2135], isr = [2136,2133,2135], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 36, leader = 2134, replicas = 
[2134,2133,2135], isr = [2135,2134,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 47, leader = 2132, replicas = 
[2132,2134,2135], isr = [2135,2132,2134], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 14, leader = 2134, replicas = 
[2134,2135,2132], isr = [2132,2134,2135], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 43, leader = 2134, replicas = 
[2134,2132,2135], isr = [2135,2132,2134], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 10, leader = 2132, replicas = 
[2132,2136,2135], isr = [2132,2136,2135], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 22, leader = 2134, replicas = 
[2134,2135,2133], isr = [2134,2133,2135], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 18, leader = 2136, replicas = 
[2136,2131,2132], isr = [2132,2131,2136], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 31, leader = 2133, replicas = 
[2133,2134,2135], isr = [2135,2134,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 27, leader = 2134, replicas = 
[2134,2133,2131], isr = [2134,2131,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 39, leader = 2134, replicas = 
[2134,2132,2135], isr = [2135,2132,2134], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 6, leader = 2133, replicas = 
[2133,2134,2131], isr = [2134,2131,2133], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 35, leader = 2134, replicas = 
[2134,2132,2135], isr = [2135,2132,2134], offlineReplicas = []), 
Partition(topic = __consumer_offsets, partition = 2, leader = 2134, replicas = 
[2134,2135,2132], isr = [2132,2134,2135], offlineReplicas = [])], controller = 
REDACTED:9094 (id: 2132 rack: us-west-2a))}
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
clientId=test.mikemintz.lag-tracker-reproduce, 
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
__consumer_offsets-46 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
clientId=test.mikemintz.lag-tracker-reproduce, 
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
__consumer_offsets-26 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
clientId=test.mikemintz.lag-tracker-reproduce, 
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
__consumer_offsets-34 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
clientId=test.mikemintz.lag-tracker-reproduce, 
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
__consumer_offsets-29 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
clientId=test.mikemintz.lag-tracker-reproduce, 
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
__consumer_offsets-37 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
clientId=test.mikemintz.lag-tracker-reproduce, 
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
__consumer_offsets-40 is unknown for fetching offset
2019-06-03 23:24:15 DEBUG Fetcher:700 - [Consumer 
clientId=test.mikemintz.lag-tracker-reproduce, 
groupId=test.mikemintz.lag-tracker-reproduce] Leader for partition 
__consumer_offsets-47 is unknown for fetching offset
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to