With the ip -> broker id mapping: we've got three servers with three IPs and broker IDs 1, 2, 3. The servers kept their IPs but the broker IDs switched around, so something like:
192.168.0.1 -> 1
192.168.0.2 -> 2
192.168.0.3 -> 3

Then they all stopped and came back as, for example:

192.168.0.1 -> 2
192.168.0.2 -> 3
192.168.0.3 -> 1

This happened because we are using coreos/fleetd to schedule the Kafka processes across the cluster: each unit has an ID which it keeps, but the unit is not tied to a specific instance. We're working on making the ID a property of the server rather than the unit, so this wouldn't happen for us any more.

I've not tried the case where the ip -> id mapping doesn't change; given the above setup, that's hard for us to test.

Thanks,
Dan

On 8 May 2015 at 18:13, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:

> Also it would be great to know if you see the same issue when you don't
> have a different ip -> broker id mapping. It would also be great if you
> could explain what "different ip -> broker id mapping" means, as Becket
> said.
>
> Thanks,
>
> Mayuresh
>
> On Fri, May 8, 2015 at 9:48 AM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
> > It should do an UpdateMetadataRequest in case it gets
> > NOT_LEADER_FOR_PARTITION. This looks like a bug.
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Fri, May 8, 2015 at 8:53 AM, Dan <danharve...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> We've noticed an issue on our staging environment where all 3 of our
> >> Kafka hosts shut down and came back with a different ip -> broker id
> >> mapping. I know this is not good and we're fixing that separately. But
> >> what we noticed is all the consumers recovered but the producers got
> >> stuck with the following exceptions:
> >>
> >> WARN 2015-05-08 09:19:56,347
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544968 on topic-partition
> >> samza-metrics-0, retrying (2145750068 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,448
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544970 on topic-partition
> >> samza-metrics-0, retrying (2145750067 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,549
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544972 on topic-partition
> >> samza-metrics-0, retrying (2145750066 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,649
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544974 on topic-partition
> >> samza-metrics-0, retrying (2145750065 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,749
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544976 on topic-partition
> >> samza-metrics-0, retrying (2145750064 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,850
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544978 on topic-partition
> >> samza-metrics-0, retrying (2145750063 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,949
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544980 on topic-partition
> >> samza-metrics-0, retrying (2145750062 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:57,049
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544982 on topic-partition
> >> samza-metrics-0, retrying (2145750061 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:57,150
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544984 on topic-partition
> >> samza-metrics-0, retrying (2145750060 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:57,254
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544986 on topic-partition
> >> samza-metrics-0, retrying (2145750059 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:57,351
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544988 on topic-partition
> >> samza-metrics-0, retrying (2145750058 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:57,454
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544990 on topic-partition
> >> samza-metrics-0, retrying (2145750057 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >>
> >>
> >> So it appears as if the producer did not refresh the metadata once the
> >> brokers had come back up. The exceptions carried on for a few hours
> >> until we restarted them.
> >>
> >> We noticed this both in 0.8.2.1 Java clients and via Kafka-rest
> >> (https://github.com/confluentinc/kafka-rest), which is using 0.8.2.0-cp.
> >>
> >> Is this a known issue when all brokers go away, or is it a subtle bug
> >> we've hit?
> >>
> >> Thanks,
> >> Dan
> >>
> >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>
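[Editor's note: a sketch of the producer settings the log excerpt implies. The counter "2145750068 attempts left" suggests `retries` was set near Integer.MAX_VALUE (the 0.8.2 new-producer default is 0), and the ~100 ms gap between log timestamps matches the default `retry.backoff.ms`. The broker addresses below are placeholders taken from the example mapping; the config keys are the standard 0.8.2 new-producer ones.]

```java
import java.util.Properties;

public class ProducerRetrySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder bootstrap list using the three example IPs from the thread.
        props.put("bootstrap.servers",
                "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092");
        // ~2.1 billion attempts left in the logs implies retries near MAX_VALUE.
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        // Log timestamps ~100 ms apart match the default backoff between retries.
        props.put("retry.backoff.ms", "100");
        // metadata.max.age.ms forces a full metadata refresh at this interval
        // even if the producer sees no errors; the default is 300000 (5 min).
        props.put("metadata.max.age.ms", "300000");
        System.out.println(props.getProperty("retries"));
    }
}
```

Lowering `metadata.max.age.ms` would only bound, not remove, the window in which stale ip -> broker id metadata is retried, which is consistent with reading the producer's failure to refresh on NOT_LEADER_FOR_PARTITION as a bug.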
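[Editor's note: a back-of-the-envelope calculation from the log excerpt, showing why the producers looked permanently stuck rather than eventually failing. The 100 ms interval is read off the log timestamps; the MAX_VALUE starting point is an assumption from the size of the counter.]

```java
public class RetryBudget {
    public static void main(String[] args) {
        long attemptsLeft = 2_145_750_068L; // from the first WARN line
        long backoffMs = 100;               // log timestamps are ~100 ms apart
        // If retries started at Integer.MAX_VALUE, this many had already burned:
        long attemptsUsed = Integer.MAX_VALUE - attemptsLeft;
        // Time to exhaust the remaining budget at one retry per backoff interval:
        double yearsLeft = attemptsLeft * backoffMs / 1000.0 / 86_400 / 365;
        System.out.println("attempts already used: " + attemptsUsed);
        System.out.printf("time to exhaust remaining retries: ~%.1f years%n", yearsLeft);
    }
}
```

At that rate the producer would retry with stale metadata for roughly 6.8 years, so from the outside it is indistinguishable from a hang, matching the "carried on for a few hours until we restarted them" observation.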