With the ip -> broker id mapping: we've got three servers with three IPs and broker IDs 1, 2, 3. The servers kept their IPs but the broker IDs switched around, so something like:
192.168.0.1 -> 1
192.168.0.2 -> 2
192.168.0.3 -> 3

Then they all stopped and came back as, for example:

192.168.0.1 -> 2
192.168.0.2 -> 3
192.168.0.3 -> 1

This happened because we are using coreos/fleetd to schedule the Kafka processes across the cluster: each unit has an ID which it keeps, but the unit is not tied to a specific instance. We're working on making the ID a property of the server rather than the unit, so this wouldn't happen for us any more.

I've not tried the case where the ip -> id mapping doesn't change; given the above setup, that's hard for us to test.

Thanks,
Dan

On 8 May 2015 at 18:13, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:

> Also it would be great to know if you see the same issue when you don't
> have a different ip -> broker id mapping. It would also be great if you
> could explain what "different ip -> broker id mapping" means, as Becket
> said.
>
> Thanks,
>
> Mayuresh
>
> On Fri, May 8, 2015 at 9:48 AM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
> > It should do an UpdateMetadataRequest in case it gets
> > NOT_LEADER_FOR_PARTITION. This looks like a bug.
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Fri, May 8, 2015 at 8:53 AM, Dan <danharve...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> We've noticed an issue on our staging environment where all 3 of our
> >> Kafka hosts shut down and came back with a different ip -> broker id
> >> mapping. I know this is not good and we're fixing that separately. But
> >> what we noticed is all the consumers recovered but the producers got
> >> stuck with the following exceptions:
> >>
> >> WARN 2015-05-08 09:19:56,347
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544968 on topic-partition
> >> samza-metrics-0, retrying (2145750068 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,448
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544970 on topic-partition
> >> samza-metrics-0, retrying (2145750067 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,549
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544972 on topic-partition
> >> samza-metrics-0, retrying (2145750066 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,649
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544974 on topic-partition
> >> samza-metrics-0, retrying (2145750065 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,749
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544976 on topic-partition
> >> samza-metrics-0, retrying (2145750064 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,850
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544978 on topic-partition
> >> samza-metrics-0, retrying (2145750063 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:56,949
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544980 on topic-partition
> >> samza-metrics-0, retrying (2145750062 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:57,049
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544982 on topic-partition
> >> samza-metrics-0, retrying (2145750061 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:57,150
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544984 on topic-partition
> >> samza-metrics-0, retrying (2145750060 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:57,254
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544986 on topic-partition
> >> samza-metrics-0, retrying (2145750059 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:57,351
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544988 on topic-partition
> >> samza-metrics-0, retrying (2145750058 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >> WARN 2015-05-08 09:19:57,454
> >> org.apache.kafka.clients.producer.internals.Sender: Got error produce
> >> response with correlation id 3544990 on topic-partition
> >> samza-metrics-0, retrying (2145750057 attempts left). Error:
> >> NOT_LEADER_FOR_PARTITION
> >>
> >>
> >> So it appears as if the producer did not refresh the metadata once the
> >> brokers had come back up. The exceptions carried on for a few hours
> >> until we restarted them.
> >>
> >> We noticed this both in 0.8.2.1 Java clients and via Kafka-rest
> >> (https://github.com/confluentinc/kafka-rest), which is using 0.8.2.0-cp.
> >>
> >> Is this a known issue when all brokers go away, or is it a subtle bug
> >> we've hit?
> >>
> >> Thanks,
> >> Dan
> >>
> >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>
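[Editor's note: a sketch of the producer settings the log excerpt implies. The counter "2145750068 attempts left" suggests `retries` was set near Integer.MAX_VALUE (the 0.8.2 new-producer default is 0), and the ~100 ms gap between log timestamps matches the default `retry.backoff.ms`. The broker addresses below are placeholders taken from the example mapping; the config keys are the standard 0.8.2 new-producer ones.]

```java
import java.util.Properties;

public class ProducerRetrySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder bootstrap list using the three example IPs from the thread.
        props.put("bootstrap.servers",
                "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092");
        // ~2.1 billion attempts left in the logs implies retries near MAX_VALUE.
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        // Log timestamps ~100 ms apart match the default backoff between retries.
        props.put("retry.backoff.ms", "100");
        // metadata.max.age.ms forces a full metadata refresh at this interval
        // even if the producer sees no errors; the default is 300000 (5 min).
        props.put("metadata.max.age.ms", "300000");
        System.out.println(props.getProperty("retries"));
    }
}
```

Lowering `metadata.max.age.ms` would only bound, not remove, the window in which stale ip -> broker id metadata is retried, which is consistent with reading the producer's failure to refresh on NOT_LEADER_FOR_PARTITION as a bug.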
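[Editor's note: a back-of-the-envelope calculation from the log excerpt, showing why the producers looked permanently stuck rather than eventually failing. The 100 ms interval is read off the log timestamps; the MAX_VALUE starting point is an assumption from the size of the counter.]

```java
public class RetryBudget {
    public static void main(String[] args) {
        long attemptsLeft = 2_145_750_068L; // from the first WARN line
        long backoffMs = 100;               // log timestamps are ~100 ms apart
        // If retries started at Integer.MAX_VALUE, this many had already burned:
        long attemptsUsed = Integer.MAX_VALUE - attemptsLeft;
        // Time to exhaust the remaining budget at one retry per backoff interval:
        double yearsLeft = attemptsLeft * backoffMs / 1000.0 / 86_400 / 365;
        System.out.println("attempts already used: " + attemptsUsed);
        System.out.printf("time to exhaust remaining retries: ~%.1f years%n", yearsLeft);
    }
}
```

At that rate the producer would retry with stale metadata for roughly 6.8 years, so from the outside it is indistinguishable from a hang, matching the "carried on for a few hours until we restarted them" observation.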