1) any suggestion on how to identify the bad broker(s)? ---> At LinkedIn we have alerts, set up using our internal scripts, that detect when a broker has gone bad. We also check the under-replicated partitions, which can tell us which broker has gone bad. "Going bad" can mean different things: the broker may be alive but unresponsive and completely isolated, or it may have gone down entirely. Can you tell us what you meant by your broker going bad?
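As a rough sketch of the under-replicated-partitions check described above: the idea is to compare each partition's ISR against its full replica set and flag partitions where they differ. The sample output format and parsing below are illustrative assumptions, not the exact output of any Kafka tool (the real `kafka-topics.sh` script also has an `--under-replicated-partitions` flag that does this filtering for you):

```python
# Hypothetical sketch: flag partitions whose ISR is smaller than the replica
# set, given describe-style output. The line format here is an assumption.
sample = """\
Topic: clicks  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
Topic: clicks  Partition: 1  Leader: 2  Replicas: 2,3,1  Isr: 2,3
"""

def under_replicated(describe_output):
    """Return (topic, partition) pairs with fewer in-sync replicas than replicas."""
    bad = []
    for line in describe_output.splitlines():
        if "Partition:" not in line:
            continue
        # Tokens alternate like "Topic:", "clicks", "Partition:", "0", ...
        # so pair each "Key:" token with the value token that follows it.
        tokens = line.split()
        fields = {tokens[i].rstrip(":"): tokens[i + 1]
                  for i in range(len(tokens) - 1)
                  if tokens[i].endswith(":")}
        if len(fields["Isr"].split(",")) < len(fields["Replicas"].split(",")):
            bad.append((fields["Topic"], int(fields["Partition"])))
    return bad

print(under_replicated(sample))  # -> [('clicks', 1)]
```

A broker that appears in many Replicas lists but is missing from the corresponding Isr lists is a good candidate for the bad broker.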
2) why bouncing of the bad broker got the producers recovered automatically ----> This is because when you bounced the broker, leadership for its partitions changed; the producer then sent a TopicMetadataRequest, which tells it who the new leaders for the partitions are, and it started sending messages to those brokers. KAFKA-2120 will handle all of this for you automatically.

Thanks,

Mayuresh

On Tue, Sep 8, 2015 at 8:26 PM, Steven Wu <[email protected]> wrote:
> We have observed that some producer instances stopped sending traffic to
> brokers because the memory buffer was full. Those producers got stuck in
> this state permanently. Because we couldn't find out which broker was bad,
> I did a rolling restart of all brokers. After the bad broker got bounced,
> those stuck producers were out of the woods automatically.
>
> I don't know the exact problem with that bad broker. It seems to me that
> some ZK state was inconsistent.
>
> I know the timeout fix from KAFKA-2120 can probably avoid the permanent
> stuck state. Here are some additional questions:
> 1) any suggestion on how to identify the bad broker(s)?
> 2) why bouncing of the bad broker got the producers recovered automatically
> (without restarting producers)
>
> producer: 0.8.2.1
> broker: 0.8.2.1
>
> Thanks,
> Steven

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
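The recovery mechanism in answer 2 can be sketched in miniature: a failed send triggers a metadata refresh, the refresh returns the leaders elected during the bounce, and the producer resumes sending to them. This is an illustrative simulation, not Kafka client code; all class and method names are hypothetical:

```python
# Hypothetical sketch of producer recovery via metadata refresh.
class Cluster:
    """Toy stand-in for the broker cluster and its metadata."""
    def __init__(self):
        self.brokers = {1: True, 2: True}   # broker id -> alive?
        self.leader_for = {0: 1, 1: 2}      # partition -> leader broker id

    def alive(self, broker):
        return self.brokers[broker]

    def current_leaders(self):
        # Stands in for answering a TopicMetadataRequest.
        return dict(self.leader_for)

    def deliver(self, broker, partition):
        # A send succeeds only if the target is alive and actually the leader.
        return self.brokers[broker] and self.leader_for[partition] == broker

class Producer:
    def __init__(self, leaders):
        self.leaders = dict(leaders)        # cached metadata: partition -> leader

    def send(self, partition, cluster):
        broker = self.leaders[partition]
        if not cluster.alive(broker):
            # Send failed: refresh metadata and retry with the new leader.
            self.leaders = cluster.current_leaders()
            broker = self.leaders[partition]
        return cluster.deliver(broker, partition)

p = Producer({0: 1, 1: 2})
c = Cluster()
c.brokers[1] = False      # broker 1 "goes bad"
c.leader_for[0] = 2       # bounce: leadership for partition 0 moves to broker 2
print(p.send(0, c))       # -> True: the refresh found the new leader
```

The real 0.8.2.1 producer could stay stuck because, with its buffer full and no request timeout, the failure that would trigger the refresh never surfaced until the bounce forced a leader change; KAFKA-2120's request timeout closes that gap.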
