Re: some producers stuck when one broker is bad

2015-09-11 Thread Steven Wu
I was doing a rolling bounce of all brokers. Immediately after the bad broker was bounced, those stuck producers recovered.
On Fri, Sep 11, 2015 at 9:05 AM, Mayuresh Gharat wrote:
> So how did you detect that the broker is bad? If bouncing brokers solved
> the problem and you did not find any unu

Re: some producers stuck when one broker is bad

2015-09-11 Thread Mayuresh Gharat
So how did you detect that the broker is bad? If bouncing brokers solved the problem and you did not find anything unusual in the broker logs, it is likely that the process was up but was isolated from producer requests, and since the producer did not have a timeout, the producer buffer filled up
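The reasoning above (no timeout, so a full buffer blocks the sender indefinitely) can be illustrated with a toy model. This is only a sketch using Python's standard `queue.Queue` as a stand-in for the producer's memory buffer; it is not the Kafka client, and the names `send`, `fill_until_stuck`, and `max_block_s` are hypothetical. Nothing drains the buffer, which models a broker that never acknowledges requests.

```python
import queue


def send(buffer, record, max_block_s=None):
    """Try to append a record to a bounded buffer.

    max_block_s=None models a sender with no timeout: once the buffer is
    full, the call blocks until something drains it (possibly forever).
    A number models a bounded wait. Returns True if the record was
    buffered, False if the wait timed out.
    """
    try:
        buffer.put(record, block=True, timeout=max_block_s)
        return True
    except queue.Full:
        return False


def fill_until_stuck(capacity, attempts, max_block_s):
    """Send `attempts` records into a buffer that is never drained."""
    buffer = queue.Queue(maxsize=capacity)
    return [send(buffer, i, max_block_s) for i in range(attempts)]
```

With a bounded wait, the sends that arrive after the buffer is full fail fast and the caller can react (drop, retry, alert). With `max_block_s=None`, the first send after the buffer fills would never return in this model, which matches the "stuck permanently" symptom described in the thread.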

Re: some producers stuck when one broker is bad

2015-09-10 Thread Steven Wu
Frankly, I don't know exactly what went bad for that broker. The process is still up.
On Wed, Sep 9, 2015 at 10:10 AM, Mayuresh Gharat wrote:
> 1) any suggestion on how to identify the bad broker(s)?
> ---> At LinkedIn we have alerts that are set up using our internal scripts
> for detecting if a brok

Re: some producers stuck when one broker is bad

2015-09-09 Thread Mayuresh Gharat
1) any suggestion on how to identify the bad broker(s)?
---> At LinkedIn we have alerts that are set up using our internal scripts for detecting if a broker has gone bad. We also check the under-replicated partitions, which can tell us which broker has gone bad. By broker going bad, it can mean di
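The under-replicated-partitions check mentioned above can be sketched as follows. This is a minimal example, not the internal scripts referred to in the message. It assumes you have captured text in the style printed by `kafka-topics.sh --describe`, where each partition line carries `Replicas: ...` and `Isr: ...` fields; the function name `brokers_missing_from_isr` is hypothetical. It counts, per broker id, how many partitions list that broker as a replica but not in the in-sync replica set.

```python
import re
from collections import Counter

# Assumed field layout: "Replicas: 1,2,3" and "Isr: 1,3" on each partition line.
REPLICAS = re.compile(r"Replicas:\s*([\d,]+)")
ISR = re.compile(r"Isr:\s*([\d,]*)")


def brokers_missing_from_isr(describe_output):
    """Count partitions for which each broker is assigned but not in sync."""
    counts = Counter()
    for line in describe_output.splitlines():
        replicas = REPLICAS.search(line)
        isr = ISR.search(line)
        if not (replicas and isr):
            continue  # header or unrelated line
        assigned = set(filter(None, replicas.group(1).split(",")))
        in_sync = set(filter(None, isr.group(1).split(",")))
        for broker in assigned - in_sync:
            counts[broker] += 1
    return counts
```

Calling `brokers_missing_from_isr(text).most_common(1)` gives the broker that is absent from the most ISR lists, which is a reasonable first suspect. A broker whose process is up but unresponsive to producers may or may not show up this way, so treat the result as a hint rather than a diagnosis.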

some producers stuck when one broker is bad

2015-09-08 Thread Steven Wu
We have observed that some producer instances stopped sending traffic to brokers because the memory buffer was full. Those producers got stuck in this state permanently. Because we couldn't find out which broker was bad, I did a rolling restart of all brokers. After the bad broker got bounc