Hanish,

One thing you can check when you kill one of the brokers is whether the other broker is in the ISR list of the partitions that the killed broker was hosting. This can be done using the kafka-topics tool.
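For example (a sketch; the topic name and ZooKeeper address are placeholders, and on the 0.8.0 final release the equivalent tool is bin/kafka-list-topic.sh):

    bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic mytopic

The output lists Leader, Replicas, and Isr per partition, along the lines of:

    Topic: mytopic  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2

If the surviving broker had already dropped out of Isr at the moment the other broker was killed, an unclean leader election can lose data even with request.required.acks=-1.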
Also, you can check the controller log for any entry like "No broker in ISR is alive for %s. Elect leader %d from live brokers %s. There's potential data loss."
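A quick way to look for that entry, assuming the default log4j.properties shipped with Kafka (which sends controller logging to controller.log under the Kafka logs directory; adjust the path to your layout):

    grep "potential data loss" /path/to/kafka/logs/controller.log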
Guozhang

On Fri, Dec 20, 2013 at 9:11 AM, Jun Rao <jun...@gmail.com> wrote:

Could you reproduce this easily? If so, could you file a jira and describe the steps?

Thanks,
Jun

On Thu, Dec 19, 2013 at 9:41 PM, Hanish Bansal <hanish.bansal.agar...@gmail.com> wrote:

Hi Guozhang,

I have tried with Kafka-0.8.1 after applying the KAFKA-1188 patch, but that's not helping in this case.

Also, controlled.shutdown.enable is not helpful in the case of an abnormal shutdown (i.e. SIGKILL (-9)).

Any other suggestion?

On Thu, Dec 19, 2013 at 3:59 PM, Guozhang Wang <wangg...@gmail.com> wrote:

Yes, please go ahead.

On Thu, Dec 19, 2013 at 2:30 AM, Hanish Bansal <hanish.bansal.agar...@gmail.com> wrote:

Hi Guozhang,

Can I try it now with trunk HEAD kafka code after applying the patch KAFKA-1188.patch <https://issues.apache.org/jira/secure/attachment/12619475/KAFKA-1188.patch>?

On Wed, Dec 18, 2013 at 9:49 PM, Guozhang Wang <wangg...@gmail.com> wrote:

Kafka server's shutdown hook should capture all signals except SIGKILL (-9), so the controlled shutdown process will not be triggered in that case.

That said, if the other replica is in the ISR, then even kill -9 should not lose data. I am currently working on this JIRA, which might be related if brokers are bounced iteratively:

https://issues.apache.org/jira/browse/KAFKA-1188

Hanish, could you retry trunk HEAD once this one is resolved?

On Wed, Dec 18, 2013 at 12:00 PM, Joe Stein <joe.st...@stealth.ly> wrote:

Leader election should start for the partitions whose leader was on the killed broker: the new leader is elected from the other replicas still in the ISR, and the failed leader is removed from the ISR. For all other partitions where this broker was in the ISR but was not the leader, the ISR will simply shrink.

So there is a lot of re-jiggling, and the time it takes will be related to how many partitions and brokers you have.

On Wed, Dec 18, 2013 at 2:49 PM, Robert Rodgers <rsrodg...@gmail.com> wrote:

What happens if the physical machine dies or the kernel panics?

On Dec 18, 2013, at 9:44 AM, Hanish Bansal <hanish.bansal.agar...@gmail.com> wrote:

Yup, definitely I would like to try that, if the controlled.shutdown.enable property works in the case of kill -9.

I hope that this option will be perfect.

Thanks for the quick response, really appreciate it.

On Wed, Dec 18, 2013 at 10:52 PM, Joe Stein <joe.st...@stealth.ly> wrote:

Wouldn't you want to set controlled.shutdown.enable to true, so the broker would do this for you before ending itself?

Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
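For reference, a minimal sketch of what that looks like in the broker's server.properties (the two retry settings are optional; the values shown are, I believe, the 0.8 defaults). Note this only helps for signals the broker can catch; it cannot help with kill -9:

    # Move leadership off this broker before it shuts down
    controlled.shutdown.enable=true
    # Optional tuning of the controlled shutdown itself
    controlled.shutdown.max.retries=3
    controlled.shutdown.retry.backoff.ms=5000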
On Wed, Dec 18, 2013 at 11:36 AM, pushkar priyadarshi <priyadarshi.push...@gmail.com> wrote:

My doubt was that they are dropping off at the producer level only, so I suggested playing with parameters like retries and backoff.ms, and also with the refresh interval on the producer side.

Regards,
Pushkar

On Wed, Dec 18, 2013 at 10:01 PM, Guozhang Wang <wangg...@gmail.com> wrote:

Hanish,

Did you "kill -9" one of the brokers only, or bounce them iteratively?

Guozhang

On Wed, Dec 18, 2013 at 8:02 AM, Joe Stein <joe.st...@stealth.ly> wrote:

How many replicas do you have?

On Wed, Dec 18, 2013 at 8:57 AM, Hanish Bansal <hanish.bansal.agar...@gmail.com> wrote:

Hi pushkar,

I tried configuring "message.send.max.retries" to 10 (the default value is 3), but I am still facing data loss.

On Wed, Dec 18, 2013 at 12:44 PM, pushkar priyadarshi <priyadarshi.push...@gmail.com> wrote:

You can try setting a higher value for "message.send.max.retries" in the producer config.

Regards,
Pushkar

On Wed, Dec 18, 2013 at 5:34 PM, Hanish Bansal <hanish.bansal.agar...@gmail.com> wrote:

Hi All,

We have a Kafka cluster of 2 nodes (using the 0.8.0 final release).
Replication factor: 2
Number of partitions: 2

I have configured request.required.acks to -1 in the producer configuration.

As mentioned in the documentation (http://kafka.apache.org/documentation.html#producerconfigs), setting this value to -1 provides a guarantee that no messages will be lost.

I am seeing the following behaviour:

If Kafka is running as a foreground process and I shut down the leader node using Ctrl+C, then no data is lost.

But if I abnormally terminate Kafka using "kill -9 <pid>", then I am still facing data loss, even after configuring request.required.acks to -1.

Any suggestions?

Thanks & Regards,
Hanish Bansal
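For anyone following along, here is a sketch of the producer-side settings discussed in this thread as they would appear in a 0.8 producer config (the broker list and the tuned values are placeholders, not recommendations):

    metadata.broker.list=broker1:9092,broker2:9092
    # Wait for the leader and all in-sync replicas to ack each message
    request.required.acks=-1
    # Retries on failed sends (default 3), raised per the suggestion above
    message.send.max.retries=10
    # Pause between retries (default 100 ms)
    retry.backoff.ms=500
    # The producer also refreshes topic metadata on failure; this is the
    # periodic refresh interval (default 10 min)
    topic.metadata.refresh.interval.ms=60000

Keep in mind that acks=-1 only covers replicas currently in the ISR; if the ISR has shrunk to the leader alone, a kill -9 of that leader can still lose acknowledged messages.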