On Aug 29, 2013, at 5:50 PM, Sriram Subramanian <srsubraman...@linkedin.com> wrote:
> Do you know why you timed out on a regular shutdown?

No, though I think it may just have been that the timeout we put in was too short.

> If the replica had
> fallen off of the ISR and shutdown was forced on the leader this could
> happen.

Hmm, but it shouldn't really be made leader if it isn't even in the ISR, should it?

/Sam

> With ack = -1, we guarantee that all the replicas in the in-sync
> set have received the message before exposing the message to the consumer.
>
> On 8/29/13 8:32 AM, "Sam Meder" <sam.me...@jivesoftware.com> wrote:
>
>> We've recently come across a scenario where we see consumers resetting
>> their offsets to earliest and which as far as I can tell may also lead to
>> data loss (we're running with ack = -1 to avoid loss). This seems to
>> happen when we time out on doing a regular shutdown and instead kill -9
>> the Kafka broker, but does obviously apply to any scenario that involves
>> an unclean exit. As far as I can tell what happens is
>>
>> 1. On restart the broker truncates the data for the affected partitions,
>> i.e. not all data was written to disk.
>> 2. The new broker then becomes a leader for the affected partitions and
>> consumers get confused because they've already consumed beyond the now
>> available offset.
>>
>> Does that seem like a possible failure scenario?
>>
>> /Sam
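
For illustration, the two steps in the quoted message can be sketched as a toy model in Python. This is not Kafka code: `ToyBroker`, `next_fetch_offset`, and the flush model are hypothetical names and simplifications, assuming that messages not yet flushed are lost on an unclean exit and that a consumer whose offset lies past the end of the leader's log falls back to its reset policy.

```python
# Toy model only, not Kafka code. All names here are hypothetical.

class ToyBroker:
    def __init__(self):
        self.log = []      # messages the broker has accepted
        self.flushed = 0   # number of messages assumed to be on disk

    def append(self, msg):
        self.log.append(msg)

    def flush(self):
        self.flushed = len(self.log)

    def unclean_restart(self):
        # Assumption: after kill -9, anything not flushed is gone (step 1).
        self.log = self.log[:self.flushed]

    def log_end_offset(self):
        return len(self.log)


def next_fetch_offset(broker, consumer_offset, reset_policy="earliest"):
    """Return the offset the consumer would continue from."""
    if consumer_offset <= broker.log_end_offset():
        return consumer_offset
    # Offset is beyond what the leader now has (step 2): apply the reset policy.
    return 0 if reset_policy == "earliest" else broker.log_end_offset()


broker = ToyBroker()
for i in range(10):
    broker.append(f"m{i}")
    if i == 5:
        broker.flush()           # only m0..m5 are flushed

consumer_offset = 10             # the consumer has already read all 10 messages
broker.unclean_restart()         # unclean exit, then restart with a truncated log
new_offset = next_fetch_offset(broker, consumer_offset)
print(broker.log_end_offset(), new_offset)
```

In this model the restarted broker ends up with fewer messages than the consumer has already read, so the consumer's offset is out of range and it starts over from the earliest offset, which matches the symptom described above.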