Do you know why you timed out on a regular shutdown? If a replica had fallen out of the ISR and the leader was then forcibly shut down, this could happen. With ack = -1, we guarantee that all the replicas in the in-sync replica set (ISR) have received the message before exposing the message to the consumer.
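
To make that guarantee concrete, here is a rough toy model in Python. It is not Kafka code; the `Replica` and `Partition` classes and all of their methods are hypothetical names made up for illustration. It only shows the idea that consumers see messages up to the smallest log end offset across the ISR, and that a replica dropped from the ISR no longer holds that point back.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one broker's copy of a partition log."""
    name: str
    log: list = field(default_factory=list)

    @property
    def log_end_offset(self):
        return len(self.log)


class Partition:
    """Toy partition: one leader plus followers that start in the ISR."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.isr = [leader] + list(followers)

    def append(self, message):
        # The leader always writes first.
        self.leader.log.append(message)

    def replicate_to(self, follower):
        # The follower catches up fully with the leader's log.
        follower.log = list(self.leader.log)

    def shrink_isr(self, follower):
        # A lagging follower is dropped from the in-sync set.
        self.isr.remove(follower)

    @property
    def high_watermark(self):
        # Only offsets present on every ISR member count as committed.
        return min(r.log_end_offset for r in self.isr)

    def visible_to_consumers(self):
        return self.leader.log[: self.high_watermark]


leader, follower = Replica("b1"), Replica("b2")
p = Partition(leader, [follower])
p.append("m0")
p.replicate_to(follower)
p.append("m1")                   # not yet on the follower
print(p.visible_to_consumers())  # only "m0" is exposed
p.shrink_isr(follower)           # follower falls out of the ISR
print(p.visible_to_consumers())  # "m1" is now exposed, held only by the leader
```

In this model, once the follower is out of the ISR, "m1" is visible to consumers while existing only on the leader. If the leader is then killed and the follower takes over, "m1" is gone even though ack = -1 was used.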

On 8/29/13 8:32 AM, "Sam Meder" <sam.me...@jivesoftware.com> wrote:

>We've recently come across a scenario where we see consumers resetting
>their offsets to earliest and which as far as I can tell may also lead to
>data loss (we're running with ack = -1 to avoid loss). This seems to
>happen when we time out on doing a regular shutdown and instead kill -9
>the Kafka broker, but does obviously apply to any scenario that involves
>an unclean exit. As far as I can tell what happens is
>
>1. On restart the broker truncates the data for the affected partitions,
>i.e. not all data was written to disk.
>2. The new broker then becomes a leader for the affected partitions and
>consumers get confused because they've already consumed beyond the now
>available offset.
>
>Does that seem like a possible failure scenario?
>
>/Sam
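
For reference, the consumer side of the two steps Sam describes can be sketched roughly as below. This is a simplified model, not the actual consumer code; `resolve_fetch_offset` and its parameters are hypothetical names, and the "earliest"/"latest" policy values are assumptions chosen to match the wording in the report.

```python
def resolve_fetch_offset(requested, log_start, log_end, reset_policy="earliest"):
    """Return the offset a consumer would fetch from next.

    requested    -- offset the consumer wants to read next
    log_start    -- first offset the leader still has
    log_end      -- offset one past the last message on the leader
    reset_policy -- what to do when `requested` is out of range
    """
    if log_start <= requested <= log_end:
        # The position is still valid; carry on from there.
        return requested
    if reset_policy == "earliest":
        # Out of range: start again from the oldest available message.
        return log_start
    if reset_policy == "latest":
        # Out of range: skip to the end of the log.
        return log_end
    raise ValueError(f"unknown reset policy: {reset_policy!r}")


# The consumer had read up to offset 120, but the restarted leader
# truncated its log and now ends at offset 100.
next_offset = resolve_fetch_offset(requested=120, log_start=0, log_end=100)
print(next_offset)
```

With an "earliest" policy the consumer in this example goes back to offset 0 and re-reads everything the leader still has, while the messages that used to sit at offsets 100 to 119 are no longer available from it.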