Do you know why you timed out on a regular shutdown? If the replica had
fallen out of the ISR and shutdown was forced on the leader, this could
happen. With ack = -1, we guarantee that all the replicas in the in-sync
set have received the message before exposing the message to the consumer.
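
To make the ack = -1 guarantee concrete, here is a rough sketch of the rule
as a toy model. It is not Kafka's actual code; the function name, broker ids
and offsets are made up for illustration.

```python
# Toy model of ack = -1: a message is exposed to consumers only once every
# replica in the in-sync set (ISR) has it. All names are hypothetical.

def high_watermark(log_end_offsets, isr):
    """Return the first offset NOT yet replicated to every ISR member."""
    return min(log_end_offsets[replica] for replica in isr)

# Log end offset (the next offset to be written) per broker id.
log_end_offsets = {1: 100, 2: 100, 3: 90}

# Broker 3 lags but is still in the ISR: consumers only see offsets below 90.
hw_full_isr = high_watermark(log_end_offsets, isr={1, 2, 3})

# Broker 3 has dropped out of the ISR: offsets below 100 are now exposed,
# even though broker 3 only holds offsets below 90.
hw_shrunk_isr = high_watermark(log_end_offsets, isr={1, 2})
```

In the second case the guarantee still holds for the ISR, but if broker 3
were later to take over as leader, consumers that had already read up to
offset 100 would be ahead of its log.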

On 8/29/13 8:32 AM, "Sam Meder" <sam.me...@jivesoftware.com> wrote:

>We've recently come across a scenario where we see consumers resetting
>their offsets to earliest, which as far as I can tell may also lead to
>data loss (we're running with ack = -1 to avoid loss). This seems to
>happen when we time out on doing a regular shutdown and instead kill -9
>the kafka broker, but it obviously applies to any scenario that involves
>an unclean exit. As far as I can tell, what happens is:
>
>1. On restart the broker truncates the data for the affected partitions,
>i.e. not all data was written to disk.
>2. The new broker then becomes a leader for the affected partitions and
>consumers get confused because they've already consumed beyond the now
>available offset.
>
>Does that seem like a possible failure scenario?
>
>/Sam
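
The two steps described above can be sketched as a toy model of how a
consumer's fetch offset gets resolved against a truncated log. This is only
an illustration, not the broker or consumer implementation: the function and
its parameters are hypothetical, and the "smallest" / "largest" policy names
are modeled on the consumer's auto.offset.reset setting.

```python
# Toy model: a consumer asks for an offset that no longer exists because the
# restarted broker truncated its log. All names are hypothetical.

def resolve_fetch_offset(requested, log_start, log_end, reset_policy="smallest"):
    """Return the offset the consumer will actually continue from."""
    if log_start <= requested <= log_end:
        return requested          # offset is still valid, carry on
    if reset_policy == "smallest":
        return log_start          # out of range: jump back to the earliest offset
    if reset_policy == "largest":
        return log_end            # out of range: jump to the end of the log
    raise ValueError("unknown reset policy: %r" % reset_policy)

# Before the crash the consumer had read up to offset 100.
consumed_up_to = 100

# After kill -9 and restart, the log on the new leader ends at offset 90.
resumed_at = resolve_fetch_offset(consumed_up_to, log_start=0, log_end=90)
```

Here the consumer's offset (100) lies beyond the available log end (90), so
with the "smallest" policy it restarts from offset 0, and offsets 90 to 99
are no longer available from the new leader.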
