Re: Segment recovery and replication

Sam Meder Thu, 29 Aug 2013 09:42:06 -0700

On Aug 29, 2013, at 5:50 PM, Sriram Subramanian <srsubraman...@linkedin.com> 
wrote:


> Do you know why you timed out on a regular shutdown?

No, though I think it may just have been that the timeout we put in was too 
short.

> If the replica had
> fallen off of the ISR and shutdown was forced on the leader this could
> happen.

Hmm, but it shouldn't really be made leader if it isn't even in the ISR, should 
it?

/Sam

> With ack = -1, we guarantee that all the replicas in the in sync
> set have received the message before exposing the message to the consumer.
> 
> On 8/29/13 8:32 AM, "Sam Meder" <sam.me...@jivesoftware.com> wrote:
> 
>> We've recently come across a scenario where we see consumers resetting
>> their offsets to earliest and which, as far as I can tell, may also lead
>> to data loss (we're running with ack = -1 to avoid loss). This seems to
>> happen when we time out on a regular shutdown and instead kill -9
>> the Kafka broker, but it obviously applies to any scenario that involves
>> an unclean exit. As far as I can tell, what happens is:
>> 
>> 1. On restart the broker truncates the data for the affected partitions,
>> i.e. not all data was written to disk.
>> 2. The new broker then becomes a leader for the affected partitions and
>> consumers get confused because they've already consumed beyond the now
>> available offset.
>> 
>> Does that seem like a possible failure scenario?
>> 
>> /Sam
>
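
The two-step scenario quoted above can be sketched as a toy model. This is
only an illustration, not Kafka's recovery code: PartitionLog, fetch and
simulate are hypothetical names, the flush point is arbitrary, and the
consumer is assumed to fall back to the earliest offset when its position is
out of range.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy stand-in for one partition's log on one broker (not Kafka code)."""
    messages: list = field(default_factory=list)
    flushed: int = 0  # number of messages known to be on disk

    def append(self, msg):
        self.messages.append(msg)

    def flush(self):
        # Everything appended so far is now considered durable.
        self.flushed = len(self.messages)

    def recover_after_unclean_exit(self):
        # Step 1: anything that never reached disk is dropped on restart.
        del self.messages[self.flushed:]

    @property
    def log_end_offset(self):
        return len(self.messages)


def fetch(log, offset):
    """Return messages from `offset`, or None if the offset is out of range."""
    if offset > log.log_end_offset:
        return None
    return log.messages[offset:]


def simulate():
    log = PartitionLog()
    for i in range(10):
        log.append(f"m{i}")
        if i == 5:
            log.flush()  # offsets 0..5 are on disk, 6..9 are not

    consumer_offset = 9  # consumer has already read offsets 0..8

    # kill -9 followed by restart: the unflushed tail is truncated.
    log.recover_after_unclean_exit()

    # Step 2: the restarted broker serves the partition as leader, and the
    # consumer's position is now beyond the end of the log.
    if fetch(log, consumer_offset) is None:
        consumer_offset = 0  # assumed policy: reset to earliest

    return log.log_end_offset, consumer_offset
```

In this model simulate() leaves the log ending at offset 6 while the consumer
restarts from offset 0, so messages 6..8 that were already consumed no longer
exist on the broker and everything still in the log is re-read.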
