We've recently come across a scenario where consumers reset their offsets to 
earliest, which as far as I can tell may also lead to data loss (we're running 
with ack = -1 to avoid loss). This seems to happen when a regular shutdown 
times out and we kill -9 the Kafka broker instead, but it obviously applies to 
any scenario that involves an unclean exit. As far as I can tell, what happens 
is:

1. On restart the broker truncates the data for the affected partitions, 
presumably because not all data had been written to disk before the kill.
2. The restarted broker then becomes leader for the affected partitions, and 
consumers get confused because they have already consumed beyond the offset 
that is now available.
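
To make the two steps concrete, here is a toy model of what I think is going on. It is only a sketch and not Kafka code: the function names, the 100/80 message counts and the assumption that an out-of-range offset is resolved the way auto.offset.reset describes are all made up for illustration.

```python
# Toy model of the suspected failure. Not Kafka code; all names are hypothetical.

def recover_log(acked_messages, flushed_count):
    """Step 1: after an unclean exit only the messages flushed to disk survive."""
    return acked_messages[:flushed_count]


def next_fetch_offset(consumer_offset, log_start, log_end, reset_policy="earliest"):
    """Step 2: return the offset the consumer fetches from next.

    An offset outside [log_start, log_end] is treated as out of range and is
    resolved with the reset policy (assumed to behave like auto.offset.reset).
    """
    if log_start <= consumer_offset <= log_end:
        return consumer_offset
    return log_start if reset_policy == "earliest" else log_end


acked = [f"m{i}" for i in range(100)]        # 100 messages acknowledged to producers
consumer_offset = 100                        # the consumer has already read all of them
log = recover_log(acked, flushed_count=80)   # restart truncates to what reached disk

lost = len(acked) - len(log)                 # acknowledged messages that no longer exist
new_offset = next_fetch_offset(consumer_offset, 0, len(log))
```

In this model the consumer's offset (100) lies beyond the new end of the log (80), so it falls back to the start of the log and re-reads everything, while the 20 truncated messages are gone even though they were acknowledged.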

Does that seem like a possible failure scenario?

/Sam
