We've recently come across a scenario where consumers reset their offsets to earliest and which, as far as I can tell, may also lead to data loss (we're running with ack = -1 to avoid loss). This seems to happen when a regular shutdown times out and we kill -9 the Kafka broker instead, but it obviously applies to any scenario that involves an unclean exit. As far as I can tell, what happens is:
1. On restart, the broker truncates the data for the affected partitions, because not all of the data had been written to disk.
2. That broker then becomes the leader for the affected partitions, and consumers get confused because they've already consumed beyond the offset that is now available.

Does that seem like a possible failure scenario?

/Sam
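
P.S. To make the scenario concrete, here is a toy model of what I think is happening. It is plain Python with no Kafka client involved; the function name, the `reset_policy` parameter and all the offsets are made up for illustration, and it only mimics the "out of range, so reset" behaviour we observe rather than the broker's or consumer's actual code.

```python
def next_fetch_offset(consumer_offset, log_start, log_end, reset_policy="earliest"):
    """Return the offset a consumer would fetch from next.

    log_start and log_end describe the range of offsets the current leader
    has on disk. reset_policy is a hypothetical stand-in for the consumer's
    offset reset setting.
    """
    if log_start <= consumer_offset <= log_end:
        # The consumer's position is still valid on the leader.
        return consumer_offset
    # The position is out of range, so fall back to the reset policy.
    if reset_policy == "earliest":
        return log_start
    if reset_policy == "latest":
        return log_end
    raise ValueError("unknown reset policy: %s" % reset_policy)


# Illustrative numbers only.
acked_log_end = 1000      # producers got acks for offsets 0..999
consumer_offset = 1000    # the consumer has read everything that was acked
recovered_log_end = 900   # after kill -9, only offsets 0..899 survive

# Messages that were acknowledged but are gone after the truncation.
lost_messages = acked_log_end - recovered_log_end

# The truncated broker becomes leader; the consumer asks for an offset
# beyond the end of the leader's log and gets reset.
resumed_at = next_fetch_offset(consumer_offset, 0, recovered_log_end)

print("lost messages:", lost_messages)
print("consumer resumes at offset:", resumed_at)
```

With these numbers the consumer ends up back at the start of the log and re-reads everything, while the acknowledged messages past the truncation point are no longer there to be read at all.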