Re: Segment recovery and replication

Jay Kreps Thu, 29 Aug 2013 08:45:32 -0700

This should not happen. We have a notion of a "committed" message, which is
a message present on all "in sync" nodes. We never hand out a message to
any consumer until it is committed, and we guarantee that only "in sync"
nodes are electable as leaders. Setting acks=-1 means the broker waits until
the message is committed before responding to the producer.
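
A minimal sketch of the "committed" rule described above. It assumes a toy
single-partition model: the `Partition` class, its fields, and tracking
follower progress as a dict of log end offsets are hypothetical illustrations,
not Kafka's actual code. The committed boundary (the high watermark) is taken
as the smallest log end offset across the leader and its in-sync followers,
and consumers are only handed messages below it.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition (hypothetical, not Kafka's classes)."""
    log: list = field(default_factory=list)           # messages on the leader
    follower_leo: dict = field(default_factory=dict)  # replica id -> log end offset
    isr: set = field(default_factory=set)             # in-sync follower ids

    def append(self, msg):
        self.log.append(msg)
        return len(self.log) - 1                      # offset of the new message

    def high_watermark(self):
        # A message counts as committed once every in-sync replica has it,
        # so the boundary is the smallest log end offset across the leader
        # and all in-sync followers.
        ends = [len(self.log)] + [self.follower_leo[r] for r in self.isr]
        return min(ends)

    def is_committed(self, offset):
        return offset < self.high_watermark()

    def consumer_fetch(self, start):
        # Consumers only ever see messages below the high watermark.
        return self.log[start:self.high_watermark()]


p = Partition(isr={"b", "c"}, follower_leo={"b": 0, "c": 0})
off = p.append("m0")
p.follower_leo["b"] = 1       # follower b has replicated m0
print(p.is_committed(off))    # False: c is in sync but has not caught up
p.follower_leo["c"] = 1
print(p.is_committed(off))    # True: an acks=-1 producer could now be answered
print(p.consumer_fetch(0))    # ['m0']
```

Taking the minimum over the in-sync set is what makes "only in-sync nodes
are electable" safe in this model: any replica that could become leader
already holds every message below the high watermark.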


If you kill all nodes, however, then all bets are off. In that case we
elect whichever node comes back first as the leader and use its log as the
source of truth. Is it possible that this is what is happening?
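
The "all bets are off" case can be sketched in the same toy style. This
hypothetical model assumes every replica is down, the first one to restart is
elected leader whether or not it was in sync, and the remaining replicas then
converge on its log; `elect_after_total_failure` and `lost_messages` are
illustrative names only.

```python
def elect_after_total_failure(restart_order, logs):
    """Hypothetical model: with every replica down, the first one to restart
    becomes leader and its log is the source of truth."""
    leader = restart_order[0]
    truth = list(logs[leader])
    # Every other replica ends up mirroring the new leader, so any message
    # the leader is missing is gone from all replicas.
    converged = {replica: list(truth) for replica in logs}
    return leader, converged


def lost_messages(before, after):
    """Messages present on some replica before the outage but on none after."""
    had = set().union(*before.values())
    kept = set().union(*after.values())
    return sorted(had - kept)


# m0..m2 were on all three replicas before the outage, but replica "a" came
# back without m2 (for example, because the tail of its log never reached disk).
logs = {"a": ["m0", "m1"], "b": ["m0", "m1", "m2"], "c": ["m0", "m1", "m2"]}
leader, after = elect_after_total_failure(["a", "b", "c"], logs)
print(leader, lost_messages(logs, after))   # a ['m2']
```

Here a message that had been committed is still lost, simply because the
replica that happened to return first did not have it.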

-Jay


On Thu, Aug 29, 2013 at 8:32 AM, Sam Meder <sam.me...@jivesoftware.com> wrote:

> We've recently come across a scenario where we see consumers resetting
> their offsets to earliest, which as far as I can tell may also lead to
> data loss (we're running with acks = -1 to avoid loss). This seems to happen
> when a regular shutdown times out and we instead kill -9 the Kafka
> broker, but it obviously applies to any scenario that involves an unclean
> exit. As far as I can tell, this is what happens:
>
> 1. On restart, the broker truncates the data for the affected partitions,
> i.e. not all data was written to disk.
> 2. The restarted broker then becomes the leader for the affected partitions,
> and consumers get confused because they have already consumed beyond the
> last offset that is now available.
>
> Does that seem like a possible failure scenario?
>
> /Sam
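
The sequence described in the quoted message can be modeled in a few lines.
This is a sketch under simplifying assumptions: the log is a Python list, an
incomplete message left behind by an unclean exit is represented as `None`,
and `recover_log` and `next_fetch_offset` are hypothetical helpers. The
`reset` argument stands in for the consumer's offset-reset policy, roughly
what `auto.offset.reset` controls in the consumer configuration.

```python
def recover_log(entries):
    """Hypothetical recovery pass: keep messages up to the first incomplete
    one (modeled as None) and truncate everything after it."""
    valid = []
    for entry in entries:
        if entry is None:          # partially written tail
            break
        valid.append(entry)
    return valid


def next_fetch_offset(consumer_offset, log, reset="earliest"):
    """Toy consumer: if its saved offset is past the end of the leader's log,
    the fetch is out of range and it falls back to the reset policy."""
    log_end = len(log)
    if 0 <= consumer_offset <= log_end:
        return consumer_offset
    return 0 if reset == "earliest" else log_end


# Before the kill the log held m0..m4 and the consumer had read up to offset 5.
# Afterwards m3 is only partially on disk and m4 never made it.
on_disk = ["m0", "m1", "m2", None]
log = recover_log(on_disk)         # ['m0', 'm1', 'm2']
print(next_fetch_offset(5, log))   # 0: offset 5 is out of range, so rewind to earliest
```

In this model the consumer re-reads m0..m2 as duplicates while m3 and m4 are
gone for good, which matches both symptoms reported: the reset to earliest
and the apparent data loss.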
