>> Right, although I was under the impression that committed meant replicated,
>> not necessarily synced to disk. That's not the case?
That's correct.

>> But what happens when a node goes down, has its log truncated to less than
>> what was in sync and becomes the leader again? You're saying there should
>> be some code that prevents it from becoming the leader until it has caught
>> up?

When a broker starts up, it cannot become a leader until it has caught up
with the current leader, in other words, has joined the in-sync replica list.

>> 1. controlled shutdown of brokers 1 & 2 in parallel (kill -9 if they don't
>> shut down)
>> 2. start brokers 1 & 2
>> 3. controlled shutdown of brokers 3 & 4 in parallel (kill -9 if they don't
>> shut down)
>> 4. start brokers 3 & 4

If the replication factor is 3, when you shut down brokers 1 & 2, the leader
should shift to broker 3 or 4. Unless you shut down 3 or 4 before 1 and 2
rejoin the ISR, you shouldn't lose any data.

I'm curious to know why controlled shutdown wouldn't succeed. If you
configure the timeout and retries properly, and you are not hitting some sort
of a bug, controlled shutdown should succeed. Are you saying that you want to
test the kill -9 scenario on purpose?

Thanks,
Neha

On Thu, Aug 29, 2013 at 9:28 AM, Sam Meder <sam.me...@jivesoftware.com> wrote:

>
> On Aug 29, 2013, at 5:44 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > This should not happen. We have a notion of a "committed" message, which
> > is a message present on all "in sync" nodes.
>
> Right, although I was under the impression that committed meant replicated,
> not necessarily synced to disk. That's not the case?
>
> > We never hand out a message to any consumer until it is committed, and
> > we guarantee that only "in sync" nodes are electable as leaders. Setting
> > acks=-1 means wait until the message is committed before returning to
> > the producer.
>
> But what happens when a node goes down, has its log truncated to less than
> what was in sync and becomes the leader again? You're saying there should
> be some code that prevents it from becoming the leader until it has caught
> up?
>
> > If you kill all nodes however then all bets are off. In this case we
> > will elect whichever node shows up first as leader and use its log as
> > the source of truth. Is it possible this is happening?
>
> No, this is a bit of a test environment, but we currently have 4 brokers,
> replication set to 3, and we shut them down 2 at a time, so the sequence
> usually is
>
> 1. controlled shutdown of brokers 1 & 2 in parallel (kill -9 if they don't
> shut down)
> 2. start brokers 1 & 2
> 3. controlled shutdown of brokers 3 & 4 in parallel (kill -9 if they don't
> shut down)
> 4. start brokers 3 & 4
>
> /Sam
>
> > -Jay
> >
> > On Thu, Aug 29, 2013 at 8:32 AM, Sam Meder <sam.me...@jivesoftware.com>
> > wrote:
> >
> >> We've recently come across a scenario where we see consumers resetting
> >> their offsets to earliest and which, as far as I can tell, may also
> >> lead to data loss (we're running with ack = -1 to avoid loss). This
> >> seems to happen when we time out on doing a regular shutdown and
> >> instead kill -9 the kafka broker, but it obviously applies to any
> >> scenario that involves an unclean exit. As far as I can tell, what
> >> happens is
> >>
> >> 1. On restart the broker truncates the data for the affected
> >> partitions, i.e. not all data was written to disk.
> >> 2. The new broker then becomes a leader for the affected partitions and
> >> consumers get confused because they've already consumed beyond the now
> >> available offset.
> >>
> >> Does that seem like a possible failure scenario?
> >>
> >> /Sam
> >
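For reference, the "timeout and retries" Neha mentions for controlled shutdown
are broker-side settings. Assuming a broker version that exposes them in
server.properties (the names below come from the 0.8.x broker configs; the
values are only examples), they look like:

    controlled.shutdown.enable=true
    controlled.shutdown.max.retries=3
    controlled.shutdown.retry.backoff.ms=5000

Raising the retries and backoff gives the controller more time to move
partition leadership off the broker before the shutdown script gives up and
falls back to kill -9.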
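One way to follow the "don't take down 3 or 4 before 1 and 2 rejoin the ISR"
advice is to check the ISR between steps 2 and 3 of the sequence above. With
the topic tool from 0.8.1-era builds (the tool name differs on older 0.8
releases, and the zookeeper host and topic below are placeholders):

    bin/kafka-topics.sh --describe --zookeeper zkhost:2181 --topic test-topic

and only proceed once brokers 1 and 2 appear in the Isr column for every
partition of the topic.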
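On the producer side, the acks=-1 behavior Jay describes corresponds to
request.required.acks=-1 in the 0.8 Java producer API. A minimal sketch,
with the broker list and topic name made up for the example:

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class AckMinusOneExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker list for the example.
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // -1 = the broker acknowledges only after the message is committed,
            // i.e. present on all in-sync replicas (replicated, not necessarily fsynced).
            props.put("request.required.acks", "-1");

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
            try {
                // With the default sync producer, send() blocks until the ack comes back.
                producer.send(new KeyedMessage<String, String>("test-topic", "key", "value"));
            } finally {
                producer.close();
            }
        }
    }

The fsync comment is the point of the first question in this thread: committed
means replicated to the in-sync replicas, so the durability guarantee comes
from replication rather than from flushing each message to disk.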