Hi Michal,

Thanks for the links, they really help. Now it looks like with request.required.acks=1 (let alone 0), messages can be lost in the case I described. Aphyr's article seemingly describes a trickier case than mine.
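To convince myself, I sketched the acks=1 case as a toy model (plain Python, illustrative only; the Broker class and function names are mine, not Kafka's):

```python
# Toy model of the acks=1 failure mode: the leader acknowledges as soon
# as *it* has the message, before the followers have fetched it, so a
# leader crash can lose a message the producer believes was accepted.

class Broker:
    def __init__(self, broker_id):
        self.id = broker_id
        self.log = []

def produce_acks1(leader, followers, msg, followers_fetch=True):
    """Append to the leader and ack immediately; replication is async."""
    leader.log.append(msg)
    if followers_fetch:            # followers happened to catch up in time
        for f in followers:
            f.log.append(msg)
    return "ack"                   # sent regardless of replication state

b1, b2, b3 = Broker(1), Broker(2), Broker(3)
produce_acks1(b1, [b2, b3], "A")                         # fully replicated
produce_acks1(b1, [b2, b3], "B", followers_fetch=False)  # acked, not fetched
# broker 1 dies; broker 2 takes over with log ["A"]: B is gone despite the ack
```

The producer never learns that B vanished, which matches the documentation quote for acks=1.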
I'm still not sure about Kafka's behavior with request.required.acks=-1. With that setting in effect, the scenario turns into this:

1. Producer sends message A to the leader.
2. Leader stores the message, followers fetch it. Everyone's in sync.
3. Producer sends message B to the leader. Followers haven't fetched it yet and lag by one message: B exists only on broker 1. The produce request for message B sits in the purgatory waiting for acknowledgement from the followers.
4. The leader goes down.
5. Followers cannot fetch B anymore, since its only owner is down. Yet one of the replicas needs to take over leadership. Say, broker 2 now becomes the leader and 3 is the follower.
6. Producer sends message C to the leader (broker 2). The follower fetches it.
7. Broker 1 comes back online and starts following broker 2.

What happens to message B, which was sitting in broker 1's purgatory when it went down? Will an "unable to send message B" error be returned to the producer immediately after broker 1 shuts down?

2014-06-30 17:41 GMT+04:00 Michal Michalski <michal.michal...@boxever.com>:

> Hi Yury,
>
> If I understand correctly, the case you're describing is equivalent to the
> leader re-election (in terms of data consistency). In that case messages
> can be lost depending on your "acks" setting:
>
> https://kafka.apache.org/documentation.html
> see: request.required.acks:
> E.g. "only messages that were written to the now-dead leader but not yet
> replicated will be lost)." for acks=1
>
> More info on that:
> http://aphyr.com/posts/293-call-me-maybe-kafka
>
> However, I'd be happy if someone with more Kafka experience confirmed my
> understanding of that issue.
>
> Kind regards,
> Michał Michalski,
> michal.michal...@boxever.com
>
> On 30 June 2014 14:34, Yury Ruchin <yuri.ruc...@gmail.com> wrote:
>
> > Thanks!
> >
> > I'm also trying to understand how replicas will catch up once the leader
> > goes down.
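My current reading of the acks=-1 case, sketched as the same kind of toy model (again illustrative Python, not Kafka internals; the truncation step is modeled as cutting at the first divergent offset, whereas a real 0.8 broker truncates to its last known high watermark, with the same result in this scenario):

```python
# Toy model of acks=-1: the produce request waits in "purgatory" and is
# acknowledged only once every in-sync follower has fetched the message.
# If the leader dies first, no ack is ever sent: the producer sees an
# error or timeout instead of a silent loss, and may retry.

class Broker:
    def __init__(self, broker_id):
        self.id = broker_id
        self.log = []

def produce_acks_all(leader, followers, msg, followers_fetch):
    """Leader appends; the ack is withheld until all followers have B."""
    leader.log.append(msg)
    if not followers_fetch:              # leader dies before replication
        return "error: request timed out"
    for f in followers:
        f.log.append(msg)
    return "ack"

def rejoin_as_follower(returning, leader):
    """Crude model of truncation on rejoin: drop everything past the point
    where the logs diverge, then catch up from the new leader."""
    i = 0
    while (i < len(returning.log) and i < len(leader.log)
           and returning.log[i] == leader.log[i]):
        i += 1
    returning.log = returning.log[:i] + leader.log[i:]

b1, b2, b3 = Broker(1), Broker(2), Broker(3)
produce_acks_all(b1, [b2, b3], "A", followers_fetch=True)    # -> "ack"
resp_b = produce_acks_all(b1, [b2, b3], "B", followers_fetch=False)
produce_acks_all(b2, [b3], "C", followers_fetch=True)        # b2 leads now
rejoin_as_follower(b1, b2)                                   # B is discarded
```

If this model is right, the answer to my own question would be: the producer gets an error (or timeout) for B rather than a late ack, and the un-acked B on broker 1 is truncated away when it rejoins as a follower. I'd appreciate confirmation.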
> > Say, we have 3 brokers with IDs 1, 2, 3. The leader is broker 1.
> > Followers are 2 and 3. Consider the following scenario assuming that all
> > messages fall into the same partition:
> >
> > 1. Producer sends message A to the leader.
> > 2. Leader stores the message, followers fetch it. Everyone's in sync.
> > 3. Producer sends message B to the leader. Followers haven't fetched the
> > message yet and lag by 1 message: B is still only on broker 1.
> > 4. I bring the leader down.
> > 5. Followers cannot fetch B anymore, since its only owner is down. Yet one
> > of the replicas needs to take over the leader responsibility. Say, broker 2
> > now becomes the leader, 3 is the follower.
> > 6. Producer sends message C to the leader (broker 2). Follower fetches it.
> >
> > I don't quite understand the state of the log on replicas 2 and 3 after
> > step#6. It looks like the log will have a gap in it. The expected log state
> > is ["A", "B", "C"]. But brokers 2 and 3 didn't have a chance to fetch "B",
> > so their log looks like ["A", "C"]. Will Kafka try to fill the gap in the
> > background once broker 1 is started again?
> >
> > 2014-06-18 19:59 GMT+04:00 Neha Narkhede <neha.narkh...@gmail.com>:
> >
> > > You don't gain much by running #4 between broker bounces. Running it
> > > after the cluster is upgraded will be sufficient.
> > >
> > > Thanks,
> > > Neha
> > >
> > > On Wed, Jun 18, 2014 at 8:33 AM, Yury Ruchin <yuri.ruc...@gmail.com>
> > > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > In my project, we want to update our active Kafka 0.8 cluster to
> > > > Kafka 0.8.1.1 without downtime and without losing any data. The
> > > > process (after reading
> > > > http://kafka.apache.org/documentation.html#upgrade) looks to me like
> > > > this. For each broker in turn:
> > > >
> > > > 1. Bring the broker down.
> > > > 2. Update Kafka to 0.8.1.1 on the broker node.
> > > > 3. Start the broker.
> > > > 4.
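On the quoted question about a gap in the log: as far as I understand, offsets are assigned by the current leader at append time rather than travelling with the message, so a replica's log is always a dense prefix and C simply takes the next offset after A. A minimal illustration (plain Python, my own model):

```python
# Offsets are assigned by whoever is leader when the append happens, so
# no slot is ever "reserved" for B on brokers 2 and 3.

log_broker2 = ["A"]        # broker 2's log at the moment broker 1 died
log_broker2.append("C")    # as the new leader it assigns C the next offset

offsets = {i: m for i, m in enumerate(log_broker2)}
# offset 0 -> "A", offset 1 -> "C"; dense, no gap to fill in the background
```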
Run the preferred-replica-election script to restore the broker's
> > > > leadership for the respective partitions.
> > > > 5. Wait for the preferred replica election to complete.
> > > >
> > > > I deem step#5 necessary since preferred replica election is an
> > > > asynchronous process. There is a slim chance that bringing other
> > > > brokers down before the election is complete would result in all
> > > > replicas being down for some partitions, so a portion of the incoming
> > > > data stream would be lost. Is my understanding of the process correct?
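For step 5, one way to wait for the election to complete is to poll the topic description until every partition's leader is the first broker in its replica list (the preferred one). A hedged sketch; the parsing assumes the 0.8.x `kafka-topics.sh --describe` line format (`Leader: 1  Replicas: 1,2,3  Isr: 1,2,3`), which may vary by version:

```python
import re

def preferred_leaders_elected(describe_output):
    """Return True if every partition line in the describe output shows
    the leader equal to the first (preferred) replica in its list."""
    for line in describe_output.splitlines():
        m = re.search(r"Leader:\s*(-?\d+)\s*Replicas:\s*([\d,]+)", line)
        if m:
            leader = int(m.group(1))
            preferred = int(m.group(2).split(",")[0])
            if leader != preferred:   # also catches Leader: -1 (no leader)
                return False
    return True
```

In practice this would run in a loop with a short sleep, fed with the captured output of the describe command, before moving on to bounce the next broker.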