My understanding is that "bringing down 1 node out of a 3 node zookeeper cluster is risky, since any subsequent leader election *might* not reach a quorum" and "It is less likely but still risky to some extent" mean exactly that: *it might not reach a quorum*, because you need both of the remaining nodes to be up to reach quorum (it will usually still succeed, but it *might* fail). In the case of a 5-node cluster, having 1 node down is not that risky, because you still have 4 nodes and you only need 3 of them to reach quorum.
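To make the arithmetic concrete, here is a minimal sketch of the majority rule being discussed (an illustration only, not part of the original exchange): an ensemble of N servers needs floor(N/2) + 1 votes to form a quorum, so it tolerates N minus that many failures.

```python
# Minimal illustration of the ZooKeeper majority rule discussed above.
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)

for n in (3, 4, 5):
    print(f"{n} servers: quorum = {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
# 3 servers: quorum = 2, tolerates 1 failure(s)
# 4 servers: quorum = 3, tolerates 1 failure(s)
# 5 servers: quorum = 3, tolerates 2 failure(s)
```

With one of three servers already down for maintenance you are running at the bare quorum of 2, so any additional failure or network hiccup during a leader election means no majority, which is the risk described above.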
M.

Kind regards,
Michał Michalski,
michal.michal...@boxever.com

On 25 June 2014 09:59, Kane Kane <kane.ist...@gmail.com> wrote:

> Neha, thanks for the answer, I want to understand what is the case when:
>
> >>Also, bringing down 1 node out of a 3 node zookeeper cluster is risky, since any subsequent leader election might not reach a quorum
>
> I was thinking zookeeper guarantees quorum if only 1 node out of 3 fails?
>
> Thanks.
>
> On Tue, Jun 24, 2014 at 3:30 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > See the explanation from the zookeeper folks here <https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html> -
> >
> > "Because Zookeeper requires a majority, it is best to use an odd number of machines. For example, with four machines ZooKeeper can only handle the failure of a single machine; if two machines fail, the remaining two machines do not constitute a majority. However, with five machines ZooKeeper can handle the failure of two machines."
> >
> > Hope that helps.
> >
> > On Tue, Jun 24, 2014 at 12:36 PM, Kane Kane <kane.ist...@gmail.com> wrote:
> >
> >> Sorry, i meant 5 nodes in previous question.
> >>
> >> On Tue, Jun 24, 2014 at 12:36 PM, Kane Kane <kane.ist...@gmail.com> wrote:
> >> > Hello Neha,
> >> >
> >> > >>ZK cluster of 3 nodes will tolerate the loss of 1 node, but if there is a subsequent leader election for any reason, there is a chance that the cluster does not reach a quorum. It is less likely but still risky to some extent.
> >> >
> >> > Does it mean if you have to tolerate 1 node loss without any issues, you need *at least* 4 nodes?
> >> >
> >> > On Tue, Jun 24, 2014 at 11:16 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >> >> Can you elaborate your notion of "smooth"? I thought if you have replication factor=3 in this case, you should be able to tolerate loss of a node?
> >> >>
> >> >> Yes, you should be able to tolerate the loss of a node, but if controlled shutdown is not enabled, the delay between loss of the old leader and election of the new leader will be longer.
> >> >>
> >> >> So, you mean ZK cluster of 3 nodes can't tolerate 1 node loss? I've seen many recommendations to run a 3-node cluster, does it mean in a cluster of 3 you won't be able to operate after losing 1 node?
> >> >>
> >> >> ZK cluster of 3 nodes will tolerate the loss of 1 node, but if there is a subsequent leader election for any reason, there is a chance that the cluster does not reach a quorum. It is less likely but still risky to some extent.
> >> >>
> >> >> On Tue, Jun 24, 2014 at 2:44 AM, Hemath Kumar <hksrckmur...@gmail.com> wrote:
> >> >>
> >> >>> Yes kane i have the replication factor configured as 3
> >> >>>
> >> >>> On Tue, Jun 24, 2014 at 2:42 AM, Kane Kane <kane.ist...@gmail.com> wrote:
> >> >>>
> >> >>> > Hello Neha, can you explain your statements:
> >> >>> >
> >> >>> > >>Bringing one node down in a cluster will go smoothly only if your replication factor is 1 and you enabled controlled shutdown on the brokers.
> >> >>> >
> >> >>> > Can you elaborate your notion of "smooth"? I thought if you have replication factor=3 in this case, you should be able to tolerate loss of a node?
> >> >>> >
> >> >>> > >>Also, bringing down 1 node out of a 3 node zookeeper cluster is risky, since any subsequent leader election might not reach a quorum.
> >> >>> >
> >> >>> > So, you mean ZK cluster of 3 nodes can't tolerate 1 node loss? I've seen many recommendations to run a 3-node cluster, does it mean in a cluster of 3 you won't be able to operate after losing 1 node?
> >> >>> >
> >> >>> > Thanks.
> >> >>> >
> >> >>> > On Mon, Jun 23, 2014 at 9:04 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >> >>> > > Bringing one node down in a cluster will go smoothly only if your replication factor is 1 and you enabled controlled shutdown on the brokers.
> >> >>> > > Also, bringing down 1 node out of a 3 node zookeeper cluster is risky, since any subsequent leader election might not reach a quorum. Having said that, a partition going offline shouldn't cause a consumer's offset to reset to an old value. How did you find out what the consumer's offset was? Do you have your consumer's logs around?
> >> >>> > >
> >> >>> > > Thanks,
> >> >>> > > Neha
> >> >>> > >
> >> >>> > > On Mon, Jun 23, 2014 at 12:28 AM, Hemath Kumar <hksrckmur...@gmail.com> wrote:
> >> >>> > >
> >> >>> > >> We have a 3 node cluster (3 kafka + 3 ZK nodes). Recently we came across a strange issue where we wanted to bring one of the nodes down from the cluster (1 kafka + 1 zookeeper) for maintenance. But the moment we brought it down, on some of the topics (only some partitions) the consumers' offset was reset to some old value.
> >> >>> > >>
> >> >>> > >> Any reason why this happened? To my knowledge, when one node is brought down things should keep working smoothly without any impact.
> >> >>> > >>
> >> >>> > >> Thanks,
> >> >>> > >> Murthy Chelankuri
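For anyone doing the kind of maintenance described in this thread, a minimal pre-flight sketch (an illustration, not something posted in the thread) using ZooKeeper's "srvr" four-letter command, available since 3.3, to confirm every ensemble member is up and reporting a mode before taking one down. The hostnames are placeholders; on newer ZooKeeper releases the command may need to be allowed via 4lw.commands.whitelist.

```python
# Sketch: query each ZooKeeper ensemble member with the "srvr" four-letter word
# and check that taking one node down still leaves a majority available.
import socket

# Hypothetical ensemble; replace with your own hosts/ports.
ENSEMBLE = [("zk1.example.com", 2181), ("zk2.example.com", 2181), ("zk3.example.com", 2181)]

def srvr(host, port, timeout=5.0):
    """Send 'srvr' and return the raw response text, or None if the node is unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"srvr")
            s.shutdown(socket.SHUT_WR)
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks).decode("utf-8", errors="replace")
    except OSError:
        return None

def mode_of(response):
    """Extract 'leader', 'follower', or 'standalone' from a 'srvr' response."""
    for line in (response or "").splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None

if __name__ == "__main__":
    modes = {f"{h}:{p}": mode_of(srvr(h, p)) for h, p in ENSEMBLE}
    up = [node for node, mode in modes.items() if mode is not None]
    quorum = len(ENSEMBLE) // 2 + 1
    print("modes:", modes)
    # With 3 nodes, quorum is 2: taking one node down is only comfortable if the
    # other two are healthy and stay healthy for the whole maintenance window.
    print("safe to take one node down:", len(up) - 1 >= quorum)
```

This just makes the thread's conclusion operational: in a 3-node ensemble the maintenance window runs at the bare quorum, so check the remaining members before and during the work; a 5-node ensemble leaves one node of slack.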