See the explanation from the zookeeper folks here <https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html> -
" Because Zookeeper requires a majority, it is best to use an odd number of machines. For example, with four machines ZooKeeper can only handle the failure of a single machine; if two machines fail, the remaining two machines do not constitute a majority. However, with five machines ZooKeeper can handle the failure of two machines." Hope that helps. On Tue, Jun 24, 2014 at 12:36 PM, Kane Kane <kane.ist...@gmail.com> wrote: > Sorry, i meant 5 nodes in previous question. > > On Tue, Jun 24, 2014 at 12:36 PM, Kane Kane <kane.ist...@gmail.com> wrote: > > Hello Neha, > > > >>>ZK cluster of 3 nodes will tolerate the loss of 1 node, but if there is > a > > subsequent leader election for any reason, there is a chance that the > > cluster does not reach a quorum. It is less likely but still risky to > some > > extent. > > > > Does it mean if you have to tolerate 1 node loss without any issues, > > you need *at least* 4 nodes? > > > > On Tue, Jun 24, 2014 at 11:16 AM, Neha Narkhede <neha.narkh...@gmail.com> > wrote: > >> Can you elaborate your notion of "smooth"? I thought if you have > >> replication factor=3 in this case, you should be able to tolerate loss > >> of a node? > >> > >> Yes, you should be able to tolerate the loss of a node but if controlled > >> shutdown is not enabled, the delay between loss of the old leader and > >> election of the new leader will be longer. > >> > >> So, you mean ZK cluster of 3 nodes can't tolerate 1 node loss? I've > >> seen many recommendations to run 3-nodes cluster, does it mean in > >> cluster of 3 you won't be able to operate after loosing 1 node? > >> > >> ZK cluster of 3 nodes will tolerate the loss of 1 node, but if there is > a > >> subsequent leader election for any reason, there is a chance that the > >> cluster does not reach a quorum. It is less likely but still risky to > some > >> extent. > >> > >> > >> On Tue, Jun 24, 2014 at 2:44 AM, Hemath Kumar <hksrckmur...@gmail.com> > >> wrote: > >> > >>> Yes kane i have the replication factor configured as 3 > >>> > >>> > >>> On Tue, Jun 24, 2014 at 2:42 AM, Kane Kane <kane.ist...@gmail.com> > wrote: > >>> > >>> > Hello Neha, can you explain your statements: > >>> > >>Bringing one node down in a cluster will go smoothly only if your > >>> > replication factor is 1 and you enabled controlled shutdown on the > >>> brokers. > >>> > > >>> > Can you elaborate your notion of "smooth"? I thought if you have > >>> > replication factor=3 in this case, you should be able to tolerate > loss > >>> > of a node? > >>> > > >>> > >>Also, bringing down 1 node our of a 3 node zookeeper cluster is > risky, > >>> > since any subsequent leader election might not reach a quorum. > >>> > > >>> > So, you mean ZK cluster of 3 nodes can't tolerate 1 node loss? I've > >>> > seen many recommendations to run 3-nodes cluster, does it mean in > >>> > cluster of 3 you won't be able to operate after loosing 1 node? > >>> > > >>> > Thanks. > >>> > > >>> > On Mon, Jun 23, 2014 at 9:04 AM, Neha Narkhede < > neha.narkh...@gmail.com> > >>> > wrote: > >>> > > Bringing one node down in a cluster will go smoothly only if your > >>> > > replication factor is 1 and you enabled controlled shutdown on the > >>> > brokers. > >>> > > Also, bringing down 1 node our of a 3 node zookeeper cluster is > risky, > >>> > > since any subsequent leader election might not reach a quorum. > Having > >>> > said > >>> > > that, a partition going offline shouldn't cause a consumer's > offset to > >>> > > reset to an old value. 

Hope that helps.

On Tue, Jun 24, 2014 at 12:36 PM, Kane Kane <kane.ist...@gmail.com> wrote:

> Sorry, I meant 5 nodes in my previous question.
>
> On Tue, Jun 24, 2014 at 12:36 PM, Kane Kane <kane.ist...@gmail.com> wrote:
> > Hello Neha,
> >
> >>> ZK cluster of 3 nodes will tolerate the loss of 1 node, but if there
> >>> is a subsequent leader election for any reason, there is a chance that
> >>> the cluster does not reach a quorum. It is less likely but still risky
> >>> to some extent.
> >
> > Does it mean that if you have to tolerate the loss of 1 node without any
> > issues, you need *at least* 4 nodes?
> >
> > On Tue, Jun 24, 2014 at 11:16 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >> Can you elaborate your notion of "smooth"? I thought if you have
> >> replication factor=3 in this case, you should be able to tolerate the
> >> loss of a node?
> >>
> >> Yes, you should be able to tolerate the loss of a node, but if
> >> controlled shutdown is not enabled, the delay between the loss of the
> >> old leader and the election of the new leader will be longer.
> >>
> >> So, you mean a ZK cluster of 3 nodes can't tolerate the loss of 1 node?
> >> I've seen many recommendations to run a 3-node cluster; does it mean
> >> that in a cluster of 3 you won't be able to operate after losing 1 node?
> >>
> >> ZK cluster of 3 nodes will tolerate the loss of 1 node, but if there is
> >> a subsequent leader election for any reason, there is a chance that the
> >> cluster does not reach a quorum. It is less likely but still risky to
> >> some extent.
> >>
> >> On Tue, Jun 24, 2014 at 2:44 AM, Hemath Kumar <hksrckmur...@gmail.com> wrote:
> >>
> >>> Yes Kane, I have the replication factor configured as 3.
> >>>
> >>> On Tue, Jun 24, 2014 at 2:42 AM, Kane Kane <kane.ist...@gmail.com> wrote:
> >>> > Hello Neha, can you explain your statements:
> >>> >
> >>> > >> Bringing one node down in a cluster will go smoothly only if your
> >>> > >> replication factor is 1 and you enabled controlled shutdown on
> >>> > >> the brokers.
> >>> >
> >>> > Can you elaborate your notion of "smooth"? I thought if you have
> >>> > replication factor=3 in this case, you should be able to tolerate
> >>> > the loss of a node?
> >>> >
> >>> > >> Also, bringing down 1 node out of a 3 node zookeeper cluster is
> >>> > >> risky, since any subsequent leader election might not reach a
> >>> > >> quorum.
> >>> >
> >>> > So, you mean a ZK cluster of 3 nodes can't tolerate the loss of 1
> >>> > node? I've seen many recommendations to run a 3-node cluster; does
> >>> > it mean that in a cluster of 3 you won't be able to operate after
> >>> > losing 1 node?
> >>> >
> >>> > Thanks.
> >>> >
> >>> > On Mon, Jun 23, 2014 at 9:04 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >>> > > Bringing one node down in a cluster will go smoothly only if your
> >>> > > replication factor is 1 and you enabled controlled shutdown on the
> >>> > > brokers. Also, bringing down 1 node out of a 3 node zookeeper
> >>> > > cluster is risky, since any subsequent leader election might not
> >>> > > reach a quorum. Having said that, a partition going offline
> >>> > > shouldn't cause a consumer's offset to reset to an old value. How
> >>> > > did you find out what the consumer's offset was? Do you have your
> >>> > > consumer's logs around?
> >>> > >
> >>> > > Thanks,
> >>> > > Neha
> >>> > >
> >>> > > On Mon, Jun 23, 2014 at 12:28 AM, Hemath Kumar <hksrckmur...@gmail.com> wrote:
> >>> > >
> >>> > >> We have a 3 node cluster (3 Kafka + 3 ZK nodes). Recently we came
> >>> > >> across a strange issue when we wanted to bring one of the nodes
> >>> > >> (1 Kafka + 1 ZooKeeper) down from the cluster for maintenance.
> >>> > >> The moment we brought it down, on some of the topics (only some
> >>> > >> partitions) the consumers' offsets were reset to some old value.
> >>> > >>
> >>> > >> Any reason why this happened? To my knowledge, when one node is
> >>> > >> brought down, things should keep working smoothly without any
> >>> > >> impact.
> >>> > >>
> >>> > >> Thanks,
> >>> > >> Murthy Chelankuri
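
On Neha's question further up about how the consumer's offset was determined:
if these are 0.8-era high-level consumers committing offsets to ZooKeeper, the
committed values live under /consumers/<group>/offsets/<topic>/<partition>, so
a small script along the following lines can dump them before and after the
broker is taken down. This is only a sketch: "mygroup", "mytopic" and the
ZooKeeper hosts are placeholders, and it assumes the kazoo client and
ZooKeeper-based offset storage.

    # Rough sketch: print the offsets a 0.8-era high-level consumer group has
    # committed to ZooKeeper. "mygroup", "mytopic" and the host list are
    # placeholders; requires the kazoo ZooKeeper client.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
    zk.start()

    group, topic = "mygroup", "mytopic"
    base = "/consumers/{}/offsets/{}".format(group, topic)
    for partition in sorted(zk.get_children(base), key=int):
        offset, _stat = zk.get("{}/{}".format(base, partition))
        print("partition", partition, "-> committed offset", offset.decode("utf-8"))

    zk.stop()

Comparing that output (or the consumer logs Neha asked about) from before and
after the shutdown should show whether the committed offsets actually moved
backwards.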