I looked this up yesterday when I read the grandparent, as my old company ran two and I needed to know. Your link is a bit ambiguous, but it links to the ZooKeeper Getting Started guide, which says this:

"For replicated mode, a minimum of three servers are required, and it is strongly recommended that you have an odd number of servers. If you only have two servers, then you are in a situation where if one of them fails, there are not enough machines to form a majority quorum. Two servers is inherently less stable than a single server, because there are two single points of failure."
<https://zookeeper.apache.org/doc/r3.4.10/zookeeperStarted.html>

cheers

jan

On 30/04/2017, Michal Borowiecki <michal.borowie...@openbet.com> wrote:
> Svante, I don't share your opinion.
> Having an even number of zookeepers is not a problem in itself, it
> simply means you don't get any better resilience than if you had one
> fewer instance.
> Yes, it's not common or recommended practice, but you are allowed to
> have an even number of zookeepers and it's most likely not related to
> the problem at hand and does NOT need to be addressed first.
> https://zookeeper.apache.org/doc/r3.4.10/zookeeperAdmin.html#sc_zkMulitServerSetup
>
> Abhit, I'm afraid the log snippet is not enough for me to help.
> Maybe someone else in the community with more experience can recognize
> the symptoms but in the meantime, if you haven't already done so, you
> may want to search for similar issues:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22ZK%20expired%3B%20shut%20down%20all%20controller%22
>
> searching for text like "ZK expired; shut down all controller" or "No
> broker in ISR is alive for" or other interesting events from the log.
>
> Hope that helps,
> Michal
>
>
> On 26/04/17 21:40, Svante Karlsson wrote:
>> You are not supposed to run an even number of zookeepers. Fix that first
>>
>> On Apr 26, 2017 20:59, "Abhit Kalsotra" <abhit...@gmail.com> wrote:
>>
>>> Any pointers please....
>>>
>>>
>>> Abhi
>>>
>>> On Wed, Apr 26, 2017 at 11:03 PM, Abhit Kalsotra <abhit...@gmail.com>
>>> wrote:
>>>
>>>> Hi *
>>>>
>>>> My kafka setup
>>>>
>>>> * OS: Windows Machine
>>>> * 6 broker nodes, 4 on one Machine and 2 on other Machine
>>>> * ZK instance on (4 broker nodes Machine) and another ZK on (2 broker nodes machine)
>>>> * 2 Topics with partition size = 50 and replication factor = 3
>>>>
>>>> I am producing on an average of around 500 messages / sec with each
>>>> message size close to 98 bytes...
>>>>
>>>> More or less the message rate stays constant throughout, but after running
>>>> the setup for close to 2 weeks, my Kafka cluster broke and this happened
>>>> twice in a month. Not able to understand what's the issue, Kafka gurus
>>>> please do share your inputs...
>>>>
>>>> the controller.log file at the time of Kafka broken looks like
>>>>
>>>> *[2017-04-26 12:03:34,998] INFO [Controller 0]: Broker failure callback for 0,1,3,5,6 (kafka.controller.KafkaController)
>>>> [2017-04-26 12:03:34,998] INFO [Controller 0]: Removed ArrayBuffer() from list of shutting down brokers.
>>>> (kafka.controller.KafkaController)
>>>> [2017-04-26 12:03:34,998] INFO [Partition state machine on Controller 0]: Invoking state change to OfflinePartition for partitions
>>>> [__consumer_offsets,19],[mytopic,11],[__consumer_offsets,30],[mytopicOLD,18],
>>>> [mytopic,13],[__consumer_offsets,47],[mytopicOLD,26],[__consumer_offsets,29],
>>>> [mytopicOLD,0],[__consumer_offsets,41],[mytopic,44],[mytopicOLD,38],
>>>> [mytopicOLD,2],[__consumer_offsets,17],[__consumer_offsets,10],[mytopic,20],
>>>> [mytopic,23],[mytopic,30],[__consumer_offsets,14],[__consumer_offsets,40],
>>>> [mytopic,31],[mytopicOLD,43],[mytopicOLD,19],[mytopicOLD,35],
>>>> [__consumer_offsets,18],[mytopic,43],[__consumer_offsets,26],[__consumer_offsets,0],
>>>> [mytopic,32],[__consumer_offsets,24],[mytopicOLD,3],[mytopic,2],
>>>> [mytopic,3],[mytopicOLD,45],[mytopic,35],[__consumer_offsets,20],
>>>> [mytopic,1],[mytopicOLD,33],[__consumer_offsets,5],[mytopicOLD,47],
>>>> [__consumer_offsets,22],[mytopicOLD,8],[mytopic,33],[mytopic,36],
>>>> [mytopicOLD,11],[mytopic,47],[mytopicOLD,20],[mytopic,48],
>>>> [__consumer_offsets,12],[mytopicOLD,32],[__consumer_offsets,8],[mytopicOLD,39],
>>>> [mytopicOLD,27],[mytopicOLD,49],[mytopicOLD,42],[mytopic,21],
>>>> [mytopicOLD,31],[mytopic,29],[__consumer_offsets,23],[mytopicOLD,21],
>>>> [__consumer_offsets,48],[__consumer_offsets,11],[mytopic,18],[__consumer_offsets,13],
>>>> [mytopic,45],[mytopic,5],[mytopicOLD,25],[mytopic,6],
>>>> [mytopicOLD,23],[mytopicOLD,37],[__consumer_offsets,6],[__consumer_offsets,49],
>>>> [mytopicOLD,13],[__consumer_offsets,28],[__consumer_offsets,4],[__consumer_offsets,37],
>>>> [mytopic,12],[mytopicOLD,30],[__consumer_offsets,31],[__consumer_offsets,44],
>>>> [mytopicOLD,15],[mytopicOLD,29],[mytopic,37],[mytopic,38],
>>>> [__consumer_offsets,42],[mytopic,27],[mytopic,26],[mytopic,15],
>>>> [__consumer_offsets,34],[mytopic,42],[__consumer_offsets,46],[mytopic,14],
>>>> [mytopicOLD,12],[mytopicOLD,1],[mytopic,7],[__consumer_offsets,25],
>>>> [mytopicOLD,24],[mytopicOLD,44],[mytopicOLD,14],[__consumer_offsets,32],
>>>> [mytopic,0],[__consumer_offsets,43],[mytopic,39],[mytopicOLD,5],
>>>> [mytopic,9],[mytopic,24],[__consumer_offsets,36],[mytopic,25],
>>>> [mytopicOLD,36],[mytopic,19],[__consumer_offsets,35],[__consumer_offsets,7],
>>>> [mytopic,8],[__consumer_offsets,38],[mytopicOLD,48],[mytopicOLD,9],
>>>> [__consumer_offsets,1],[mytopicOLD,6],[mytopic,41],[mytopicOLD,41],
>>>> [mytopicOLD,7],[mytopic,17],[mytopicOLD,17],[mytopic,49],
>>>> [__consumer_offsets,16],[__consumer_offsets,2] (kafka.controller.PartitionStateMachine)
>>>> [2017-04-26 12:03:35,045] INFO [SessionExpirationListener on 1], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
>>>> [2017-04-26 12:03:35,045] DEBUG [Controller 1]: Controller resigning, broker id 1 (kafka.controller.KafkaController)
>>>> [2017-04-26 12:03:35,045] DEBUG [Controller 1]: De-registering IsrChangeNotificationListener (kafka.controller.KafkaController)
>>>> [2017-04-26 12:03:35,060] INFO [Partition state machine on Controller 1]: Stopped partition state machine (kafka.controller.PartitionStateMachine)
>>>> [2017-04-26 12:03:35,060] INFO [Replica state machine on controller 1]: Stopped replica state machine (kafka.controller.ReplicaStateMachine)
>>>> [2017-04-26 12:03:35,060] INFO [Controller 1]: Broker 1 resigned as the controller (kafka.controller.KafkaController)
>>>> [2017-04-26 12:03:36,013] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [__consumer_offsets,19]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
>>>> [2017-04-26 12:03:36,029] DEBUG [OfflinePartitionLeaderSelector]: [mytopic,11].
>>>> Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
>>>> [2017-04-26 12:03:36,029] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [__consumer_offsets,30]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
>>>> [2017-04-26 12:03:37,811] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [mytopicOLD,18]. Select 2 from ISR 2 to be the leader. (kafka.controller.OfflinePartitionLeaderSelector)*
>>>>
>>>> Typical broker config attached.. Please do share some valid inputs...
>>>>
>>>> Abhi
>>>> !wq
>>>>
>>>> --
>>>> If you can't succeed, call it version 1.0
>>>
>>> --
>>> If you can't succeed, call it version 1.0
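
PS: for what it's worth, the arithmetic behind the Getting Started passage quoted at the top can be sketched in a few lines of Python. This is only a sketch, assuming a plain majority quorum (strictly more than half of the ensemble must be up); the function names are mine and are not part of ZooKeeper or Kafka.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that is strictly more than half the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority can still be formed."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    # Print the quorum size and failure tolerance for small ensembles.
    for n in range(1, 7):
        print(f"{n} servers: quorum {majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under that assumption, two servers tolerate zero failures, the same as one, and four tolerate one failure, the same as three. That matches Michal's point that an even count gives no better resilience than one fewer instance, and also the guide's point that with two servers either failure loses the quorum.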