Any pointers please....
Abhi

On Wed, Apr 26, 2017 at 11:03 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:
> Hi *
>
> My Kafka setup:
>
> * OS: Windows machine
> * 6 broker nodes, 4 on one machine and 2 on the other machine
> * One ZK instance on the 4-broker machine and another ZK instance on the
>   2-broker machine
> * 2 topics with partition size = 50 and replication factor = 3
>
> I am producing on average around 500 messages / sec, with each
> message close to 98 bytes in size...
>
> The message rate stays more or less constant throughout, but after running
> the setup for close to 2 weeks, my Kafka cluster broke, and this has happened
> twice in a month. I am not able to understand what the issue is. Kafka gurus,
> please do share your inputs...
>
> The controller.log file at the time Kafka broke looks like this:
>
> [2017-04-26 12:03:34,998] INFO [Controller 0]: Broker failure callback for 0,1,3,5,6 (kafka.controller.KafkaController)
> [2017-04-26 12:03:34,998] INFO [Controller 0]: Removed ArrayBuffer() from list of shutting down brokers. (kafka.controller.KafkaController)
> [2017-04-26 12:03:34,998] INFO [Partition state machine on Controller 0]: Invoking state change to OfflinePartition for partitions [__consumer_offsets,19],[mytopic,11],[__consumer_offsets,30],[mytopicOLD,18],[mytopic,13],[__consumer_offsets,47],[mytopicOLD,26],[__consumer_offsets,29],[mytopicOLD,0],[__consumer_offsets,41],[mytopic,44],[mytopicOLD,38],[mytopicOLD,2],[__consumer_offsets,17],[__consumer_offsets,10],[mytopic,20],[mytopic,23],[mytopic,30],[__consumer_offsets,14],[__consumer_offsets,40],[mytopic,31],[mytopicOLD,43],[mytopicOLD,19],[mytopicOLD,35],[__consumer_offsets,18],[mytopic,43],[__consumer_offsets,26],[__consumer_offsets,0],[mytopic,32],[__consumer_offsets,24],[mytopicOLD,3],[mytopic,2],[mytopic,3],[mytopicOLD,45],[mytopic,35],[__consumer_offsets,20],[mytopic,1],[mytopicOLD,33],[__consumer_offsets,5],[mytopicOLD,47],[__consumer_offsets,22],[mytopicOLD,8],[mytopic,33],[mytopic,36],[mytopicOLD,11],[mytopic,47],[mytopicOLD,20],[mytopic,48],[__consumer_offsets,12],[mytopicOLD,32],[__consumer_offsets,8],[mytopicOLD,39],[mytopicOLD,27],[mytopicOLD,49],[mytopicOLD,42],[mytopic,21],[mytopicOLD,31],[mytopic,29],[__consumer_offsets,23],[mytopicOLD,21],[__consumer_offsets,48],[__consumer_offsets,11],[mytopic,18],[__consumer_offsets,13],[mytopic,45],[mytopic,5],[mytopicOLD,25],[mytopic,6],[mytopicOLD,23],[mytopicOLD,37],[__consumer_offsets,6],[__consumer_offsets,49],[mytopicOLD,13],[__consumer_offsets,28],[__consumer_offsets,4],[__consumer_offsets,37],[mytopic,12],[mytopicOLD,30],[__consumer_offsets,31],[__consumer_offsets,44],[mytopicOLD,15],[mytopicOLD,29],[mytopic,37],[mytopic,38],[__consumer_offsets,42],[mytopic,27],[mytopic,26],[mytopic,15],[__consumer_offsets,34],[mytopic,42],[__consumer_offsets,46],[mytopic,14],[mytopicOLD,12],[mytopicOLD,1],[mytopic,7],[__consumer_offsets,25],[mytopicOLD,24],[mytopicOLD,44],[mytopicOLD,14],[__consumer_offsets,32],[mytopic,0],[__consumer_offsets,43],[mytopic,39],[mytopicOLD,5],[mytopic,9],[mytopic,24],[__consumer_offsets,36],[mytopic,25],[mytopicOLD,36],[mytopic,19],[__consumer_offsets,35],[__consumer_offsets,7],[mytopic,8],[__consumer_offsets,38],[mytopicOLD,48],[mytopicOLD,9],[__consumer_offsets,1],[mytopicOLD,6],[mytopic,41],[mytopicOLD,41],[mytopicOLD,7],[mytopic,17],[mytopicOLD,17],[mytopic,49],[__consumer_offsets,16],[__consumer_offsets,2] (kafka.controller.PartitionStateMachine)
> [2017-04-26 12:03:35,045] INFO [SessionExpirationListener on 1], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
> [2017-04-26 12:03:35,045] DEBUG [Controller 1]: Controller resigning, broker id 1 (kafka.controller.KafkaController)
> [2017-04-26 12:03:35,045] DEBUG [Controller 1]: De-registering IsrChangeNotificationListener (kafka.controller.KafkaController)
> [2017-04-26 12:03:35,060] INFO [Partition state machine on Controller 1]: Stopped partition state machine (kafka.controller.PartitionStateMachine)
> [2017-04-26 12:03:35,060] INFO [Replica state machine on controller 1]: Stopped replica state machine (kafka.controller.ReplicaStateMachine)
> [2017-04-26 12:03:35,060] INFO [Controller 1]: Broker 1 resigned as the controller (kafka.controller.KafkaController)
> [2017-04-26 12:03:36,013] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [__consumer_offsets,19]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
> [2017-04-26 12:03:36,029] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [mytopic,11]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
> [2017-04-26 12:03:36,029] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [__consumer_offsets,30]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
> [2017-04-26 12:03:37,811] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [mytopicOLD,18]. Select 2 from ISR 2 to be the leader. (kafka.controller.OfflinePartitionLeaderSelector)
>
> Typical broker config attached. Please do share some valid inputs...
>
> Abhi
> !wq
>
> --
> If you can't succeed, call it version 1.0

--
If you can't succeed, call it version 1.0
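
To see at a glance which topics were hit in a controller.log excerpt like the one quoted above, something along these lines may help. It is only a rough sketch: the function name and the regular expressions are my own (hypothetical, not part of Kafka), and it assumes every entry starts with a timestamp in the format shown in the log and that partition entries are not broken across lines.

```python
import re
from collections import Counter

# One controller.log entry: a leading timestamp such as [2017-04-26 12:03:34,998],
# then everything up to the next timestamp (or the end of the text).
ENTRY_RE = re.compile(
    r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\].*?(?=\[\d{4}-\d{2}-\d{2} |\Z)",
    re.DOTALL,
)

# A partition reference such as [mytopic,11] or [__consumer_offsets,19].
PARTITION_RE = re.compile(r"\[([A-Za-z0-9._-]+),(\d+)\]")


def offline_partitions_by_topic(log_text):
    """Count, per topic, the partitions listed in OfflinePartition state changes."""
    counts = Counter()
    for entry in ENTRY_RE.findall(log_text):
        # Only look at the state-change entries, not the leader-selection ones.
        if "Invoking state change to OfflinePartition" not in entry:
            continue
        for topic, _partition in PARTITION_RE.findall(entry):
            counts[topic] += 1
    return counts


if __name__ == "__main__":
    # Assumes the log has been saved locally under this name.
    with open("controller.log", encoding="utf-8") as handle:
        for topic, count in sorted(offline_partitions_by_topic(handle.read()).items()):
            print(topic, count)
```

Running it against the saved log prints one line per topic with the number of partitions that were moved to OfflinePartition, which makes it easier to compare the two incidents.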