RE: broker restart problems

ZhuGe Thu, 26 Feb 2015 22:33:14 -0800

Sorry, i found the controller log. it shows some log like this:
[2015-02-16 11:10:03,094] DEBUG [Controller 0]: Removing replica 1 from ISR 2,0 
for partition [wx_rtdc_PageViewData_bolt1,4]. 
(kafka.controller.KafkaController)[2015-02-16 11:10:03,096] WARN [Controller 
0]: Cannot remove replica 1 from ISR of partition 
[wx_rtdc_PageViewData_bolt1,4] since it is not in the ISR. Leader = 2 ; ISR = 
List(2, 0) (kafka.controller.KafkaController)[2015-02-16 11:10:03,096] DEBUG 
[Controller 0]: Removing replica 1 from ISR 0,1,2 for partition 
[wx_rtdc_DownLoadData_bolt2,9]. (kafka.controller.KafkaController)[2015-02-16 
11:10:03,099] INFO [Controller 0]: New leader and ISR for partition 
[wx_rtdc_DownLoadData_bolt2,9] is {"leader":0,"leader_epoch":1,"isr":[0,2]} 
(kafka.controller.KafkaController)[2015-02-16 11:10:03,100] DEBUG [Controller 
0]: Removing replica 1 from ISR 2,0,1 for partition 
[wx_rtdc_EventTrackData_bolt5,2]. (kafka.controller.KafkaController)[2015-02-16 
11:10:03,103] INFO [Controller 0]: New leader and ISR for partition 
[wx_rtdc_EventTrackData_bolt5,2] is {"leader":2,"leader_epoch":1,"isr":[2,0]} 
(kafka.controller.KafkaController)
And after that, there are a lot WARN message like:[2015-02-16 11:10:03,220] 
WARN [Channel manager on controller 0]: Not sending request Name: 
StopReplicaRequest; Version: 0; CorrelationId: 188; ClientId: ; 
DeletePartitions: false; ControllerId: 0; ControllerEpoch: 36; Partitions: 
[wx_rtdc_DownLoadData_bolt1,8] to broker 1, since it is offline. 
(kafka.controller.ControllerChannelManager)[2015-02-16 11:10:03,221] WARN 
[Channel manager on controller 0]: Not sending request Name: 
StopReplicaRequest; Version: 0; CorrelationId: 188; ClientId: ; 
DeletePartitions: false; ControllerId: 0; ControllerEpoch: 36; Partitions: 
[wx_rtdc_StartStat_bolt3,0] to broker 1, since it is offline. 
(kafka.controller.ControllerChannelManager)[2015-02-16 11:10:03,221] WARN 
[Channel manager on controller 0]: Not sending request Name: 
StopReplicaRequest; Version: 0; CorrelationId: 188; ClientId: ; 
DeletePartitions: false; ControllerId: 0; ControllerEpoch: 36; Partitions: 
[wx_rtdc_PageViewData_bolt3,5] to broker 1, since it is offline. 
(kafka.controller.ControllerChannelManager)
And when the broker 1 stat up, it log the message:
[2015-02-16 11:11:04,960] INFO [BrokerChangeListener on Controller 0]: Broker 
change listener fired for path /brokers/ids with children 2,1,0 
(kafka.controller.ReplicaStateMachine$BrokerChangeListener)[2015-02-16 
11:11:04,969] INFO [BrokerChangeListener on Controller 0]: Newly added brokers: 
1, deleted brokers: , all live brokers: 2,1,0 
(kafka.controller.ReplicaStateMachine$BrokerChangeListener)[2015-02-16 
11:11:04,969] DEBUG [Channel manager on controller 0]: Controller 0 trying to 
connect to broker 1 (kafka.controller.ControllerChannelManager)[2015-02-16 
11:11:04,970] INFO [Controller-0-to-broker-1-send-thread], Controller 0 
connected to id:1,host:kslave2,port:9092 for sending state change requests 
(kafka.controller.RequestSendThread)[2015-02-16 11:11:04,970] INFO [Controller 
0]: New broker startup callback for 1 
(kafka.controller.KafkaController)[2015-02-16 11:11:04,970] INFO 
[Controller-0-to-broker-1-send-thread], Starting  
(kafka.controller.RequestSendThread)[2015-02-16 11:11:04,978] ERROR 
[BrokerChangeListener on Controller 0]: Error while handling broker changes 
(kafka.controller.ReplicaStateMachine$BrokerChangeListener)java.util.NoSuchElementException:
 key not found: [cjy_kafkawebservice_AirTicketHbaseData_New,0]        at 
scala.collection.MapLike$class.default(MapLike.scala:225)        at 
scala.collection.mutable.HashMap.default(HashMap.scala:45)        at 
scala.collection.MapLike$class.apply(MapLike.scala:135)        at 
scala.collection.mutable.HashMap.apply(HashMap.scala:45)        at 
kafka.controller.ControllerBrokerRequestBatch.updateMetadataRequestMapFor$1(ControllerChannelManager.scala:242)
        at 
kafka.controller.ControllerBrokerRequestBatch$$anonfun$addUpdateMetadataRequestForBrokers$1.apply(ControllerChannelManager.scala:268)
        at 
kafka.controller.ControllerBrokerRequestBatch$$anonfun$addUpdateMetadataRequestForBrokers$1.apply(ControllerChannelManager.scala:268)
        at 
scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:99)
        at 
scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:99)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)        
at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)      
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190) 
       at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)       
 at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:99)        
at 
kafka.controller.ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers(ControllerChannelManager.scala:268)
        at 
kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:940)
        at 
kafka.controller.KafkaController.onBrokerStartup(KafkaController.scala:386)     
   at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:342)
        at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:330)
        at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:330)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)        at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:329)
        at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:328)
        at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:328)
        at kafka.utils.Utils$.inLock(Utils.scala:538)        at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:327)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)        at 
org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)


Thank you very much
> From: t...@outlook.com
> To: users@kafka.apache.org
> Subject: RE: broker restart problems
> Date: Fri, 27 Feb 2015 02:14:28 +0000
> 
> The version of kafka is 0.8.1.1.
> The controller log shows the controller starting up complete with no warn or 
> error message. And we could not find the state-change log belong to the day 
> we restart the broker. I found some error message in  broker 0's log when i 
> restart the broker 2. I am not sure whether it is related or not.
> [2015-02-16 11:10:11,869] INFO Partition [wx_rtdc_EventTrackData_bolt2,8] on 
> broker 0: Shrinking ISR for partition [wx_rtdc_EventTrackData_bolt2,8] from 
> 0,2,1 to 0,2 (kafka.cluster.Partition)[2015-02-16 11:10:11,878] ERROR 
> Conditional update of path 
> /brokers/topics/wx_rtdc_EventTrackData_bolt2/partitions/8/state with data 
> {"controller_epoch":36,"leader":0,"version":1,"leader_epoch":0,"isr":[0,2]} 
> and expected version 0 failed due to 
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
> BadVersion for 
> /brokers/topics/wx_rtdc_EventTrackData_bolt2/partitions/8/state 
> (kafka.utils.ZkUtils$)[2015-02-16 11:10:11,878] INFO Partition 
> [wx_rtdc_EventTrackData_bolt2,8] on broker 0: Cached zkVersion [0] not equal 
> to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> Thanks> Date: Thu, 26 Feb 2015 13:11:24 -0800
> > Subject: Re: broker restart problems
> > From: j...@confluent.io
> > To: users@kafka.apache.org
> > 
> > Right, you need to look into why the restarted broker is not sync-ed up.
> > Any error in the controller and state-change log? Also, what version of
> > Kafka are you on?
> > 
> > Thanks,
> > 
> > Jun
> > 
> > On Wed, Feb 25, 2015 at 5:46 PM, ZhuGe <t...@outlook.com> wrote:
> > 
> > > we did not have this setting in the property file, so it should be false.
> > > BTW, it this command means periodically invoking 'preferred replica leader
> > > election tool'?and how should i solve the "out of syn" problem of the
> > > broker?
> > > > Date: Wed, 25 Feb 2015 16:09:42 -0800
> > > > Subject: Re: broker restart problems
> > > > From: kzak...@netzero.net
> > > > To: users@kafka.apache.org
> > > >
> > > > Do you have the property auto.leader.rebalance.enable=true set in
> > > brokers?
> > > >
> > > > Thanks
> > > > -Zakee
> > > >
> > > > On Tue, Feb 24, 2015 at 11:47 PM, ZhuGe <t...@outlook.com> wrote:
> > > >
> > > > > Hi all:We have a cluster of 3 brokers(id : 0,1,2). We restart(simply
> > > use
> > > > > stop.sh and start.sh in bin directory) broker 1. The broker started
> > > > > successfully. However, all the partitions' leader moved to other
> > > brokers
> > > > > and no data were written into broker 2. This is the status of one
> > > > > topic:Topic:wx_rtdc_flumesinks        PartitionCount:12
> > > > >  ReplicationFactor:3     Configs:        Topic: wx_rtdc_flumesinks
> > > > >  Partition: 0    Leader: 2       Replicas: 1,2,0 Isr: 2,0        
> > > > > Topic:
> > > > > wx_rtdc_flumesinks       Partition: 1    Leader: 2       Replicas:
> > > 2,0,1
> > > > > Isr: 2,0        Topic: wx_rtdc_flumesinks       Partition: 2
> > > Leader: 0
> > > > >      Replicas: 0,1,2 Isr: 0,2        Topic: wx_rtdc_flumesinks
> > > > >  Partition: 3    Leader: 0       Replicas: 1,0,2 Isr: 0,2        
> > > > > Topic:
> > > > > wx_rtdc_flumesinks       Partition: 4    Leader: 2       Replicas:
> > > 2,1,0
> > > > > Isr: 2,0        Topic: wx_rtdc_flumesinks       Partition: 5
> > > Leader: 0
> > > > >      Replicas: 0,2,1 Isr: 0,2        Topic: wx_rtdc_flumesinks
> > > > >  Partition: 6    Leader: 2       Replicas: 1,2,0 Isr: 2,0        
> > > > > Topic:
> > > > > wx_rtdc_flumesinks       Partition: 7    Leader: 2       Replicas:
> > > 2,0,1
> > > > > Isr: 2,0        Topic: wx_rtdc_flumesinks       Partition: 8
> > > Leader: 0
> > > > >      Replicas: 0,1,2 Isr: 0,2        Topic: wx_rtdc_flumesinks
> > > > >  Partition: 9    Leader: 0       Replicas: 1,0,2 Isr: 0,2        
> > > > > Topic:
> > > > > wx_rtdc_flumesinks       Partition: 10   Leader: 2       Replicas:
> > > 2,1,0
> > > > > Isr: 2,0        Topic: wx_rtdc_flumesinks       Partition: 11
> > >  Leader: 0
> > > > >      Replicas: 0,2,1 Isr: 0,2
> > > > > It seems the broker is out of synchronize with other brokers.  and
> > > nothing
> > > > > changed after i run preferred replication leader election tool.  i
> > > think it
> > > > > is because the preferred replication is not in Isr, which is described
> > > in
> > > > > the wiki of replication tool.
> > > > > I want to know how to synchronize the replications of 3 brokers so 
> > > > > that
> > > > > the broker 1 could work properly.any help would be appreciated.
> > > > > ____________________________________________________________
> > > > > High School Yearbooks
> > > > > View Class Yearbooks Online Free. Reminisce & Buy a Reprint Today!
> > > > >
> > > http://thirdpartyoffers.netzero.net/TGL3255/54ed7fb69616f7fb61578mp03duc
> > >
> > >
>

RE: broker restart problems

Reply via email to