Here's another exception I see during controlled shutdown (this time there was no unclean shutdown problem). Should I be concerned about this exception? Could it lead to data loss? It happened after the first "Retrying controlled shutdown after the previous attempt failed..." message; the controlled shutdown subsequently succeeded without another retry (though with a few more of these exceptions).
Again, there was no "Remaining partitions to move..." message before the
retrying message, so I assume the retry happens after an IOException (which
is not logged in KafkaServer.controlledShutdown).

2013-10-29 20:03:31,883 INFO [kafka-request-handler-4] controller.ReplicaStateMachine - [Replica state machine on controller 10]: Invoking state change to OfflineReplica for replicas PartitionAndReplica(mytopic,0,10)
2013-10-29 20:03:31,883 ERROR [kafka-request-handler-4] change.logger - Controller 10 epoch 190 initiated state change of replica 10 for partition [mytopic,0] to OfflineReplica failed
java.lang.AssertionError: assertion failed: Replica 10 for partition [mytopic,0] should be in the NewReplica,OnlineReplica states before moving to OfflineReplica state. Instead it is in OfflineReplica state
        at scala.Predef$.assert(Predef.scala:91)
        at kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:209)
        at kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:167)
        at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:89)
        at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:89)
        at scala.collection.immutable.Set$Set1.foreach(Set.scala:81)
        at kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:89)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:199)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:184)
        at scala.Option.foreach(Option.scala:121)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:184)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:180)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
        at kafka.controller.KafkaController.shutdownBroker(KafkaController.scala:180)
        at kafka.server.KafkaApis.handleControlledShutdownRequest(KafkaApis.scala:133)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:72)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
        at java.lang.Thread.run(Thread.java:662)

Jason

On Fri, Oct 25, 2013 at 11:51 PM, Jason Rosenberg <j...@squareup.com> wrote:
>
> On Fri, Oct 25, 2013 at 9:16 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>>
>> Unclean shutdown could result in data loss, since you are moving
>> leadership to a replica that has fallen out of the ISR; i.e., its log
>> end offset is behind the last committed message to this partition.
>>
>
> But if data is written with 'request.required.acks=-1', no data should
> be lost, no? Or will partitions be truncated wholesale after an unclean
> shutdown?
>
>> Take a look at the packaged log4j.properties file. The controller's
>> partition/replica state machines and its channel manager (which
>> sends/receives LeaderAndIsr requests/responses to brokers) use a
>> stateChangeLogger. The replica managers on all brokers also use this
>> logger.
>
> Ah... so it looks like most things logged with the stateChangeLogger are
> logged at the TRACE level, and that is the default setting in the
> packaged log4j.properties file. Needless to say, my contained
> KafkaServer is not currently using that log4j.properties file (we are
> just using a rootLogger with level = INFO by default).
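(Coming back to the acks question quoted above: the producer setup I have in
mind is roughly the following. This is a minimal sketch against the 0.8
Scala producer API; the broker address, topic, key, and message are
placeholders.)

    import java.util.Properties
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

    val props = new Properties()
    // Placeholder broker address
    props.put("metadata.broker.list", "broker1:9092")
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    // -1 = the leader waits for acks from all replicas currently in the
    // ISR before the send is considered successful
    props.put("request.required.acks", "-1")

    val producer = new Producer[String, String](new ProducerConfig(props))
    producer.send(new KeyedMessage[String, String]("mytopic", "key", "value"))
    producer.close()

My understanding is that with acks=-1 a message is only acknowledged once
every replica in the ISR has it, so the question is really about what
happens when leadership moves to a replica that had already fallen out of
the ISR.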
> I can probably add a special rule to use TRACE for the
> state.change.logger category. However, I'm not sure I can make it so
> that this logging all goes to its own separate log file...
>
>> Our logging can improve - e.g., it looks like on controller movement
>> we could retry without saying why.
>>
>
> I can file a jira for this, but I'm not sure what it should say!
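For the special rule mentioned above, something along these lines in
log4j.properties should work. This is a sketch modeled on the appender
setup in the packaged file; the appender name and file path are
placeholders, and the additivity=false line is what keeps these messages
out of the root log:

    # Dedicated appender for controller/replica state change messages
    log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.stateChangeAppender.File=logs/state-change.log
    log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    # Route the state.change.logger category to that appender only
    log4j.logger.state.change.logger=TRACE, stateChangeAppender
    log4j.additivity.state.change.logger=false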