Here's another exception I see during controlled shutdown (this time there was no unclean shutdown problem). Should I be concerned about this exception? Could it lead to data loss? It happened after the first "Retrying controlled shutdown after the previous attempt failed..." message; the controlled shutdown subsequently succeeded without another retry (though with a few more of these exceptions).
Again, there was no "Remaining partitions to move..." message before the
retrying message, so I assume the retry happens after an IOException (which
is not logged in KafkaServer.controlledShutdown).

2013-10-29 20:03:31,883 INFO [kafka-request-handler-4] controller.ReplicaStateMachine - [Replica state machine on controller 10]: Invoking state change to OfflineReplica for replicas PartitionAndReplica(mytopic,0,10)
2013-10-29 20:03:31,883 ERROR [kafka-request-handler-4] change.logger - Controller 10 epoch 190 initiated state change of replica 10 for partition [mytopic,0] to OfflineReplica failed
java.lang.AssertionError: assertion failed: Replica 10 for partition [mytopic,0] should be in the NewReplica,OnlineReplica states before moving to OfflineReplica state. Instead it is in OfflineReplica state
        at scala.Predef$.assert(Predef.scala:91)
        at kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:209)
        at kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:167)
        at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:89)
        at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:89)
        at scala.collection.immutable.Set$Set1.foreach(Set.scala:81)
        at kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:89)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:199)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:184)
        at scala.Option.foreach(Option.scala:121)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:184)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:180)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
        at kafka.controller.KafkaController.shutdownBroker(KafkaController.scala:180)
        at kafka.server.KafkaApis.handleControlledShutdownRequest(KafkaApis.scala:133)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:72)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
        at java.lang.Thread.run(Thread.java:662)

Jason

On Fri, Oct 25, 2013 at 11:51 PM, Jason Rosenberg <j...@squareup.com> wrote:
>
> On Fri, Oct 25, 2013 at 9:16 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>>
>> Unclean shutdown could result in data loss, since you are moving
>> leadership to a replica that has fallen out of the ISR; i.e., its log
>> end offset is behind the last committed message to this partition.
>>
>
> But if data is written with 'request.required.acks=-1', no data should
> be lost, no? Or will partitions be truncated wholesale after an unclean
> shutdown?
>
>> Take a look at the packaged log4j.properties file. The controller's
>> partition/replica state machines and its channel manager (which
>> sends/receives LeaderAndIsr requests/responses to brokers) use a
>> stateChangeLogger. The replica managers on all brokers also use this
>> logger.
>
> Ah... so it looks like most things logged with the stateChangeLogger are
> logged at the TRACE level, and that is the default setting in the
> packaged log4j.properties file. Needless to say, my contained
> KafkaServer is not currently using that log4j.properties file (we are
> just using a rootLogger with level = INFO by default).
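(Coming back to the acks question quoted above: the producer setup I have in
mind is roughly the following. This is a minimal sketch against the 0.8
Scala producer API; the broker address, topic, key, and message are
placeholders.)

    import java.util.Properties
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

    val props = new Properties()
    // Placeholder broker address
    props.put("metadata.broker.list", "broker1:9092")
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    // -1 = the leader waits for acks from all replicas currently in the
    // ISR before the send is considered successful
    props.put("request.required.acks", "-1")

    val producer = new Producer[String, String](new ProducerConfig(props))
    producer.send(new KeyedMessage[String, String]("mytopic", "key", "value"))
    producer.close()

My understanding is that with acks=-1 a message is only acknowledged once
every replica in the ISR has it, so the question is really about what
happens when leadership moves to a replica that had already fallen out of
the ISR.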
> I can probably add a special rule to use TRACE for the
> state.change.logger category. However, I'm not sure I can make it so
> that this logging all goes to its own separate log file...
>
>> Our logging can improve - e.g., it looks like on controller movement
>> we could retry without saying why.
>>
>
> I can file a jira for this, but I'm not sure what it should say!
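For the special rule mentioned above, something along these lines in
log4j.properties should work. This is a sketch modeled on the appender
setup in the packaged file; the appender name and file path are
placeholders, and the additivity=false line is what keeps these messages
out of the root log:

    # Dedicated appender for controller/replica state change messages
    log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.stateChangeAppender.File=logs/state-change.log
    log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    # Route the state.change.logger category to that appender only
    log4j.logger.state.change.logger=TRACE, stateChangeAppender
    log4j.additivity.state.change.logger=false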