I've filed: https://issues.apache.org/jira/browse/KAFKA-1108

On Tue, Oct 29, 2013 at 4:29 PM, Jason Rosenberg <j...@squareup.com> wrote:

> Here's another exception I see during controlled shutdown (this time
> there was not an unclean shutdown problem). Should I be concerned about
> this exception? Is any data loss possible with this? This one happened
> after the first "Retrying controlled shutdown after the previous attempt
> failed..." message. The controlled shutdown subsequently succeeded
> without another retry (but with a few more of these exceptions).
>
> Again, there was no "Remaining partitions to move..." message before the
> retrying message, so I assume the retry happens after an IOException
> (which is not logged in KafkaServer.controlledShutdown).
>
> 2013-10-29 20:03:31,883 INFO [kafka-request-handler-4]
> controller.ReplicaStateMachine - [Replica state machine on controller 10]:
> Invoking state change to OfflineReplica for replicas
> PartitionAndReplica(mytopic,0,10)
> 2013-10-29 20:03:31,883 ERROR [kafka-request-handler-4] change.logger -
> Controller 10 epoch 190 initiated state change of replica 10 for partition
> [mytopic,0] to OfflineReplica failed
> java.lang.AssertionError: assertion failed: Replica 10 for partition
> [mytopic,0] should be in the NewReplica,OnlineReplica states before moving
> to OfflineReplica state. Instead it is in OfflineReplica state
>         at scala.Predef$.assert(Predef.scala:91)
>         at kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:209)
>         at kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:167)
>         at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:89)
>         at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:89)
>         at scala.collection.immutable.Set$Set1.foreach(Set.scala:81)
>         at kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:89)
>         at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:199)
>         at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:184)
>         at scala.Option.foreach(Option.scala:121)
>         at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:184)
>         at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:180)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
>         at kafka.controller.KafkaController.shutdownBroker(KafkaController.scala:180)
>         at kafka.server.KafkaApis.handleControlledShutdownRequest(KafkaApis.scala:133)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:72)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
>         at java.lang.Thread.run(Thread.java:662)
>
> Jason
>
>
> On Fri, Oct 25, 2013 at 11:51 PM, Jason Rosenberg <j...@squareup.com> wrote:
>
>> On Fri, Oct 25, 2013 at 9:16 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>>
>>> Unclean shutdown could result in data loss - since you are moving
>>> leadership to a replica that has fallen out of ISR. i.e., its log end
>>> offset is behind the last committed message to this partition.
>>
>> But if data is written with 'request.required.acks=-1', no data should
>> be lost, no? Or will partitions be truncated wholesale after an unclean
>> shutdown?
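
(For concreteness on the setting referenced just above: in the 0.8 producer,
request.required.acks is an ordinary producer config property. A minimal
producer.properties sketch - the broker list here is a placeholder, not from
this thread:

  # -1 asks the leader to wait for acknowledgements from all in-sync
  # replicas before a send is considered successful (0.8 producer config).
  metadata.broker.list=broker1:9092,broker2:9092
  request.required.acks=-1
)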
>>
>>> Take a look at the packaged log4j.properties file. The controller's
>>> partition/replica state machines and its channel manager, which
>>> sends/receives leaderandisr requests/responses to brokers, use a
>>> stateChangeLogger. The replica managers on all brokers also use this
>>> logger.
>>
>> Ah... so it looks like most things logged with the stateChangeLogger are
>> logged at the TRACE log level, and that's the default setting in the
>> packaged log4j.properties file. Needless to say, my contained KafkaServer
>> is not currently using that log4j.properties (we are just using a
>> rootLogger with level = INFO by default). I can probably add a special
>> rule to use TRACE for the state.change.logger category. However, I'm not
>> sure I can make it so that that logging all goes to its own separate log
>> file...
>>
>>> Our logging can improve - e.g., it looks like on controller movement
>>> we could retry without saying why.
>>
>> I can file a jira for this, but I'm not sure what it should say!
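
(On routing the state change log to its own file: the log4j.properties
packaged with Kafka does exactly that, so a rule along these lines should
work. A minimal sketch following the packaged file's conventions - the
appender name and the ${kafka.logs.dir} path are just the packaged defaults
and can be anything:

  # Capture state.change.logger output at TRACE in its own rolling file;
  # additivity=false keeps it out of the root logger's appenders.
  log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender
  log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH
  log4j.appender.stateChangeAppender.File=${kafka.logs.dir}/state-change.log
  log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout
  log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

  log4j.logger.state.change.logger=TRACE, stateChangeAppender
  log4j.additivity.state.change.logger=false
)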