[
https://issues.apache.org/jira/browse/KAFKA-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624461#comment-15624461
]
Json Tu edited comment on KAFKA-4360 at 11/1/16 5:50 AM:
---------------------------------------------------------
That is great. I searched for onControllerResignation() in the Kafka code and,
just as you say, there are two other invocations in ZookeeperLeaderElector.
Could you assign this task to me? I would be very pleased to submit a pull
request for it. Thank you.
> Controller may deadLock when autoLeaderRebalance encounter zk expired
> ---------------------------------------------------------------------
>
> Key: KAFKA-4360
> URL: https://issues.apache.org/jira/browse/KAFKA-4360
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
> Reporter: Json Tu
> Labels: bugfix
> Attachments: deadlock_patch, yf-mafka2-common02_jstack.txt
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> If the controller has a checkAndTriggerPartitionRebalance task in
> autoRebalanceScheduler and the zk session expires at that time, it will
> run into a deadlock.
> The scenario can be reconstructed as follows. When the zk session expires, the
> zk thread calls handleNewSession, which is defined in
> SessionExpirationListener, and acquires controllerContext.controllerLock. It
> then calls autoRebalanceScheduler.shutdown(), which has to wait for all tasks
> in autoRebalanceScheduler to complete. But the task in that thread pool also
> needs controllerContext.controllerLock, which is already held by the zk
> callback thread, so the two threads deadlock.
> This causes at least two problems. First, the broker's id cannot be
> registered in zookeeper, so the new controller considers the broker dead.
> Second, the process cannot be stopped by kafka-server-stop.sh, because the
> shutdown function cannot acquire controllerContext.controllerLock either; we
> cannot shut down kafka except with kill -9.
> In my attachment I uploaded a jstack file, which was taken when my kafka
> process could not be shut down by kafka-server-stop.sh.
> I have hit this scenario several times; I think this may be an unresolved bug
> in kafka.
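
The lock ordering described above can be sketched in plain Java. This is
only an illustration, not Kafka code: the class and method names are
hypothetical, a ReentrantLock stands in for controllerContext.controllerLock,
and a single-thread executor stands in for autoRebalanceScheduler. It assumes
that the scheduler's shutdown waits for in-flight tasks to finish. A timeout
and shutdownNow() are added so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-alone sketch; none of these names exist in Kafka.
class ControllerDeadlockSketch {

    /**
     * Holds the lock, shuts the scheduler down, and waits up to waitMillis.
     * Returns true if the scheduler terminated while the lock was still held.
     */
    static boolean shutdownWhileHoldingLock(long waitMillis) throws InterruptedException {
        // Stand-in for controllerContext.controllerLock.
        ReentrantLock controllerLock = new ReentrantLock();
        // Stand-in for autoRebalanceScheduler.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch taskStarted = new CountDownLatch(1);

        // The "zk callback thread" takes the lock first (as in handleNewSession).
        controllerLock.lock();
        try {
            // The "rebalance task" starts and then needs the same lock.
            scheduler.execute(() -> {
                taskStarted.countDown();
                try {
                    controllerLock.lockInterruptibly();
                    try {
                        // rebalance work would happen here
                    } finally {
                        controllerLock.unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            taskStarted.await();

            // Shutdown waits for the task, and the task waits for the lock we hold.
            scheduler.shutdown();
            boolean terminated = scheduler.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);

            // Escape hatch for the demo only: interrupt the blocked task.
            scheduler.shutdownNow();
            return terminated;
        } finally {
            controllerLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("scheduler terminated while lock held: "
                + shutdownWhileHoldingLock(500));
    }
}
```

With an unbounded wait and no interrupt, the two threads in this sketch would
block each other indefinitely, which matches the behaviour reported above.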
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)