Uma Maheswara Rao G created HDFS-9137:
-----------------------------------------

             Summary: DeadLock between DataNode#refreshVolumes and 
BPOfferService#registrationSucceeded 
                 Key: HDFS-9137
                 URL: https://issues.apache.org/jira/browse/HDFS-9137
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 3.0.0
            Reporter: Uma Maheswara Rao G
            Assignee: Uma Maheswara Rao G


The code paths through DataNode#refreshVolumes and 
BPOfferService#registrationSucceeded can deadlock.
In practice this should be rare, since the user would have to call 
refreshVolumes while the DN is registering with the NN. But the issue can 
happen.

 Reason for the deadlock:

1) refreshVolumes is called with the DN lock held, and at the end it also 
triggers a block report. In the block report call, 
BPServiceActor#triggerBlockReport calls toString on bpos, which takes the 
readLock on bpos.
Lock order: DN lock, then bpos lock.

2) BPOfferService#registrationSucceeded takes the writeLock on bpos and 
calls dn.bpRegistrationSucceeded, which is in turn a synchronized call on DN.
Lock order: bpos lock, then DN lock.
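
To make the two lock orders concrete, here is a minimal standalone sketch. It is 
not the HDFS code: the class name, dnLock, bposLock and the helper methods are 
all hypothetical stand-ins, and a ReentrantLock replaces the DataNode monitor so 
that the second acquisition can use a timed tryLock instead of hanging forever. 
Each thread takes its first lock, waits until the other thread holds its own 
first lock, and then tries for the second one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderInversionSketch {
    // Hypothetical stand-in for the DataNode monitor ("DN lock").
    static final Lock dnLock = new ReentrantLock();
    // Hypothetical stand-in for the BPOfferService read/write lock ("bpos lock").
    static final ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();

    /** Holds 'first', then tries 'second' with a timeout; true if 'second' was acquired. */
    static boolean holdThenTry(Lock first, Lock second,
                               CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // wait until the other thread holds its first lock
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Keep 'first' held until the other thread has finished its attempt.
            bothTried.countDown();
            try {
                bothTried.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            first.unlock();
        }
    }

    /** Returns true if neither path could take its second lock. */
    static boolean bothPathsBlocked() throws InterruptedException {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicBoolean refreshGotBpos = new AtomicBoolean();
        AtomicBoolean registrationGotDn = new AtomicBoolean();

        // Path 1 (refreshVolumes): DN lock, then bpos read lock.
        Thread refresh = new Thread(() -> refreshGotBpos.set(
            holdThenTry(dnLock, bposLock.readLock(), bothHoldFirst, bothTried)));
        // Path 2 (registrationSucceeded): bpos write lock, then DN lock.
        Thread registration = new Thread(() -> registrationGotDn.set(
            holdThenTry(bposLock.writeLock(), dnLock, bothHoldFirst, bothTried)));

        refresh.start();
        registration.start();
        refresh.join();
        registration.join();
        return !refreshGotBpos.get() && !registrationGotDn.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("both paths blocked: " + bothPathsBlocked());
    }
}
```

With untimed lock() calls and a real synchronized block, as in the DN code, the 
two threads in this interleaving would wait on each other indefinitely.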

So, this can clearly deadlock.
I think a simple fix would be to move the triggerBlockReport call outside of 
the DN lock; I don't think that call really needs to be made under the DN lock.
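
A rough sketch of the shape such a change could take, assuming the work that 
needs the DN lock can be separated from the block report trigger. This is not a 
patch: the class and method names below are hypothetical, and the real 
refreshVolumes does much more. The trace simply records whether the DN monitor 
is held at each step.

```java
import java.util.ArrayList;
import java.util.List;

class RefreshVolumesFixSketch {
    // Records whether the DN monitor was held at each step.
    final List<String> trace = new ArrayList<>();

    // Hypothetical stand-in for the volume add/remove work that needs the DN lock.
    private synchronized void refreshVolumesLocked() {
        trace.add("refresh:dnLockHeld=" + Thread.holdsLock(this));
    }

    // Hypothetical stand-in for BPServiceActor#triggerBlockReport, which ends up
    // taking the bpos read lock in the real code.
    private void triggerBlockReport() {
        trace.add("blockReport:dnLockHeld=" + Thread.holdsLock(this));
    }

    void refreshVolumes() {
        refreshVolumesLocked(); // DN lock is held only inside this call
        triggerBlockReport();   // runs after the DN lock has been released
    }

    public static void main(String[] args) {
        RefreshVolumesFixSketch dn = new RefreshVolumesFixSketch();
        dn.refreshVolumes();
        System.out.println(dn.trace);
    }
}
```

Since the block report path no longer holds the DN lock when it reaches for the 
bpos lock, the "DN lock, then bpos lock" order from 1) goes away, and only the 
"bpos lock, then DN lock" order from 2) remains.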

Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
