Uma Maheswara Rao G created HDFS-9137:
-----------------------------------------

             Summary: DeadLock between DataNode#refreshVolumes and BPOfferService#registrationSucceeded
                 Key: HDFS-9137
                 URL: https://issues.apache.org/jira/browse/HDFS-9137
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 3.0.0
            Reporter: Uma Maheswara Rao G
            Assignee: Uma Maheswara Rao G


The code flows through DataNode#refreshVolumes and BPOfferService#registrationSucceeded can deadlock. In practice this should be rare, since the user would have to call refreshVolumes while the DN is registering with the NN, but it can happen.

Reason for the deadlock:

1) refreshVolumes is called with the DN lock held, and at the end it also triggers a block report. In the block report call, BPServiceActor#triggerBlockReport calls toString on bpos, which takes the readLock on bpos. Lock order: DN lock, then bpos lock.

2) BPOfferService#registrationSucceeded takes the writeLock on bpos and then calls dn.bpRegistrationSucceeded, which is again a synchronized call on DN. Lock order: bpos lock, then DN lock.

So the two paths take the same two locks in opposite order, which can clearly deadlock.

I think a simple fix would be to move the triggerBlockReport call outside of the DN lock. I feel that call does not really need to be made inside the DN lock.

Thoughts?
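
To make the lock ordering concrete, here is a minimal standalone sketch of the two paths. It is not the DataNode code: the class LockOrderSketch, its helper methods, and the timeout value are all hypothetical. The DN monitor is modelled with a ReentrantLock so that tryLock with a timeout can be used to detect the stall instead of hanging forever, and a latch forces the rare interleaving where each thread already holds its first lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the two code paths; not the real HDFS classes.
public class LockOrderSketch {
  // Assumed timeout, only used so the sketch reports a stall instead of hanging.
  private static final long TIMEOUT_MS = 500;

  // Count down, then wait until the other thread also holds its first lock.
  private static void rendezvous(CountDownLatch latch) {
    latch.countDown();
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Try to take the second lock; false means we would have blocked.
  private static boolean tryLock(Lock lock) {
    try {
      return lock.tryLock(TIMEOUT_MS, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  // Returns true if both threads completed, false if either one timed out
  // waiting for its second lock.
  static boolean run(boolean reportInsideDnLock) {
    final Lock dnLock = new ReentrantLock();  // stands in for synchronized(DataNode)
    final ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();
    final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
    final AtomicInteger timeouts = new AtomicInteger();

    // Path 1: refreshVolumes -> triggerBlockReport -> bpos.toString() (read lock).
    Thread refresher = new Thread(() -> {
      dnLock.lock();
      boolean dnHeld = true;
      try {
        rendezvous(bothHoldFirstLock);
        if (!reportInsideDnLock) {
          // Proposed fix: release the DN lock before triggering the report.
          dnLock.unlock();
          dnHeld = false;
        }
        if (tryLock(bposLock.readLock())) {
          bposLock.readLock().unlock();
        } else {
          timeouts.incrementAndGet();
        }
      } finally {
        if (dnHeld) {
          dnLock.unlock();
        }
      }
    });

    // Path 2: registrationSucceeded (write lock) -> dn.bpRegistrationSucceeded (DN lock).
    Thread registrar = new Thread(() -> {
      bposLock.writeLock().lock();
      try {
        rendezvous(bothHoldFirstLock);
        if (tryLock(dnLock)) {
          dnLock.unlock();
        } else {
          timeouts.incrementAndGet();
        }
      } finally {
        bposLock.writeLock().unlock();
      }
    });

    refresher.start();
    registrar.start();
    try {
      refresher.join();
      registrar.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return timeouts.get() == 0;
  }

  public static void main(String[] args) {
    System.out.println("report inside DN lock completes:  " + run(true));
    System.out.println("report outside DN lock completes: " + run(false));
  }
}
```

With the report triggered inside the DN lock, at least one thread in this sketch times out on its second lock, which corresponds to the real code blocking indefinitely. With the report moved outside the DN lock, both threads should complete, because the refresher thread no longer holds the DN lock while waiting for the bpos lock.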