Yiqun Lin created HDFS-13768:
--------------------------------

             Summary: Adding replicas to volume map makes DataNode start slowly
                 Key: HDFS-13768
                 URL: https://issues.apache.org/jira/browse/HDFS-13768
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.1.0
            Reporter: Yiqun Lin


We found that DataNodes start very slowly during a rolling upgrade of our 
cluster. After a restart, the DNs take a long time to register with the NN, 
which causes a lot of the following errors:
{noformat}
DataXceiver error processing WRITE_BLOCK operation  src: /xx.xx.xx.xx:64360 
dst: /xx.xx.xx.xx:50010
java.io.IOException: Not ready to serve the block pool, 
BP-1508644862-xx.xx.xx.xx-1493781183457.
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

Looking into the DN startup logic, the DN performs the initial block pool 
operations before registering with the NN. During block pool initialization, 
we found that adding replicas to the volume map is the most expensive 
operation. Related log:
{noformat}
2018-07-26 10:46:23,771 INFO [Thread-105] 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to 
add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on 
volume /home/hard_disk/1/dfs/dn/current: 242722ms
2018-07-26 10:46:26,231 INFO [Thread-109] 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to 
add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on 
volume /home/hard_disk/5/dfs/dn/current: 245182ms
2018-07-26 10:46:32,146 INFO [Thread-112] 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to 
add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on 
volume /home/hard_disk/8/dfs/dn/current: 251097ms
2018-07-26 10:47:08,283 INFO [Thread-106] 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to 
add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on 
volume /home/hard_disk/2/dfs/dn/current: 287235ms
{noformat}

Currently the DN uses an independent thread to scan and add replicas for each 
volume, but startup still has to wait for the slowest thread to finish its 
work. So the key problem here is making each per-volume scan run faster.

The jstack we captured while the DN was blocked adding replicas:
{noformat}
"Thread-113" #419 daemon prio=5 os_prio=0 tid=0x00007f40879ff000 nid=0x145da 
runnable [0x00007f4043a38000]
   java.lang.Thread.State: RUNNABLE
        at java.io.UnixFileSystem.list(Native Method)
        at java.io.File.list(File.java:1122)
        at java.io.File.listFiles(File.java:1207)
        at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
{noformat}

One possible improvement is to use a ForkJoinPool to run this recursive task 
in parallel, rather than scanning each directory tree synchronously.
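To illustrate the idea, here is a minimal sketch (not the actual HDFS patch) of how the recursive directory walk seen in the jstack could be expressed as ForkJoinPool subtasks, so subdirectories of one volume are scanned in parallel. The class and method names below are illustrative only and do not come from the HDFS code base:
{noformat}
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/**
 * Sketch: parallel recursive scan of a volume's block pool directory.
 * Each subdirectory becomes its own fork-join subtask instead of being
 * visited by one thread synchronously.
 */
public class ParallelReplicaScan {

  /** Counts regular files under a directory tree, forking per subdirectory. */
  static class ScanTask extends RecursiveTask<Long> {
    private final File dir;
    ScanTask(File dir) { this.dir = dir; }

    @Override
    protected Long compute() {
      // The expensive native listing call seen in the jstack (UnixFileSystem.list).
      File[] entries = dir.listFiles();
      if (entries == null) {
        return 0L;
      }
      long files = 0;
      List<ScanTask> subTasks = new ArrayList<>();
      for (File e : entries) {
        if (e.isDirectory()) {
          ScanTask t = new ScanTask(e); // fork one subtask per subdirectory
          t.fork();
          subTasks.add(t);
        } else {
          // In HDFS this is where the replica would be added to the volume map.
          files++;
        }
      }
      for (ScanTask t : subTasks) {
        files += t.join(); // wait for the forked subtasks
      }
      return files;
    }
  }

  public static long scan(File root) {
    return new ForkJoinPool().invoke(new ScanTask(root));
  }
}
{noformat}
This would only help when a volume's directory tree is deep enough that the listing cost dominates; the real fix would also need to respect the existing per-volume thread model in FsVolumeList.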
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
