[jira] [Created] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

Sean Chow (JIRA) Mon, 06 May 2019 22:21:13 -0700

Sean Chow created HDFS-14476:
--------------------------------

             Summary: lock too long when fix inconsistent blocks between disk 
and in-memory
                 Key: HDFS-14476
                 URL: https://issues.apache.org/jira/browse/HDFS-14476
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode
    Affects Versions: 2.7.0, 2.6.0
            Reporter: Sean Chow



When the DirectoryScanner finds differences between the on-disk and 
in-memory blocks, it calls `checkAndUpdate` to fix them. However, 
`FsDatasetImpl.checkAndUpdate` is a synchronized call.

Each of my datanodes holds about 6 million blocks, and each scan (run every 
6 hours) finds about 25,000 abnormal blocks to fix. As a result, the lock on 
the FsDatasetImpl object is held for a long time.

Assuming each block takes 10 ms to fix (because of SAS disk latency), the 
whole pass takes 250 seconds. That means all reads and writes on that 
datanode are blocked for about 4 minutes.
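
As a rough check of the estimate above, the arithmetic can be written out as a 
small sketch. The class and method names are hypothetical, and the 10 ms 
per-block cost is the assumption stated in this report, not a measured value.

```java
public class LockHoldEstimate {
    // Estimated time (ms) the dataset lock is held when every abnormal block
    // is fixed inside one synchronized pass.
    static long holdMillis(long abnormalBlocks, long perBlockMillis) {
        return abnormalBlocks * perBlockMillis;
    }

    public static void main(String[] args) {
        // 25,000 abnormal blocks at an assumed 10 ms each.
        long ms = holdMillis(25_000, 10);
        System.out.println(ms / 1000 + " seconds"); // prints "250 seconds"
    }
}
```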

 

```

2019-05-06 08:06:51,704 INFO 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
metadata files:23574, missing block files:23574, missing blocks in 
memory:47625, mismatched blocks:0

...

2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Took 588402ms to process 1 commands from NN

```

Processing a command from the namenode takes a long time because the threads 
are blocked, and the namenode sees a long lastContact time for this datanode.

This may affect all HDFS versions.

Proposed fix:

Just as invalidate commands from the namenode are processed in batches of 
1000, these abnormal blocks should also be fixed in batches, with a 2-second 
sleep between batches to let normal block reads and writes proceed.
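
The batching idea could look roughly like the sketch below. It is only an 
illustration, not the actual DataNode code: `BatchedReconcile`, `reconcile`, 
`datasetLock`, and `fixOne` are hypothetical names, and a plain object monitor 
stands in for the FsDatasetImpl lock. The point it shows is that the lock is 
taken per batch and released during the pause, so other threads can run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedReconcile {
    // Hypothetical stand-in for the FsDatasetImpl monitor.
    private final Object datasetLock = new Object();

    /**
     * Applies fixOne to every entry, holding datasetLock for at most
     * batchSize entries at a time and sleeping pauseMillis between batches.
     * Returns the number of batches completed.
     */
    public <T> int reconcile(List<T> diffs, int batchSize, long pauseMillis,
                             Consumer<T> fixOne) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < diffs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, diffs.size());
            synchronized (datasetLock) {
                // Lock is held only while this batch is being fixed.
                for (T diff : diffs.subList(start, end)) {
                    fixOne.accept(diff);
                }
            }
            batches++;
            if (end < diffs.size()) {
                try {
                    // Lock is released here, so readers and writers can proceed.
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return batches; // stop early if interrupted
                }
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> diffs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            diffs.add(i);
        }
        List<Integer> fixed = new ArrayList<>();
        // Short pause for the demo; the proposal above suggests 2 seconds.
        int batches = new BatchedReconcile().reconcile(diffs, 1000, 10, fixed::add);
        System.out.println(batches + " batches, " + fixed.size() + " blocks fixed");
    }
}
```

With a batch size of 1000 and the assumed 10 ms per block, each lock hold 
would be bounded at roughly 10 seconds instead of the full 250 seconds, at the 
cost of a longer overall reconciliation pass.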



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
