Kihwal Lee created HDFS-5497:
--------------------------------

             Summary: Performance may suffer when UnderReplicatedBlocks is used heavily
                 Key: HDFS-5497
                 URL: https://issues.apache.org/jira/browse/HDFS-5497
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode
            Reporter: Kihwal Lee


Currently UnderReplicatedBlocks uses LightWeightLinkedSet with the default 
initial size of 16.  If there are many under-replicated blocks, insertion 
and removal can be very expensive because the underlying array is resized repeatedly.

We see 450K to 1M under-replicated blocks during start-up, which typically go 
away as soon as the last few data nodes join. With 450K under-replicated blocks, 
replication queue initialization would re-allocate the underlying array 15 times 
and reinsert elements over 1M times.  As block reports come in and blocks leave 
the queue, the same thing happens in reverse.  I think this is one of the reasons 
why initial block reports after leaving safe mode can take a very long time to process.
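As a rough check on the reallocation count, assume the array starts at c_0 = 16 slots and doubles only when it is completely full (a simplification; a lower load-factor threshold would trigger reallocations earlier and could add one more). Holding n = 450,000 elements then takes k doublings:

```latex
k = \left\lceil \log_2 \frac{n}{c_0} \right\rceil
  = \left\lceil \log_2 \frac{450\,000}{16} \right\rceil
  = \left\lceil \log_2 28\,125 \right\rceil = 15,
\qquad 2^{14} = 16\,384 < 28\,125 \le 32\,768 = 2^{15}.
```

Every reallocation reinserts all elements already in the set, so each one costs more as the queue grows.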

With a larger initial/minimum size, the processing time gets significantly shorter. 
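The effect of the initial/minimum size can be illustrated with a toy model that tracks only the element count and array capacity, filling the set with 450,000 elements and then draining it. This is a minimal sketch, not LightWeightLinkedSet's code: the policy (double when full, halve when under a quarter full, never shrink below the initial capacity) and the names CapacityModel and simulate are hypothetical, chosen for illustration. Under this assumed policy, filling from 16 takes 15 doublings, in line with the figure above, while a minimum of 2^19 avoids resizing entirely.

```python
"""Toy model of a resizable hash set's capacity policy (hypothetical, not Hadoop code).

Assumed policy: double the array when the element count exceeds the capacity,
halve it when fewer than a quarter of the slots are used, and never shrink
below the initial capacity.
"""


class CapacityModel:
    """Tracks size and capacity only; counts reallocations and rehashed elements."""

    def __init__(self, initial_capacity: int = 16) -> None:
        self.min_capacity = initial_capacity  # assumed: initial size is also the floor
        self.capacity = initial_capacity
        self.size = 0
        self.resizes = 0
        self.rehashed = 0  # elements reinserted into a newly allocated array

    def _resize(self, new_capacity: int) -> None:
        self.resizes += 1
        self.rehashed += self.size  # every live element moves to the new array
        self.capacity = new_capacity

    def add(self) -> None:
        self.size += 1
        if self.size > self.capacity:  # array is full: grow
            self._resize(self.capacity * 2)

    def remove(self) -> None:
        self.size -= 1
        # Under a quarter full and still above the floor: shrink.
        if self.size * 4 < self.capacity and self.capacity > self.min_capacity:
            self._resize(self.capacity // 2)


def simulate(n: int, initial_capacity: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Fill with n elements, then drain; return (resizes, rehashed) after each phase."""
    model = CapacityModel(initial_capacity)
    for _ in range(n):  # e.g. replication queue initialization
        model.add()
    filled = (model.resizes, model.rehashed)
    for _ in range(n):  # e.g. block reports removing blocks from the queue
        model.remove()
    return filled, (model.resizes, model.rehashed)


if __name__ == "__main__":
    for cap in (16, 1 << 19):
        filled, total = simulate(450_000, cap)
        print(f"initial capacity {cap}: after fill {filled}, after drain {total}")
    # initial capacity 16: after fill (15, 524287), after drain (30, 786408)
    # initial capacity 524288: after fill (0, 0), after drain (0, 0)
```

The trade-off in this model is memory: a larger floor keeps the big array allocated even when the queue is nearly empty.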



--
This message was sent by Atlassian JIRA
(v6.1#6144)
