[jira] [Created] (HDFS-8869) Don't mark storages as failed before first block report

Rushabh S Shah (JIRA) Thu, 06 Aug 2015 14:29:04 -0700

Rushabh S Shah created HDFS-8869:
------------------------------------

             Summary: Don't mark storages as failed before first block report
                 Key: HDFS-8869
                 URL: https://issues.apache.org/jira/browse/HDFS-8869
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.7.0
            Reporter: Rushabh S Shah
            Assignee: Daryn Sharp



Creating this ticket on behalf of [~daryn].

Heartbeat processing performs the failed storage check. The DN reports its 
storages, and any previously known storages missing from that report (e.g., 
after the unique storage ID upgrade) are marked failed. The heartbeat monitor 
removes all blocks associated with the failed storages, and a replication storm 
ensues for all blocks on the node.
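The sketch below shows the guard that the summary asks for: skip the failed storage check until the DN has sent its first block report. It rests on a deliberately simplified, hypothetical model of the NN's per-DataNode bookkeeping. `DatanodeModel`, `processHeartbeat`, `onRegistration`, `onBlockReport`, and the storage IDs `DS-old`/`DS-new` are illustrative names, not the real HDFS API. Block removal and replication queues are also left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical, simplified model of the NN's failed storage check.
 * Names are illustrative only; this is NOT the real HDFS
 * DatanodeDescriptor / HeartbeatManager code.
 */
public class FailedStorageCheckSketch {

    /** The NN's view of one DataNode (illustrative). */
    static class DatanodeModel {
        // storage ID -> true once the NN has marked that storage failed
        private final Map<String, Boolean> failedByStorageId = new LinkedHashMap<>();
        private boolean firstBlockReportReceived = false;

        DatanodeModel(Set<String> knownStorageIds) {
            for (String id : knownStorageIds) {
                failedByStorageId.put(id, false);
            }
        }

        /** A (re-)registering DataNode has not block-reported yet. */
        void onRegistration() {
            firstBlockReportReceived = false;
        }

        void onBlockReport() {
            firstBlockReportReceived = true;
        }

        /**
         * Records the storages listed in a heartbeat and returns the IDs newly
         * marked failed. If deferUntilFirstBlockReport is set, the check is
         * skipped until this DataNode has sent a block report.
         */
        Set<String> processHeartbeat(Set<String> reportedIds, boolean deferUntilFirstBlockReport) {
            for (String id : reportedIds) {
                failedByStorageId.putIfAbsent(id, false);
            }
            Set<String> newlyFailed = new TreeSet<>();
            if (deferUntilFirstBlockReport && !firstBlockReportReceived) {
                return newlyFailed; // too early to judge missing storages
            }
            for (Map.Entry<String, Boolean> e : failedByStorageId.entrySet()) {
                if (!reportedIds.contains(e.getKey()) && !e.getValue()) {
                    e.setValue(true); // a real NN would now drop this storage's blocks
                    newlyFailed.add(e.getKey());
                }
            }
            return newlyFailed;
        }
    }

    public static void main(String[] args) {
        // After an upgrade the DN reports a new storage ID; the NN still knows the old one.
        Set<String> reported = Set.of("DS-new");

        DatanodeModel unguarded = new DatanodeModel(Set.of("DS-old"));
        unguarded.onRegistration();
        System.out.println("without guard: " + unguarded.processHeartbeat(reported, false));

        DatanodeModel guarded = new DatanodeModel(Set.of("DS-old"));
        guarded.onRegistration();
        System.out.println("with guard, before block report: " + guarded.processHeartbeat(reported, true));
        guarded.onBlockReport();
        System.out.println("with guard, after block report: " + guarded.processHeartbeat(reported, true));
    }
}
```

Running `main` prints `[DS-old]`, then `[]`, then `[DS-old]`. In this model the old storage is still marked failed in the end. However, that only happens after the block report, when the node's blocks are presumably already recorded against the new storages. Dropping the old storage at that point should therefore not trigger the replication storm described above.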

Eventually the DN sends block reports for the new storages, up to 15 minutes 
later on large clusters. The NN now has many excess blocks to invalidate. If 
the cluster has failed over in the past 24 hours (e.g., during a rolling 
upgrade), the former standby, now active, will queue the block invalidations. 
This triggers the severe performance degradation of HDFS-8674, which has been 
greatly lessened but is still an issue.




