[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas reopened HDFS-11576:
----------------------------------

Reopening because the {{TestPipelinesFailover}} failure isn't spurious this time.

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-11576
>                 URL: https://issues.apache.org/jira/browse/HDFS-11576
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs, namenode
>    Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch,
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch,
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch,
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch,
> HDFS-11576.012.patch, HDFS-11576.013.patch, HDFS-11576.014.patch,
> HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN finishes the first recovery and calls commitBlockSynchronization on
> the NN, which fails because X < X+1
> ...
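
The scenario quoted above can be sketched as a small tick-based simulation. This is only a toy model, not the actual NameNode or DataNode code: the class RecoveryRaceSketch, the methods onHeartbeat and simulate, and the tick-based timing are all hypothetical names and assumptions made for illustration. It also assumes that the NN issues a new recovery ID on every heartbeat, that a commit succeeds only when it carries the newest ID, and that a recovery finishing on the same tick as a heartbeat commits first.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical toy model of the scenario; not the real HDFS classes. */
class RecoveryRaceSketch {
    /** NN-side state: the newest recovery ID issued for the block. */
    private long latestRecoveryId = 0;

    /** Steps 2 and 5: each heartbeat reply carries a recovery command with a fresh ID. */
    long onHeartbeat() {
        return ++latestRecoveryId;
    }

    /** Step 6: the commit is accepted only if it carries the newest recovery ID. */
    boolean commitBlockSynchronization(long recoveryId) {
        return recoveryId == latestRecoveryId;
    }

    /**
     * Runs the DN/NN exchange for maxTicks ticks (assumed time unit).
     * Returns true if any recovery managed to commit, false otherwise.
     */
    static boolean simulate(int heartbeatInterval, int recoveryDuration, int maxTicks) {
        RecoveryRaceSketch nn = new RecoveryRaceSketch();
        // Each entry is {recoveryId, finishTick}; finish ticks are increasing.
        Deque<long[]> inFlight = new ArrayDeque<>();
        for (int tick = 0; tick <= maxTicks; tick++) {
            // Recoveries that finish on this tick try to commit first.
            while (!inFlight.isEmpty() && inFlight.peekFirst()[1] <= tick) {
                long id = inFlight.pollFirst()[0];
                if (nn.commitBlockSynchronization(id)) {
                    return true;
                }
            }
            // Steps 1 and 4: the DN heartbeats and receives a new recovery command.
            if (tick % heartbeatInterval == 0) {
                long id = nn.onHeartbeat();
                inFlight.addLast(new long[] {id, tick + recoveryDuration});
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Recovery takes longer than the heartbeat interval: never commits.
        System.out.println(simulate(3, 5, 100)); // false
        // Recovery finishes before the next heartbeat: commits.
        System.out.println(simulate(5, 3, 100)); // true
    }
}
```

In this model, whenever recoveryDuration is greater than heartbeatInterval, every in-flight recovery is superseded by a newer ID before it can commit, so raising maxTicks never helps.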