Shangshu Qian created HDFS-17780:
------------------------------------

             Summary: The retry logic in IncrementalBlockReport may bypass the configured IBR interval, causing contention on NameNode
                 Key: HDFS-17780
                 URL: https://issues.apache.org/jira/browse/HDFS-17780
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, namenode
    Affects Versions: 3.4.1, 2.10.2
            Reporter: Shangshu Qian

In the current IncrementalBlockReportManager.sendIBRs(), the IBR is retried if the RPC (blockReceivedAndDeleted) to the NN fails.

{code:java}
void sendIBRs(DatanodeProtocol namenode, DatanodeRegistration registration,
    String bpid) throws IOException {
  // Generate a list of the pending reports for each storage under the lock
  final StorageReceivedDeletedBlocks[] reports = generateIBRs();
  if (reports.length == 0) {
    // Nothing new to report.
    return;
  }

  // Send incremental block reports to the Namenode outside the lock
  if (LOG.isDebugEnabled()) {
    LOG.debug("call blockReceivedAndDeleted: " + Arrays.toString(reports));
  }
  boolean success = false;
  final long startTime = monotonicNow();
  try {
    namenode.blockReceivedAndDeleted(registration, bpid, reports);
    success = true;
  } finally {
    if (success) {
      dnMetrics.addIncrementalBlockReport(monotonicNow() - startTime);
      lastIBR = startTime;
    } else {
      // If we didn't succeed in sending the report, put all of the
      // blocks back onto our queue, but only in the case where we
      // didn't put something newer in the meantime.
      putMissing(reports);
    }
  }
}
{code}

On failure, the reports are put back on the queue by putMissing() and `lastIBR` is not updated, so the failed IBRs are retried. However, because `lastIBR` is unchanged, the retry bypasses the configured `dfs.blockreport.incremental.intervalMsec` and is sent again on the next heartbeat. If `blockReceivedAndDeleted` failed because of high load on the NameNode, such retries only make the contention worse, resulting in a feedback loop.
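One possible direction, shown only as a minimal sketch and not as a proposed patch: track the time of the last attempt rather than the last success, so that a failed RPC also has to wait out the configured interval before being retried. Everything in the block is hypothetical and simplified for illustration (the class `IbrRetrySketch`, the `ReportSender` interface, the `nextAllowedMs` field and the `onHeartbeat` hook); none of these names exist in Hadoop, and the real DataNode scheduling involves more conditions than this model captures.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified model of IBR scheduling; not the Hadoop code. */
class IbrRetrySketch {
  /** Hypothetical stand-in for the blockReceivedAndDeleted RPC. */
  interface ReportSender {
    void send(String report) throws Exception;
  }

  // Models dfs.blockreport.incremental.intervalMsec.
  private final long intervalMs;
  // Reports waiting to be sent (or re-sent after a failure).
  private final Deque<String> pending = new ArrayDeque<>();
  // Hypothetical field: earliest time at which the next attempt is allowed.
  private long nextAllowedMs = Long.MIN_VALUE;

  IbrRetrySketch(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  void enqueue(String report) {
    pending.add(report);
  }

  int pendingCount() {
    return pending.size();
  }

  /** Called on every heartbeat; returns true if an RPC was attempted. */
  boolean onHeartbeat(long nowMs, ReportSender sender) {
    if (pending.isEmpty() || nowMs < nextAllowedMs) {
      return false; // nothing to send, or still inside the interval
    }
    // Record the attempt before sending, so a failure is rate-limited too.
    nextAllowedMs = nowMs + intervalMs;
    try {
      sender.send(pending.peek());
      pending.poll(); // drop the report only after a successful send
    } catch (Exception e) {
      // Leave the report queued; it is retried after the interval elapses.
    }
    return true;
  }
}
```

With an interval of 1000 ms, a send that fails at t=0 is not attempted again on a heartbeat at t=300 and is retried only at t=1000 or later. Whether a fixed interval is enough, or whether a backoff that grows with consecutive failures is preferable under NameNode overload, is a design question this sketch leaves open.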