Shangshu Qian created HDFS-17780:
------------------------------------

             Summary: The retry logic in IncrementalBlockReport may bypass the 
configured IBR interval, causing contention on NameNode
                 Key: HDFS-17780
                 URL: https://issues.apache.org/jira/browse/HDFS-17780
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, namenode
    Affects Versions: 3.4.1, 2.10.2
            Reporter: Shangshu Qian


In the current IncrementalBlockReportManager.sendIBRs(), the IBR is retried if 
the RPC (blockReceivedAndDeleted) to the NN fails.

 
{code:java}
  void sendIBRs(DatanodeProtocol namenode, DatanodeRegistration registration,
      String bpid) throws IOException {
    // Generate a list of the pending reports for each storage under the lock
    final StorageReceivedDeletedBlocks[] reports = generateIBRs();
    if (reports.length == 0) {
      // Nothing new to report.
      return;
    }

    // Send incremental block reports to the Namenode outside the lock
    if (LOG.isDebugEnabled()) {
      LOG.debug("call blockReceivedAndDeleted: " + Arrays.toString(reports));
    }
    boolean success = false;
    final long startTime = monotonicNow();
    try {
      namenode.blockReceivedAndDeleted(registration, bpid, reports);
      success = true;
    } finally {
      if (success) {
        dnMetrics.addIncrementalBlockReport(monotonicNow() - startTime);
        lastIBR = startTime;
      } else {
        // If we didn't succeed in sending the report, put all of the
        // blocks back onto our queue, but only in the case where we
        // didn't put something newer in the meantime.
        putMissing(reports);
      }
    }
  }
{code}
When the RPC fails, `lastIBR` is not updated and the failed reports are put 
back on the queue by `putMissing()`, so they will be retried. However, because 
`lastIBR` still holds the time of the last successful IBR, the retry is not 
throttled by the configured `dfs.blockreport.incremental.intervalMsec`; it is 
sent on the next heartbeat instead.
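
The effect can be illustrated with a small, self-contained model. This is only 
a sketch and not Hadoop code: the class `IbrRetryModel`, its method, and the 
`throttleRetries` flag are hypothetical names introduced for illustration, and 
the model assumes every RPC fails and that a send is attempted at a heartbeat 
whenever the interval has elapsed since `lastIbr`. Recording the attempt time 
on failure is one possible way to make retries respect the interval; it is not 
a fix proposed in this issue.

```java
/** Hypothetical model of IBR retry timing; not part of Hadoop. */
final class IbrRetryModel {

  /**
   * Counts send attempts while every RPC to the NameNode fails.
   *
   * @param intervalMs      stands in for dfs.blockreport.incremental.intervalMsec
   * @param heartbeatMs     time between heartbeats
   * @param durationMs      length of the simulated period (exclusive end)
   * @param throttleRetries if true, a failed attempt also updates lastIbr
   */
  static int countAttempts(long intervalMs, long heartbeatMs,
      long durationMs, boolean throttleRetries) {
    // Start far enough in the past that the first heartbeat may send.
    long lastIbr = -intervalMs;
    int attempts = 0;
    for (long now = 0; now < durationMs; now += heartbeatMs) {
      if (now - lastIbr >= intervalMs) {
        attempts++;
        // In this model the RPC always fails. The behavior described in
        // this issue leaves lastIbr untouched on failure; the throttled
        // variant records the attempt time so the interval applies.
        if (throttleRetries) {
          lastIbr = now;
        }
      }
    }
    return attempts;
  }

  public static void main(String[] args) {
    // Illustrative values: 10 s IBR interval, 3 s heartbeat, 30 s window.
    System.out.println(countAttempts(10_000, 3_000, 30_000, false));
    System.out.println(countAttempts(10_000, 3_000, 30_000, true));
  }
}
```

With these illustrative values, the unthrottled variant attempts a send on 
every heartbeat in the window, while the throttled variant attempts one only 
after the interval has elapsed since the previous attempt.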

 

If `blockReceivedAndDeleted` fails because of high load on the NameNode, 
such retries only make the contention worse, resulting in a feedback loop.

 


