Dimas Shidqi Parikesit created HDFS-17768:
---------------------------------------------

             Summary: Observer namenode network delay causing empty block 
location for getBatchedListing
                 Key: HDFS-17768
                 URL: https://issues.apache.org/jira/browse/HDFS-17768
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 3.4.1
            Reporter: Dimas Shidqi Parikesit


In our testing with the latest HDFS version (e8a64d0), we found a case similar 
to HDFS-16732 occurring in getBatchedListing. If the block report of the 
observer nn is delayed during a getBatchedListing call, one or more of the 
listing results will contain blocks without locations.

Steps to reproduce this bug:
 # Start a cluster with 1 observer namenode
 # Create an empty file
 # Inject a network delay between the observer nn and the active nn to delay 
the block report (or add a sleep to the BlockReportProcessingThread of the 
observer).
 # Append to the file to add a block
 # Send a batchedListPaths request using the client API
 # Check that the result contains a block without locations
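
The last step amounts to scanning every entry of the batched listing for a 
block whose location list is empty. Below is a minimal sketch of that scan. 
It uses stand-in classes rather than the real Hadoop types: Block, Entry, and 
findBlocksWithoutLocations are hypothetical names chosen for illustration, 
not part of the HDFS client API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MissingLocationScan {
    // Hypothetical stand-in for a located block: an id plus its datanode hosts.
    static final class Block {
        final long id;
        final List<String> hosts;
        Block(long id, List<String> hosts) { this.id = id; this.hosts = hosts; }
    }

    // Hypothetical stand-in for one listing entry: a path plus its blocks.
    static final class Entry {
        final String path;
        final List<Block> blocks;
        Entry(String path, List<Block> blocks) { this.path = path; this.blocks = blocks; }
    }

    // Returns "path#blockId" for every block that has no datanode location.
    static List<String> findBlocksWithoutLocations(List<Entry> listing) {
        List<String> missing = new ArrayList<>();
        for (Entry e : listing) {
            for (Block b : e.blocks) {
                if (b.hosts == null || b.hosts.isEmpty()) {
                    missing.add(e.path + "#" + b.id);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Entry> listing = Arrays.asList(
            new Entry("/a", Arrays.asList(new Block(1L, Arrays.asList("dn1")))),
            new Entry("/b", Arrays.asList(new Block(2L, Collections.<String>emptyList()))));
        System.out.println(findBlocksWithoutLocations(listing)); // prints [/b#2]
    }
}
```

A non-empty result for a listing served by the observer would correspond to 
the symptom described in this report.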

In HDFS-16732 and HDFS-13924, a check was added to getBlockLocations, 
getFileInfo, and getListing that verifies whether the returned blocks have 
valid locations. Missing locations indicate that the observer namenode is not 
up to date with the active namenode.

We propose adding the same check to getBatchedListing. If any of the 
sub-listings returns blocks without locations, the call throws 
ObserverRetryOnActiveException and exits early. The entire batchedListing 
request is then retried on the active namenode.
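
The shape of the proposed check can be sketched as follows. This is a 
simplified model under stated assumptions, not the actual FSNamesystem code: 
RetryOnActiveException stands in for ObserverRetryOnActiveException, and 
Block and checkSubListing are hypothetical names. The sketch assumes the 
check only applies when the namenode is serving as an observer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BatchedListingCheck {
    // Hypothetical stand-in for ObserverRetryOnActiveException.
    static class RetryOnActiveException extends RuntimeException {
        RetryOnActiveException(String msg) { super(msg); }
    }

    // Hypothetical stand-in for a located block.
    static final class Block {
        final long id;
        final List<String> hosts;
        Block(long id, List<String> hosts) { this.id = id; this.hosts = hosts; }
    }

    // Throws on the first block without a location, but only on an observer.
    static void checkSubListing(boolean isObserver, String path, List<Block> blocks) {
        if (!isObserver) {
            return; // assumption: results from the active namenode are returned as-is
        }
        for (Block b : blocks) {
            if (b.hosts == null || b.hosts.isEmpty()) {
                throw new RetryOnActiveException(
                    "Zero locations for block " + b.id + " of " + path);
            }
        }
    }

    public static void main(String[] args) {
        List<Block> stale = Arrays.asList(new Block(7L, Collections.<String>emptyList()));
        checkSubListing(false, "/f", stale); // active: no exception
        try {
            checkSubListing(true, "/f", stale); // observer: aborts the whole batch
        } catch (RetryOnActiveException e) {
            System.out.println("retry on active: " + e.getMessage());
        }
    }
}
```

Throwing from the first stale sub-listing, rather than returning a partial 
result, means the client never sees a mix of up-to-date and stale entries 
within one batch.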

Your insights are very much appreciated. We will continue following up on 
this issue until it is resolved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
