Hi folks,

We did an HDFS namenode swap, keeping the same hostname but a different IP address so that clients would not need to change their configuration. However, after the swap some applications are seeing "Unable to close file because the last block does not have enough number of replicas." on the client side, while the server side logs "org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* * is COMMITTED but not COMPLETE(numNodes= 0 < minimum = 1) in file *".
We were thinking that the datanodes need to retry once more to resolve the new IP address of the active namenode, but the client-side retry defaults (dfs.client.block.write.locateFollowingBlock.retries = 5, dfs.client.block.write.locateFollowingBlock.initial.delay.ms = 400) already seem like they should be sufficient. I want to check whether anyone has seen this issue before and what the possible cause might be.

Thanks,
Aihua
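
P.S. In case the exact settings matter: below is a minimal sketch of how we would raise those two client-side values programmatically when writing a file. The class name, path, and the bumped values are only illustrative, not something we have confirmed fixes the problem.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative only: give close() more attempts and a longer initial backoff,
    // so the datanodes have extra time to report the finalized last block to the
    // active namenode before the client gives up with "not enough ... replicas".
    public class CompleteRetryExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 10);            // default 5
        conf.setInt("dfs.client.block.write.locateFollowingBlock.initial.delay.ms", 1000); // default 400

        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/tmp/retry-test"))) {
          out.writeBytes("hello");
        } // close() is where the "enough number of replicas" error would surface
      }
    }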