Zinan Zhuang created HDFS-17553:
-----------------------------------

             Summary: DFSOutputStream.java#closeImpl should have a retry upon flushInternal failures
                 Key: HDFS-17553
                 URL: https://issues.apache.org/jira/browse/HDFS-17553
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: dfsclient
    Affects Versions: 3.4.0, 3.3.1
            Reporter: Zinan Zhuang
[HDFS-15865|https://issues.apache.org/jira/browse/HDFS-15865] introduced an interrupt in the DataStreamer class that breaks out of the waitForAckedSeqno call once the ack timeout is exceeded. waitForAckedSeqno is called from [DFSOutputStream.java#flushInternal|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L773], and one of flushInternal's callers is [DFSOutputStream.java#closeImpl|https://github.com/apache/hadoop/blob/branch-3.0.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L870], which closes a file.

What we saw is that these interrupts occur more often during the flushInternal call while a file is being closed; the resulting exception is not handled by the DFSClient and is thrown to the caller. There is a known issue, [HDFS-4504|https://issues.apache.org/jira/browse/HDFS-4504], where a file that fails to close on the HDFS side leaks its lease until the DFSClient is recycled. In our HBase setups, DFSClients are long-lived within each regionserver, so these files stay undead until the regionserver is restarted. We observed this during datanode decommissioning, which got stuck on the open files left behind by the leak.

Since it is desirable to close an HDFS file as smoothly as possible, retrying flushInternal during closeImpl would help reduce such leakages.
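Below is a minimal sketch of the bounded-retry shape being proposed, not actual DFSOutputStream code. The FlushOp interface, the MAX_CLOSE_FLUSH_RETRIES constant, and the flushWithRetry helper are hypothetical names used only for illustration:

{code:java}
import java.io.IOException;

/**
 * Illustrative sketch only (not DFSOutputStream code): shows the shape of a
 * bounded retry around a flush operation, similar to what closeImpl could do
 * around flushInternal before treating the close as failed.
 */
public class CloseFlushRetrySketch {

  /** Stand-in for the flushInternal call, which can fail on an ack timeout. */
  @FunctionalInterface
  interface FlushOp {
    void flush() throws IOException;
  }

  /** Hypothetical bound; a real patch would need to choose or configure one. */
  private static final int MAX_CLOSE_FLUSH_RETRIES = 3;

  /**
   * Retry the flush a bounded number of times so that a single interrupted
   * waitForAckedSeqno does not immediately fail the close and leak the lease.
   */
  static void flushWithRetry(FlushOp op) throws IOException {
    IOException lastFailure = null;
    for (int attempt = 1; attempt <= MAX_CLOSE_FLUSH_RETRIES; attempt++) {
      try {
        op.flush();
        return; // flush acked; the rest of close can proceed
      } catch (IOException e) {
        // A real change would likely retry only timeout/interrupt-style
        // failures and rethrow fatal pipeline errors immediately.
        lastFailure = e;
      }
    }
    throw lastFailure;
  }

  /** Tiny demo: the first two "flushes" fail, the third succeeds. */
  public static void main(String[] args) throws IOException {
    int[] calls = {0};
    flushWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new IOException("simulated ack timeout on attempt " + calls[0]);
      }
      System.out.println("flush succeeded on attempt " + calls[0]);
    });
  }
}
{code}

Keeping the retry bounded matters so that closeImpl still fails promptly when the pipeline is genuinely dead, rather than hanging the close path indefinitely.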