art4ul commented on issue #6608: [FLINK-10203] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream
URL: https://github.com/apache/flink/pull/6608#issuecomment-434730831

@kl0u @StephanEwen Hi guys,

Regarding your question:

> - Does HDFS permit to rename to an already existing file name (replacing that existing file)?

I've double-checked it. HDFS cannot rename a file so that it overwrites an existing one. However, this pull request handles that case. On recovery after a failure, the truncate logic checks whether the original file exists:

- If the original file exists, the truncation process starts again from the beginning.
- If the original file does not exist but a file with the `.truncated` extension does, the absence of the original tells us the truncated file was fully written and the original was deleted; the process crashed at the renaming stage. We can therefore use the `.truncated` file as the resulting file and finish the truncation process.

I would also like to clarify your idea of a recoverable writer with a "recover for resume" property. As I understand it, if the Hadoop version is greater than 2.7 we instantiate a recoverable writer that uses Hadoop's native truncate logic, and its `supportsResume()` method returns `true`. Otherwise, we instantiate a recoverable writer that never uses truncate (it only creates new files), and `supportsResume()` returns `false`. If you are OK with this approach, I can prepare another pull request. In that case, though, I would need to wait until the logic that checks `supportsResume()` is implemented. Maybe I could help you with that?
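For illustration, the recovery decision described above could be sketched roughly like this. This is only a hypothetical sketch (class, enum, and method names are mine, not from the PR), and it uses the local filesystem via `java.nio.file` in place of HDFS to stay self-contained:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the crash-recovery check described in the comment,
// using the local filesystem instead of HDFS for illustration only.
public class TruncateRecovery {

    enum Action { RESTART_TRUNCATION, RENAME_TRUNCATED_TO_ORIGINAL, NOTHING_TO_RECOVER }

    // Decide how to recover based on which files survived the crash.
    static Action recoverAction(Path original, Path truncated) {
        if (Files.exists(original)) {
            // Original still present: the truncated copy may be incomplete,
            // so restart the truncation process from the beginning.
            return Action.RESTART_TRUNCATION;
        }
        if (Files.exists(truncated)) {
            // Original gone but "*.truncated" present: the truncated file was
            // fully written and the crash happened during the final rename,
            // so finish by renaming the truncated file into place.
            return Action.RENAME_TRUNCATED_TO_ORIGINAL;
        }
        return Action.NOTHING_TO_RECOVER;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("truncate-recovery");
        Path original = dir.resolve("part-0");
        Path truncated = dir.resolve("part-0.truncated");

        // Simulate a crash after the original was deleted but before the rename.
        Files.write(truncated, new byte[] {1, 2, 3});
        System.out.println(recoverAction(original, truncated));
        // -> RENAME_TRUNCATED_TO_ORIGINAL
    }
}
```

The key property is that the presence or absence of the original file alone is enough to tell which stage the crash happened in, so no extra marker file is needed.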
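The version-based dispatch for `supportsResume()` could look roughly like the following. Again a hypothetical sketch, not Flink's actual API: the factory and interface names are mine, and I assume "greater than 2.7" means 2.7 or later, since HDFS gained native `truncate` in Hadoop 2.7.0:

```java
// Hypothetical sketch: choose a recoverable writer based on the Hadoop version.
// Names are illustrative, not Flink's real classes.
public class WriterFactory {

    interface RecoverableWriter {
        boolean supportsResume();
    }

    // Native HDFS truncate() is available from Hadoop 2.7 onwards.
    static RecoverableWriter createWriter(int major, int minor) {
        boolean hasNativeTruncate = major > 2 || (major == 2 && minor >= 7);
        if (hasNativeTruncate) {
            // Resume in place by truncating the file back to the last valid offset.
            return () -> true;
        }
        // Old Hadoop: never truncate, always roll over to a new file.
        return () -> false;
    }

    public static void main(String[] args) {
        System.out.println(createWriter(2, 7).supportsResume()); // true
        System.out.println(createWriter(2, 6).supportsResume()); // false
    }
}
```

The caller would then branch on `supportsResume()` to decide between resuming the in-progress file and committing what was written so far into a fresh file.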