art4ul commented on issue #6608: [FLINK-10203] Support truncate method for old Hadoop versions in HadoopRecoverableFsDataOutputStream
URL: https://github.com/apache/flink/pull/6608#issuecomment-434730831

@kl0u @StephanEwen Hi guys,

Regarding your question:

> - Does HDFS permit to rename to an already existing file name (replacing that existing file)?

I've double-checked it. HDFS cannot rename a file so that it overwrites an existing one. However, this pull request handles that case. On recovery after a failure, the truncate logic checks whether the original file exists:

- If the original file exists, the truncation process starts again from the beginning.
- If the original file does not exist but a file with the `.truncated` extension does, the absence of the original tells us the truncated file was fully written and the original was deleted; the process crashed at the renaming stage. We can therefore use the `.truncated` file as the resulting file and finish the truncation process.

I would also like to clarify your idea of a recoverable writer with a "recover for resume" property. As I understand it, if the Hadoop version is greater than 2.7 we instantiate a recoverable writer that uses Hadoop's native truncate logic, and its `supportsResume()` method returns `true`. Otherwise, we instantiate a recoverable writer that never uses truncate (it only creates new files), and `supportsResume()` returns `false`. If you are OK with this approach, I can prepare another pull request. In that case, though, I would need to wait until the logic that checks `supportsResume()` is implemented. Maybe I could help you with that?
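For illustration, the recovery decision described above could be sketched roughly like this. This is only a hypothetical sketch (class, enum, and method names are mine, not from the PR), and it uses the local filesystem via `java.nio.file` in place of HDFS to stay self-contained:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the crash-recovery check described in the comment,
// using the local filesystem instead of HDFS for illustration only.
public class TruncateRecovery {

    enum Action { RESTART_TRUNCATION, RENAME_TRUNCATED_TO_ORIGINAL, NOTHING_TO_RECOVER }

    // Decide how to recover based on which files survived the crash.
    static Action recoverAction(Path original, Path truncated) {
        if (Files.exists(original)) {
            // Original still present: the truncated copy may be incomplete,
            // so restart the truncation process from the beginning.
            return Action.RESTART_TRUNCATION;
        }
        if (Files.exists(truncated)) {
            // Original gone but "*.truncated" present: the truncated file was
            // fully written and the crash happened during the final rename,
            // so finish by renaming the truncated file into place.
            return Action.RENAME_TRUNCATED_TO_ORIGINAL;
        }
        return Action.NOTHING_TO_RECOVER;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("truncate-recovery");
        Path original = dir.resolve("part-0");
        Path truncated = dir.resolve("part-0.truncated");

        // Simulate a crash after the original was deleted but before the rename.
        Files.write(truncated, new byte[] {1, 2, 3});
        System.out.println(recoverAction(original, truncated));
        // -> RENAME_TRUNCATED_TO_ORIGINAL
    }
}
```

The key property is that the presence or absence of the original file alone is enough to tell which stage the crash happened in, so no extra marker file is needed.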
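The version-based dispatch for `supportsResume()` could look roughly like the following. Again a hypothetical sketch, not Flink's actual API: the factory and interface names are mine, and I assume "greater than 2.7" means 2.7 or later, since HDFS gained native `truncate` in Hadoop 2.7.0:

```java
// Hypothetical sketch: choose a recoverable writer based on the Hadoop version.
// Names are illustrative, not Flink's real classes.
public class WriterFactory {

    interface RecoverableWriter {
        boolean supportsResume();
    }

    // Native HDFS truncate() is available from Hadoop 2.7 onwards.
    static RecoverableWriter createWriter(int major, int minor) {
        boolean hasNativeTruncate = major > 2 || (major == 2 && minor >= 7);
        if (hasNativeTruncate) {
            // Resume in place by truncating the file back to the last valid offset.
            return () -> true;
        }
        // Old Hadoop: never truncate, always roll over to a new file.
        return () -> false;
    }

    public static void main(String[] args) {
        System.out.println(createWriter(2, 7).supportsResume()); // true
        System.out.println(createWriter(2, 6).supportsResume()); // false
    }
}
```

The caller would then branch on `supportsResume()` to decide between resuming the in-progress file and committing what was written so far into a fresh file.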