umehrot2 commented on issue #1764:
URL: https://github.com/apache/hudi/issues/1764#issuecomment-649914448


   @vinothchandar @bvaradar looking at the logic, we derive the list of invalid 
data file paths to delete from the marker file paths. One possible cause, it 
seems to me, is that a marker file got created but the corresponding data file 
was never written by Spark, because the failure happened before the write. We 
are now expecting that file to appear, but it was never created in the first 
place. Do you guys think that's possible? I will also dig further into the 
marker file code to understand it.
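   
   To make the suspected scenario concrete, here is a minimal Python sketch. It 
is not Hudi's actual code: the `.marker` suffix, the directory layout, and the 
function names `data_paths_from_markers` and `split_by_existence` are all 
assumptions made up for illustration. It only shows how a marker with no data 
file behind it would end up in a list of paths we then wait on.

```python
import os

# Hypothetical marker suffix, chosen for illustration only.
MARKER_SUFFIX = ".marker"


def data_paths_from_markers(marker_root, base_path):
    """Map every marker file under marker_root to the data path it stands for.

    Assumes (hypothetically) that a marker mirrors the data file's relative
    path under base_path, with MARKER_SUFFIX appended to the file name.
    """
    paths = []
    for dirpath, _dirs, files in os.walk(marker_root):
        for name in files:
            if not name.endswith(MARKER_SUFFIX):
                continue
            rel_dir = os.path.relpath(dirpath, marker_root)
            data_name = name[: -len(MARKER_SUFFIX)]
            paths.append(
                os.path.normpath(os.path.join(base_path, rel_dir, data_name))
            )
    return sorted(paths)


def split_by_existence(candidates):
    """Separate candidate data paths into those written and those never written."""
    existing = [p for p in candidates if os.path.exists(p)]
    missing = [p for p in candidates if not os.path.exists(p)]
    return existing, missing
```

   If the write failed right after the marker was created, the path lands in 
`missing`. Any step that waits for every marker-derived path to become visible 
would then wait on a file that will never show up.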
   
   On a related note about marker file handling, I have narrowed down some 
performance issues with S3 in the marker file cleanup code: 
https://issues.apache.org/jira/browse/HUDI-1054 
@zuyanton, this might be of interest to you.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

