umehrot2 commented on issue #1764: URL: https://github.com/apache/hudi/issues/1764#issuecomment-650478776
Actually, this is not just a problem with `Throttling`. AWS S3 can throw intermittent `Throttling` as well as `Internal Error` responses, which can potentially succeed upon retrying. I wish we were able to use https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-s3-optimized-committer.html, which solved a lot of the S3-related commit problems. The EMR file system will not commit the file until the Spark task commit succeeds, essentially making the file commit atomic. Unfortunately, Hudi does not depend on Spark's commit mechanisms, so it cannot leverage this.

Yes, I think waiting for 1 attempt and, if it does not appear, just skipping it (**and not failing the job**) should work better as an interim solution given the current design of using marker files. @bvaradar also has some thoughts on improving this in the long run.
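To make the interim idea concrete, here is a rough sketch of "check, wait for one more attempt, then skip without failing the job". This is not Hudi code: `VisibilityCheck`, `Outcome`, and `checkWithOneRetry` are hypothetical names, and the existence check is passed in as a plain `Predicate` so the snippet stays independent of any file system API. A transient error thrown by the check (for example a throttling or internal error surfaced as a runtime exception) is treated here as "not visible yet", which is an assumption of the sketch.

```java
import java.util.function.Predicate;

// Hypothetical helper for illustration only; not part of Hudi or any AWS SDK.
final class VisibilityCheck {

    enum Outcome { FOUND, FOUND_AFTER_RETRY, SKIPPED }

    // Runs the caller-supplied existence check; a transient runtime error
    // counts as "not visible" instead of propagating (assumption of this sketch).
    private static boolean probe(String path, Predicate<String> exists) {
        try {
            return exists.test(path);
        } catch (RuntimeException transientError) {
            return false;
        }
    }

    // Checks once, waits, checks exactly one more time, and otherwise
    // reports SKIPPED so the caller can log it and carry on with the job.
    static Outcome checkWithOneRetry(String path, Predicate<String> exists, long waitMillis) {
        if (probe(path, exists)) {
            return Outcome.FOUND;
        }
        try {
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Outcome.SKIPPED;
        }
        return probe(path, exists) ? Outcome.FOUND_AFTER_RETRY : Outcome.SKIPPED;
    }
}
```

The caller would treat `SKIPPED` as a warning to log rather than an exception to throw, which is the "not failing the job" part. A real implementation would presumably make the wait time and attempt count configurable.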
