vamshipasunuru opened a new issue, #14200: URL: https://github.com/apache/hudi/issues/14200
### Bug Description

**What happened:**
In Hudi-Flink, we observed that when timeline-server-based markers were used, the rollback did not delete all the files that were part of the failed commit. This causes Hudi to include those files after the failed commit's instants (`.inflight` and `.requested`) are archived.

**What you expected:**
Every file created by the failed ingestion commit should be deleted by the rollback.

**Steps to reproduce:**
1. Simulate a commit failure with a Flink restart. The ingestion should have generated marker files and partially written data files. The timeline will contain only `.requested` and `.inflight` for that commit.
2. On the next write, the ingestion cleans up failed commits. The clean-up finishes without errors, but not all data files tracked in the marker directory are deleted.

This was evident from the logs in `MarkerBasedRollbackStrategy.getRollbackRequests`: the count of files read did not match the count of files written during the commit. We suspect a bug in the timeline server is contributing to this.

### Environment

**Hudi version:** 0.14

**Query engine:** Flink

**Relevant configs:**
- `hoodie.cleaner.prewrite.cleaner.policy=rollback_failed_writes`
- `hoodie.write.markers.type=TIMELINE_SERVER_BASED`

### Logs and Stack Trace

_No response_

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
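To make the mismatch described in the reproduction steps concrete, the check can be sketched as a small offline diff between the data files named by the markers and the files the rollback actually planned to delete. This is only a minimal sketch, not Hudi code: the helper names `marker_to_data_path` and `missing_from_rollback` are hypothetical, the sample paths are made up, and it assumes marker entries are named `<data file path>.marker.<IO type>`. Both input lists would have to be collected by hand, for example from the marker directory and from the rollback logs.

```python
MARKER_INFIX = ".marker."  # assumption: markers are named <data file>.marker.<IO type>


def marker_to_data_path(marker_name: str) -> str:
    """Strip the assumed '.marker.<IO type>' suffix to recover the data file path."""
    idx = marker_name.rfind(MARKER_INFIX)
    if idx < 0:
        raise ValueError(f"not a marker name: {marker_name!r}")
    return marker_name[:idx]


def missing_from_rollback(marker_names, rollback_paths):
    """Return data files that have a marker but are absent from the rollback plan."""
    expected = {marker_to_data_path(m) for m in marker_names}
    return sorted(expected - set(rollback_paths))


# Hypothetical inputs, for illustration only.
markers = [
    "2024/01/01/f1_0-1-0_001.parquet.marker.CREATE",
    "2024/01/01/f2_0-1-0_001.parquet.marker.CREATE",
    "2024/01/02/f3_0-1-0_001.parquet.marker.MERGE",
]
rolled_back = ["2024/01/01/f1_0-1-0_001.parquet"]

leftover = missing_from_rollback(markers, rolled_back)
for path in leftover:
    print("not covered by rollback:", path)
```

A non-empty result would correspond to the symptom reported here: files written under the failed commit that the rollback never saw.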
