parisni commented on issue #6373: URL: https://github.com/apache/hudi/issues/6373#issuecomment-1218625262
Yes, KEEP_LATEST_COMMITS. Since cleaning never finds files to delete, it always falls back to getPartitionPathsForFullCleaning(). That method looks for paths on disk, but the file groups to delete are then looked up in the metadata table.

I also suspect there is a problem with using incremental cleaning together with KEEP_LATEST_COMMITS, which leads to some partitions never being cleaned after a first clean, but I will open a separate issue for that one. Incremental cleaning should be used together with KEEP_LATEST_FILE_VERSIONS only.

On August 17, 2022 10:45:32 PM UTC, Sivabalan Narayanan wrote:
> May I know what cleaning policy you are using? I see that for KEEP_LATEST_FILE_VERSIONS, we call getPartitionPathsForFullCleaning(), within which we use file system based listing and not metadata table based listing.
>
> And if you are using KEEP_LATEST_COMMITS with incremental clean mode enabled, and there has never been a prior clean, we trigger getPartitionPathsForFullCleaning() (within which we use file system based listing and not metadata table based listing).
>
> If not for these, we should be hitting only metadata based listing. Can you confirm which one among the above is your case?
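
To make the cases in the quoted message easier to compare, here is a minimal sketch in plain Python of the decision as it is described there. It is only a model of that description, not Hudi code: the function `partition_listing_path`, its parameters, and the return strings "full" and "metadata" are hypothetical names chosen for illustration, and the branches have not been checked against the actual Hudi source.

```python
from enum import Enum


class CleaningPolicy(Enum):
    KEEP_LATEST_COMMITS = "KEEP_LATEST_COMMITS"
    KEEP_LATEST_FILE_VERSIONS = "KEEP_LATEST_FILE_VERSIONS"


def partition_listing_path(policy, incremental_enabled, has_prior_clean):
    """Return which listing path the quoted message says the cleaner takes.

    "full"     -> getPartitionPathsForFullCleaning(), i.e. file system listing
    "metadata" -> metadata table based listing
    (Both labels are hypothetical, used only in this sketch.)
    """
    # Case 1 in the quoted message: KEEP_LATEST_FILE_VERSIONS goes through
    # the full-cleaning path.
    if policy is CleaningPolicy.KEEP_LATEST_FILE_VERSIONS:
        return "full"
    # Case 2: KEEP_LATEST_COMMITS with incremental mode enabled, but no
    # prior clean to start from.
    if (
        policy is CleaningPolicy.KEEP_LATEST_COMMITS
        and incremental_enabled
        and not has_prior_clean
    ):
        return "full"
    # "If not for these", the message expects metadata based listing only.
    return "metadata"


if __name__ == "__main__":
    # The situation reported in this thread: KEEP_LATEST_COMMITS with no
    # prior clean available, so every run takes the full-cleaning path.
    print(partition_listing_path(CleaningPolicy.KEEP_LATEST_COMMITS, True, False))
```

Under this model, a table where the cleaner never finds anything to delete behaves as if `has_prior_clean` stayed false, which matches the repeated fallback described in the reply.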
