nsivabalan commented on issue #17866: URL: https://github.com/apache/hudi/issues/17866#issuecomment-4042658020
Here is our proposal on how we can go about it. Since this is not a common use case and is more of an admin operation, we did not want to introduce any public APIs to the write client. So, based on that, we will write a separate tool or Spark procedure for the purpose. Here are the steps to be executed by the tool:

1. Build the file system view (FSV) to find all latest file slices for the partition(s) of interest, and copy them over to the target location. For HDFS, we could even skip this step if need be.
2. Issue a delete partition command to Hudi for the partitions of interest.
3. Delete the contents of the directory, or rename the directory, depending on the scheme.

In between each step, we can update a checkpoint file, so that a retry can resume from whichever step the previous attempt left off at.

For restore: it is recommended to go via the standard `insert_overwrite` write operation.

From Hudi's standpoint:
- When the cleaner kicks in at some later point in time, it may not find any files to delete, but our clean execution is resilient to that. So it is expected to complete the clean (to clean up all replaced file groups) and update the metadata table w.r.t. all replaced file groups.
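Step 1 hinges on picking, per file group, the file slice with the latest commit instant. Here is a minimal sketch of that selection logic in plain Python; the `FileSlice` shape and the grouping here are simplified stand-ins for what Hudi's `HoodieTableFileSystemView` provides, not its actual API:

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class FileSlice:
    file_group_id: str        # stable ID of the file group
    instant_time: str         # commit timestamp, e.g. "20240201103000"
    paths: list = field(default_factory=list)  # base + log file paths

def latest_file_slices(slices):
    """Return the latest file slice per file group (max instant time).

    These are the slices the tool would copy to the target location.
    """
    by_group = defaultdict(list)
    for fs in slices:
        by_group[fs.file_group_id].append(fs)
    return {
        gid: max(group, key=lambda fs: fs.instant_time)
        for gid, group in by_group.items()
    }

# Example: fg-1 has two slices; only the newer one is selected for copy.
slices = [
    FileSlice("fg-1", "20240101", ["fg-1_20240101.parquet"]),
    FileSlice("fg-1", "20240201", ["fg-1_20240201.parquet"]),
    FileSlice("fg-2", "20240115", ["fg-2_20240115.parquet"]),
]
latest = latest_file_slices(slices)
```

In a real implementation this selection would come from the FSV directly; the sketch only illustrates why building the FSV first is necessary, since raw directory listings alone cannot distinguish the latest slice from superseded ones.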

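The checkpoint-and-resume behavior between the three steps can be sketched as follows. This is a hedged illustration only: the step names and the JSON checkpoint layout are hypothetical, not the tool's actual format.

```python
import json
import os

def read_checkpoint(path):
    """Return the index of the next step to run (0 if no checkpoint yet)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_step"]

def write_checkpoint(path, next_step):
    with open(path, "w") as f:
        json.dump({"next_step": next_step}, f)

def run_tool(step_fns, ckpt_path):
    """Run steps in order, checkpointing after each one.

    If a step fails, a later retry of run_tool() skips the steps that
    already completed and resumes from the failed one.
    """
    start = read_checkpoint(ckpt_path)
    for i in range(start, len(step_fns)):
        step_fns[i]()            # e.g. copy slices / delete partition / rm dir
        write_checkpoint(ckpt_path, i + 1)
```

For example, if the delete-partition step (step 2) fails, a second invocation reads the checkpoint, skips the already-completed copy step, and re-runs only steps 2 and 3, so no step's side effects are repeated.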