taklwu commented on code in PR #7239:
URL: https://github.com/apache/hbase/pull/7239#discussion_r2299337985
##########
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java:
##########
@@ -1014,6 +1015,15 @@ private void deleteAllBackupWALFiles(Configuration conf, String backupWalDir)
System.out.println("Deleted all contents under WAL directory: " + walDir);
}
+ // Delete contents under bulk load directory
+ if (fs.exists(bulkloadDir)) {
+ FileStatus[] bulkContents = fs.listStatus(bulkloadDir);
+ for (FileStatus item : bulkContents) {
+ fs.delete(item.getPath(), true); // recursive delete of each child
+ }
+ System.out.println("Deleted all contents under Bulk Load directory: " + bulkloadDir);
Review Comment:
If the required directory doesn't exist, e.g. on the first use of backup, what happens here? Will we create it?
The problem is that this list-then-delete-per-child pattern can be expensive on S3; that's why S3A has the `fs.s3a.multiobjectdelete.enable` feature (applied when `fs.delete` is called recursively on a path). If we do it this way here and the list of bulkloaded files under the bulkload directory is huge, you will hit S3 throttling very easily, and we would end up writing/forking the same batching implementation here.
So, instead of deleting each child to keep the required directory structure in place, I suggest handling this as a check at enable/re-enable time for the required directory structure: delete the directory recursively and recreate it, to avoid a large number of per-object delete calls.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]