vinothchandar commented on a change in pull request #1768:
URL: https://github.com/apache/hudi/pull/1768#discussion_r462994647
##########
File path: hudi-client/src/main/java/org/apache/hudi/table/MarkerFiles.java
##########
@@ -64,44 +69,87 @@ public MarkerFiles(HoodieTable<?> table, String instantTime) {
instantTime);
}
- public void quietDeleteMarkerDir() {
+ public void quietDeleteMarkerDir(JavaSparkContext jsc, int parallelism) {
try {
- deleteMarkerDir();
+ deleteMarkerDir(jsc, parallelism);
} catch (HoodieIOException ioe) {
LOG.warn("Error deleting marker directory for instant " + instantTime, ioe);
}
}
/**
* Delete Marker directory corresponding to an instant.
+ *
+ * @param jsc Java Spark Context
+ * @param parallelism Spark parallelism for deletion
*/
- public boolean deleteMarkerDir() {
+ public boolean deleteMarkerDir(JavaSparkContext jsc, int parallelism) {
try {
- boolean result = fs.delete(markerDirPath, true);
- if (result) {
+ if (fs.exists(markerDirPath)) {
+ FileStatus[] fileStatuses = fs.listStatus(markerDirPath);
+ List<String> markerDirSubPaths = Arrays.stream(fileStatuses)
+ .map(fileStatus -> fileStatus.getPath().toString())
+ .collect(Collectors.toList());
+
+ if (markerDirSubPaths.size() > 0) {
+ SerializableConfiguration conf = new SerializableConfiguration(fs.getConf());
+ parallelism = Math.min(markerDirSubPaths.size(), parallelism);
+ jsc.parallelize(markerDirSubPaths, parallelism).foreach(subPathStr -> {
+ Path subPath = new Path(subPathStr);
+ FileSystem fileSystem = subPath.getFileSystem(conf.get());
+ fileSystem.delete(subPath, true);
Review comment:
note to self: this will still work when `subPath` is a file, i.e. for
non-partitioned tables
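To illustrate the point, here is a minimal sketch using only `java.nio` rather than the Hadoop `FileSystem` API from the diff. It is an analogy, not the PR's code: `deleteRecursively` and the marker file names are hypothetical. It shows one recursive-delete routine handling both a partition directory and a plain file sitting directly under the marker directory, which is the case Hadoop's `FileSystem.delete(path, true)` also accepts.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class RecursiveDeleteSketch {

  /**
   * Hypothetical helper: deletes a path whether it is a file or a directory.
   * Returns false if the path did not exist.
   */
  static boolean deleteRecursively(Path path) throws IOException {
    if (!Files.exists(path)) {
      return false;
    }
    try (Stream<Path> walk = Files.walk(path)) {
      // Reverse order visits children before their parent directories.
      // For a plain file the walk yields only the file itself.
      walk.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    Path markerDir = Files.createTempDirectory("markers");

    // Partitioned layout: the sub-path is a directory holding marker files.
    Path partitionDir = Files.createDirectories(markerDir.resolve("2020/07/30"));
    Files.createFile(partitionDir.resolve("file1.marker"));

    // Non-partitioned layout: the sub-path is itself a marker file.
    Path topLevelMarker = Files.createFile(markerDir.resolve("file2.marker"));

    System.out.println(deleteRecursively(markerDir.resolve("2020")));
    System.out.println(deleteRecursively(topLevelMarker));
    System.out.println(Files.exists(partitionDir) || Files.exists(topLevelMarker));

    Files.delete(markerDir); // now empty
  }
}
```

In the PR itself each sub-path is deleted inside a Spark task, so the same property matters per element of `markerDirSubPaths`: a recursive delete that accepts files needs no extra branching for non-partitioned tables.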
##########
File path: hudi-client/src/main/java/org/apache/hudi/table/MarkerFiles.java
##########
@@ -110,6 +158,10 @@ private String translateMarkerToDataPath(String markerPath) {
return MarkerFiles.stripMarkerSuffix(rPath);
}
+ public static String stripMarkerSuffix(String path) {
Review comment:
nit: better to have the static methods right at the top?