danny0405 commented on code in PR #13653:
URL: https://github.com/apache/hudi/pull/13653#discussion_r2249062242


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/savepoint/SavepointHelpers.java:
##########
@@ -66,4 +72,28 @@ public static void validateSavepointPresence(HoodieTable 
table, String savepoint
       throw new HoodieRollbackException("No savepoint for instantTime " + 
savepointTime);
     }
   }
+
+  private static class SavepointInstantComparator implements 
Comparator<HoodieInstant> {
+    private final boolean tableVersion8OrLater;
+    private final InstantComparator instantComparator;
+
+    public SavepointInstantComparator(boolean tableVersion8OrLater, 
InstantComparator instantComparator) {
+      this.tableVersion8OrLater = tableVersion8OrLater;
+      this.instantComparator = instantComparator;
+    }
+
+    @Override
+    public int compare(HoodieInstant o1, HoodieInstant o2) {
+      if (tableVersion8OrLater) {
+        return instantComparator.completionTimeOrderedComparator().compare(o1, 
o2);
+      } else {
+        // Do to special handling of compaction instants, we need to use 
requested time based comparator for compaction instants but completion time 
based comparator for others
+        if (o1.getAction().equals(HoodieTimeline.COMMIT_ACTION) || 
o2.getAction().equals(HoodieTimeline.COMMIT_ACTION)) {

Review Comment:
   > In v8, the delta commit is not directly tied to the base file commit time 
so that is why we don't require this. In v8 if we remove the compaction in the 
timeline described above, we can still safely query the table.
   
   This is not true. Our assumption for file slices is that a newer file slice
covers all of the historical data. If the restore also rolls back the
compaction base files, the log files in that file slice are simply left in
the file slice with no base file to merge against on read, and we would get
data loss (unless you keep the requested compaction metadata file on the
timeline, but that does not seem to be the case).
   
   For example, suppose we have:
   `t1.dc.req, t1.dc, t2.dc.req, t2.dc, t3.compaction.req, t4.dc.req, t4.dc, t5.dc.req, t5.dc, t3.commit`
   
   Now we want to restore to t5. If we also roll back t3.commit for a V8
table, the file slice that includes the t4 logs will only have logs from t4
onward, and the historical data held in the compaction base file would be lost.
   
   So we should always use requested-time comparison for compactions,
regardless of the table version.
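
To make the scenario concrete, here is a minimal, self-contained sketch of the example timeline. It is not Hudi code: `SketchInstant`, `exampleTimeline`, `instantsKeptByRestore`, and the numeric times are all hypothetical stand-ins, and the sketch only assumes that a restore removes every instant ordered after the savepoint.

```java
import java.util.ArrayList;
import java.util.List;

class SavepointOrderingSketch {

  /** Hypothetical stand-in for a completed timeline instant (not Hudi's HoodieInstant). */
  static final class SketchInstant {
    final String name;
    final boolean compaction;
    final int requestedTime;
    final int completionTime;

    SketchInstant(String name, boolean compaction, int requestedTime, int completionTime) {
      this.name = name;
      this.compaction = compaction;
      this.requestedTime = requestedTime;
      this.completionTime = completionTime;
    }
  }

  /** The timeline from the example, with made-up numeric times. */
  static List<SketchInstant> exampleTimeline() {
    List<SketchInstant> timeline = new ArrayList<>();
    timeline.add(new SketchInstant("t1.dc", false, 10, 11));
    timeline.add(new SketchInstant("t2.dc", false, 20, 21));
    // Compaction requested at t3, but completed only after t5.dc.
    timeline.add(new SketchInstant("t3.commit", true, 30, 52));
    timeline.add(new SketchInstant("t4.dc", false, 40, 41));
    timeline.add(new SketchInstant("t5.dc", false, 50, 51));
    return timeline;
  }

  /**
   * Returns the names of the instants that survive a restore to the savepoint.
   * When requestedTimeForCompaction is true, compaction commits are compared by
   * requested time; everything else is compared by completion time.
   */
  static List<String> instantsKeptByRestore(
      List<SketchInstant> timeline, SketchInstant savepoint, boolean requestedTimeForCompaction) {
    List<String> kept = new ArrayList<>();
    for (SketchInstant instant : timeline) {
      boolean useRequestedTime = requestedTimeForCompaction && instant.compaction;
      boolean afterSavepoint = useRequestedTime
          ? instant.requestedTime > savepoint.requestedTime
          : instant.completionTime > savepoint.completionTime;
      if (!afterSavepoint) {
        kept.add(instant.name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<SketchInstant> timeline = exampleTimeline();
    SketchInstant savepoint = timeline.get(4); // t5.dc
    System.out.println("completion time only:        "
        + instantsKeptByRestore(timeline, savepoint, false));
    System.out.println("requested time (compaction): "
        + instantsKeptByRestore(timeline, savepoint, true));
  }
}
```

With completion-time ordering alone, `t3.commit` sorts after the savepoint and is dropped, which leaves the t4/t5 logs without their base file. Comparing the compaction by its requested time keeps it.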
   


