[GitHub] [hudi] scxwhite commented on a change in pull request #5030: [HUDI-3617] MOR compact improve

GitBox Wed, 16 Mar 2022 02:55:46 -0700


scxwhite commented on a change in pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#discussion_r827820051




##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -280,8 +281,11 @@ HoodieCompactionPlan generateCompactionPlan(
         .getLatestFileSlices(partitionPath)
         .filter(slice -> !fgIdsInPendingCompactionAndClustering.contains(slice.getFileGroupId()))
         .map(s -> {
+          // In most business scenarios, the latest data is in the latest delta log file, so we sort it from large
+          // to small according to the instance time, which can largely avoid rewriting the data in the
+          // compact process, and then optimize the compact time
           List<HoodieLogFile> logFiles =

Review comment:
       > Kind of got your idea, then I think we should always use the reverse order, and the comparing sequence in merge reader should also be reversed to keep the process time semantics.
   
   Yes. Do you mean the change here? (https://github.com/apache/hudi/pull/5030/files#diff-c2f73f1ce4c0687cffa73e96b82514aca3a930ec1a8bc0c2efd73d7cf869c883R150)
   If so, that code has already been modified.
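
   For readers following the thread, a minimal sketch of the idea under discussion: read log files newest first, and have the merge keep the first record seen for each key so that an older record never overwrites a newer one. Everything here is hypothetical and for illustration only. `LogFile`, `newestFirst`, and `merge` are stand-ins, not Hudi classes or the code in this PR, and the sketch assumes instant times are fixed-width timestamp strings, so that string order matches time order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these types are Hudi's own.
class ReverseOrderMergeSketch {

  // Hypothetical stand-in for a delta log file: an instant time plus key -> value records.
  static final class LogFile {
    private final String instantTime;
    private final Map<String, String> records;

    LogFile(String instantTime, Map<String, String> records) {
      this.instantTime = instantTime;
      this.records = records;
    }

    String getInstantTime() {
      return instantTime;
    }

    Map<String, String> getRecords() {
      return records;
    }
  }

  // Returns a copy sorted from the largest (newest) instant time to the smallest.
  static List<LogFile> newestFirst(List<LogFile> logFiles) {
    List<LogFile> sorted = new ArrayList<>(logFiles);
    sorted.sort(Comparator.comparing(LogFile::getInstantTime).reversed());
    return sorted;
  }

  // Merges in newest-first order. Because newer files are read first, a key
  // that is already present must NOT be overwritten by a later (older) file.
  static Map<String, String> merge(List<LogFile> logFiles) {
    Map<String, String> merged = new LinkedHashMap<>();
    for (LogFile file : newestFirst(logFiles)) {
      for (Map.Entry<String, String> entry : file.getRecords().entrySet()) {
        merged.putIfAbsent(entry.getKey(), entry.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<LogFile> files = List.of(
        new LogFile("20220316010000", Map.of("k1", "v1-old", "k2", "v2")),
        new LogFile("20220316020000", Map.of("k1", "v1-new")));
    System.out.println(merge(files).get("k1"));
  }
}
```

   With the input order reversed, the comparison in the merge has to be reversed as well: an ascending-order reader lets the later record win, while a descending-order reader must let the first record win. In this sketch that is the difference between `put` and `putIfAbsent`.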




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
