yihua commented on code in PR #18306:
URL: https://github.com/apache/hudi/pull/18306#discussion_r3036089268


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/ScheduleCompactionActionExecutor.java:
##########
@@ -160,55 +161,69 @@ private Option<Pair<Integer, String>> 
getLatestDeltaCommitInfoSinceLastCompactio
     return Option.empty();
   }
 
+  private Option<Pair<Integer, String>> 
getLatestDeltaCommitInfoSinceLogCompaction() {
+    HoodieActiveTimeline rawActiveTimeline = 
table.getMetaClient().getTableFormat()
+        .getTimelineFactory().createActiveTimeline(table.getMetaClient(), 
false);
+    Option<Pair<HoodieTimeline, HoodieInstant>> deltaCommitsInfo =
+        CompactionUtils.getDeltaCommitsSinceLatestCompletedLogCompaction(
+            table.getActiveTimeline().getDeltaCommitTimeline(), 
rawActiveTimeline);
+    if (deltaCommitsInfo.isPresent()) {
+      return Option.of(Pair.of(
+          deltaCommitsInfo.get().getLeft().countInstants(),
+          deltaCommitsInfo.get().getRight().requestedTime()));
+    }
+    return Option.empty();

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   **Don't count inflight delta commits toward the log-compaction threshold.**
   
   `getDeltaCommitsSinceLatestCompletedLogCompaction(...)` returns inflight 
deltas too, but this path counts them directly. That makes `needLogCompact()` 
fire a cycle early whenever there's a pending delta after the last completed 
log compaction. Example: log compaction at `08`, completed delta `09`, inflight 
delta `10`, threshold `2` — this returns `2` even though only one 
post-log-compaction delta is committed.
   
   
   <details>
   <summary>Possible fix</summary>
   
   ```diff
      if (deltaCommitsInfo.isPresent()) {
        return Option.of(Pair.of(
   -        deltaCommitsInfo.get().getLeft().countInstants(),
   +        
deltaCommitsInfo.get().getLeft().filterCompletedInstants().countInstants(),
            deltaCommitsInfo.get().getRight().requestedTime()));
      }
   ```
   </details>
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/ScheduleCompactionActionExecutor.java`
   around lines 164 - 175, The current 
getLatestDeltaCommitInfoSinceLogCompaction()
   uses deltaCommitsInfo.get().getLeft().countInstants(), which counts inflight
   delta instants too; change it to count only completed instants so inflight
   deltas don't trigger needLogCompact() prematurely — e.g. use the 
HoodieTimeline
   API to count only completed instants (e.g., call
   filterCompletedInstants().countInstants() or iterate getInstants() and count
   instants where isCompleted() is true) on deltaCommitsInfo.get().getLeft() 
before
   returning the pair; keep the rest of the method and callers
   (getLatestDeltaCommitInfoSinceLogCompaction,
   CompactionUtils.getDeltaCommitsSinceLatestCompletedLogCompaction,
   needLogCompact) unchanged.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:1e41a269-6f72-4f34-a3dc-c01a5b2c7e76 
-->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/11#discussion_r3036088837)) 
(source:comment#3036088837)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to