yihua commented on code in PR #18306:
URL: https://github.com/apache/hudi/pull/18306#discussion_r3036089268
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/ScheduleCompactionActionExecutor.java:
##########
@@ -160,55 +161,69 @@ private Option<Pair<Integer, String>>
getLatestDeltaCommitInfoSinceLastCompactio
return Option.empty();
}
+ private Option<Pair<Integer, String>>
getLatestDeltaCommitInfoSinceLogCompaction() {
+ HoodieActiveTimeline rawActiveTimeline =
table.getMetaClient().getTableFormat()
+ .getTimelineFactory().createActiveTimeline(table.getMetaClient(),
false);
+ Option<Pair<HoodieTimeline, HoodieInstant>> deltaCommitsInfo =
+ CompactionUtils.getDeltaCommitsSinceLatestCompletedLogCompaction(
+ table.getActiveTimeline().getDeltaCommitTimeline(),
rawActiveTimeline);
+ if (deltaCommitsInfo.isPresent()) {
+ return Option.of(Pair.of(
+ deltaCommitsInfo.get().getLeft().countInstants(),
+ deltaCommitsInfo.get().getRight().requestedTime()));
+ }
+ return Option.empty();
Review Comment:
_⚠️ Potential issue_ | _🟠 Major_
**Don't count inflight delta commits toward the log-compaction threshold.**
`getDeltaCommitsSinceLatestCompletedLogCompaction(...)` returns inflight
deltas too, but this path counts them directly. That makes `needLogCompact()`
fire a cycle early whenever there's a pending delta after the last completed
log compaction. Example: log compaction at `08`, completed delta `09`, inflight
delta `10`, threshold `2` — this returns `2` even though only one
post-log-compaction delta is committed.
<details>
<summary>Possible fix</summary>
```diff
if (deltaCommitsInfo.isPresent()) {
return Option.of(Pair.of(
- deltaCommitsInfo.get().getLeft().countInstants(),
+
deltaCommitsInfo.get().getLeft().filterCompletedInstants().countInstants(),
deltaCommitsInfo.get().getRight().requestedTime()));
}
```
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
```
Verify each finding against the current code and only fix it if needed.
In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/ScheduleCompactionActionExecutor.java`
around lines 164 - 175, The current
getLatestDeltaCommitInfoSinceLogCompaction()
uses deltaCommitsInfo.get().getLeft().countInstants(), which counts inflight
delta instants too; change it to count only completed instants so inflight
deltas don't trigger needLogCompact() prematurely — e.g. use the
HoodieTimeline
API to count only completed instants (e.g., call
filterCompletedInstants().countInstants() or iterate getInstants() and count
instants where isCompleted() is true) on deltaCommitsInfo.get().getLeft()
before
returning the pair; keep the rest of the method and callers
(getLatestDeltaCommitInfoSinceLogCompaction,
CompactionUtils.getDeltaCommitsSinceLatestCompletedLogCompaction,
needLogCompact) unchanged.
```
</details>
<!--
fingerprinting:phantom:medusa:grasshopper:1e41a269-6f72-4f34-a3dc-c01a5b2c7e76
-->
<!-- This is an auto-generated comment by CodeRabbit -->
— *CodeRabbit*
([original](https://github.com/yihua/hudi/pull/11#discussion_r3036088837))
(source:comment#3036088837)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]