kbuci commented on code in PR #13064:
URL: https://github.com/apache/hudi/pull/13064#discussion_r2034000901
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##########
@@ -666,7 +670,7 @@ private void rollbackInflightInstant(HoodieInstant
inflightInstant,
-> entry.getRollbackInstant().getTimestamp())
.orElseGet(HoodieActiveTimeline::createNewInstantTime);
scheduleRollback(context, commitTime, inflightInstant, false,
config.shouldRollbackUsingMarkers(),
- false);
+ false, false);
Review Comment:
> it sounds very restrictive and may break the Flink cleaning workflow, we
may need to skip it for Flink because Flink does not enable MDT in 0.x branch.
Yes, good point: for clean scheduling we can avoid doing the validateTimestamp
check if the dataset has no MDT. I would prefer, though, that we skip it based
on whether or not the dataset has an MDT rather than whether ingestion uses the
Flink engine, since I'm not sure there's a straightforward way for the clean
schedule call to infer the execution engine used by ingestion.
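To make the proposal concrete, here is a minimal sketch of gating the check on
MDT existence instead of on the engine. All names below (`shouldValidateTimestamp`,
the boolean flag) are hypothetical illustrations, not the actual Hudi APIs:

```java
// Hypothetical sketch: gate the new-timestamp validation on whether the
// dataset actually has a metadata table (MDT), not on the ingestion engine.
// Flink in the 0.x branch does not enable MDT, so such datasets skip the check.
public class CleanScheduleSketch {

  /** Returns true when the stricter timestamp validation should run. */
  static boolean shouldValidateTimestamp(boolean datasetHasMetadataTable) {
    // No MDT -> no risk of an out-of-order MDT deltacommit, so skip the check.
    return datasetHasMetadataTable;
  }

  public static void main(String[] args) {
    System.out.println(shouldValidateTimestamp(true));  // MDT present -> validate
    System.out.println(shouldValidateTimestamp(false)); // no MDT -> skip
  }
}
```

The point is that the clean scheduler only needs to inspect table state (does an
MDT exist?), which it can always do, rather than the caller's engine, which it
cannot easily infer.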
> As for S1, how could the MDT compaction plan be generated when there are
pending instants on the DT timeline with a smaller timestamp? Should we allow
that?
Oh, so in S1 the MDT compaction plan can be scheduled because there is no
inflight instant on the data table at that point in time (which is
correct/expected behavior). But without the validateTimestamp check, the other
concurrent clean schedule call on the data table can generate a lower
timestamp, which will be the same timestamp used on the MDT write (since an
operation on the data table at instant time `i` writes a corresponding
deltacommit to the MDT with instant time `i`).
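A small sketch of the S1 ordering problem, assuming only that Hudi instant
times are fixed-width timestamp strings (so lexicographic order is time order);
the class and method names here are illustrative, not real Hudi code:

```java
// Sketch of the S1 race: a clean scheduled on the data table at instant time
// t writes a deltacommit to the MDT at the same t. If t sorts below an
// already-scheduled MDT compaction instant, the MDT timeline would receive a
// write with a timestamp smaller than the pending compaction.
public class S1ConflictSketch {

  /** Instant times are fixed-width strings, so String ordering is time ordering. */
  static boolean conflictsWithMdtCompaction(String cleanInstant,
                                            String mdtCompactionInstant) {
    return cleanInstant.compareTo(mdtCompactionInstant) < 0;
  }

  public static void main(String[] args) {
    String mdtCompaction = "20240101120500000";   // compaction already on the MDT timeline
    String concurrentClean = "20240101120400000"; // concurrent clean drew an earlier time
    System.out.println(conflictsWithMdtCompaction(concurrentClean, mdtCompaction)); // true
  }
}
```

This is exactly what the validateTimestamp check would catch: it rejects a
newly generated instant time that sorts below an instant already scheduled on
the MDT timeline.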
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]