cshuo commented on code in PR #13464:
URL: https://github.com/apache/hudi/pull/13464#discussion_r2157928102
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##########
@@ -480,6 +480,13 @@ public static boolean isLazyFailedWritesCleanPolicy(Configuration conf) {
     return HoodieCleanConfig.FAILED_WRITES_CLEANER_POLICY.defaultValue().equalsIgnoreCase(HoodieFailedWritesCleaningPolicy.LAZY.name());
   }
+  /**
+   * Returns whether the writers should use blocking instant time generation.
+   */
+  public static boolean isBlockingInstantGeneration(Configuration conf) {
+    return isCowTable(conf) && isUpsertOperation(conf);
Review Comment:
Writer pipeline for COW tables with upsert: rowdataToHoodie -> bucket_assign -> writer
* The bucket assign function uses state to assign record locations.
* The writer uses a merge handle to upsert/merge records into the assigned file group.
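
A minimal sketch of the stateful routing step above, under simplifying assumptions: `BucketAssignSketch`, `Location` and the `"fg-0"` file group id are hypothetical names, a `HashMap` stands in for Flink keyed state, and every new key goes to a single file group. It is not Hudi's actual bucket assign function. What it illustrates is that a key's location is recorded as soon as the record is assigned, so a later record with the same key is routed as an update to that file group even if the write that created the group has not been committed yet.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a stateful bucket assigner (illustrative names, not Hudi's API).
 * A plain HashMap stands in for Flink keyed state.
 */
class BucketAssignSketch {
    /** Where a record is routed: its file group and whether it is an update. */
    static final class Location {
        final String fileId;
        final boolean isUpdate;

        Location(String fileId, boolean isUpdate) {
            this.fileId = fileId;
            this.isUpdate = isUpdate;
        }

        @Override
        public String toString() {
            return (isUpdate ? "U:" : "I:") + fileId;
        }
    }

    // Stand-in for keyed state: record key -> file group the key was first assigned to.
    private final Map<String, String> keyToFileId = new HashMap<>();
    // File group that receives new keys (a single one, to keep the sketch small).
    private final String newFileId;

    BucketAssignSketch(String newFileId) {
        this.newFileId = newFileId;
    }

    Location assign(String recordKey) {
        String known = keyToFileId.get(recordKey);
        if (known != null) {
            // Key seen before: route it as an update to the same file group,
            // which the writer then handles with a merge handle.
            return new Location(known, true);
        }
        // First sighting: the location is stored immediately, not when the instant commits.
        keyToFileId.put(recordKey, newFileId);
        return new Location(newFileId, false);
    }

    public static void main(String[] args) {
        BucketAssignSketch assigner = new BucketAssignSketch("fg-0");
        System.out.println(assigner.assign("k1")); // I:fg-0
        System.out.println(assigner.assign("k2")); // I:fg-0
        System.out.println(assigner.assign("k1")); // U:fg-0
    }
}
```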

If an eager flush, or a flush triggered by a checkpoint, happens before the previous instant has been committed successfully, there are two potential problems:
* If the file group is a new one, an exception is thrown: "FileID xxx of partition path xxx does not exist."
* If the file group has a base file from an earlier (smaller) instant, data loss may happen, because the flushed records are merged with the wrong version of the base file.
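
To make the two failure modes concrete, here is a toy model, assuming a COW file group in which every write produces a new base file slice tagged with its instant, and the merge handle only sees slices whose instant is committed. `FlushRaceSketch`, `Slice`, `insert`, `merge` and `commit` are hypothetical names, not Hudi APIs, and the exception message is illustrative rather than Hudi's real error. Scenario 1 flushes an update into a file group whose creating instant is still pending; scenario 2 lets instant `t2` merge against the `t0` base file while `t1` is pending, so `t1`'s record `k2` disappears once both commit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, heavily simplified model of COW file slices and a completed timeline. */
class FlushRaceSketch {
    static final class Slice {
        final String instant;
        final Map<String, String> records;

        Slice(String instant, Map<String, String> records) {
            this.instant = instant;
            this.records = records;
        }
    }

    private final Set<String> committed = new HashSet<>();
    // fileId -> slices, assumed to be appended in instant order.
    private final Map<String, List<Slice>> slices = new HashMap<>();

    void commit(String instant) {
        committed.add(instant);
    }

    /** Latest slice whose instant is committed, or null if none is visible yet. */
    Slice latestCommitted(String fileId) {
        Slice latest = null;
        for (Slice s : slices.getOrDefault(fileId, List.of())) {
            if (committed.contains(s.instant)) {
                latest = s;
            }
        }
        return latest;
    }

    /** Insert path: writes the first slice of a new file group. */
    void insert(String instant, String fileId, Map<String, String> records) {
        slices.computeIfAbsent(fileId, k -> new ArrayList<>())
              .add(new Slice(instant, new HashMap<>(records)));
    }

    /** Merge path: applies updates on top of the latest committed base file. */
    void merge(String instant, String fileId, Map<String, String> updates) {
        Slice base = latestCommitted(fileId);
        if (base == null) {
            // Analogue of problem 1: nothing committed to merge with yet.
            throw new IllegalStateException("no committed base file for " + fileId);
        }
        Map<String, String> merged = new HashMap<>(base.records);
        merged.putAll(updates);
        slices.get(fileId).add(new Slice(instant, merged));
    }

    public static void main(String[] args) {
        // Problem 1: t1 created fg-0 but is still pending when t2 flushes an update to it.
        FlushRaceSketch a = new FlushRaceSketch();
        a.insert("t1", "fg-0", Map.of("k1", "v1"));
        try {
            a.merge("t2", "fg-0", Map.of("k1", "v2"));
        } catch (IllegalStateException e) {
            System.out.println("problem 1: " + e.getMessage());
        }

        // Problem 2: t2 merges against the t0 base file because t1 is still pending.
        FlushRaceSketch b = new FlushRaceSketch();
        b.insert("t0", "fg-0", Map.of("k1", "v1"));
        b.commit("t0");
        b.merge("t1", "fg-0", Map.of("k2", "v2")); // adds k2, not yet committed
        b.merge("t2", "fg-0", Map.of("k1", "v3")); // still sees t0 as the latest base file
        b.commit("t1");
        b.commit("t2");
        System.out.println("problem 2: " + b.latestCommitted("fg-0").records); // {k1=v3}, k2 lost
    }
}
```

In this model, committing `t1` before `t2` merges (the ordering that blocking instant generation for COW upserts appears intended to guarantee) makes `t2` read the `t1` slice, so the final slice keeps both `k1=v3` and `k2=v2`.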