the-other-tim-brown commented on code in PR #13444:
URL: https://github.com/apache/hudi/pull/13444#discussion_r2188837358
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/FileGroupRecordBuffer.java:
##########
@@ -565,27 +570,62 @@ protected boolean hasNextBaseRecord(T baseRecord, BufferedRecord<T> logRecordInf
Pair<Boolean, T> isDeleteAndRecord = merge(baseRecordInfo, logRecordInfo);
if (!isDeleteAndRecord.getLeft()) {
// Updates
- nextRecord = readerContext.seal(isDeleteAndRecord.getRight());
+ nextRecord = readerContext.seal(applyOutputSchemaConversion(isDeleteAndRecord.getRight()));
Review Comment:
> > The CDC logic is not really part of merging, why should they be coupled?
>
> To make the logic in file group reader buffer clean and more maintainable.
>
That is a fine goal, but we should consider that we would now need the logic in two places, one in the merger and one in the file group reader buffer, because of the point I have already raised.
> > Please also note that there will be outputs even when there is no merging in the case of log files with entries that are not in the base files.
>
> That's why I said `BufferedRecordMerger#finalMerge` instead of the other two APIs.
`finalMerge` is never called in this case. It is only used when a record in the base file is merged with one or more records coming from the log files.
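To make the concern concrete, here is a minimal, hypothetical model of the two output paths. It is plain Java, not Hudi code: `MergePathsSketch`, `read`, `finalMerge`, and the `cdcEvents` hook are all invented for illustration. A CDC hook placed only inside the merge step fires for base-plus-log updates but never for log-only records, which is why the logic would end up duplicated in the buffer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a file group reader buffer's two output
// paths. None of these names are Hudi APIs.
public class MergePathsSketch {
  // Stand-in for a CDC hook that lives only inside finalMerge.
  static final List<String> cdcEvents = new ArrayList<>();

  // Only reached when a base-file record has a matching log record.
  static String finalMerge(String baseValue, String logValue) {
    cdcEvents.add("UPDATE " + baseValue + " -> " + logValue);
    return logValue; // in this sketch the log value simply wins
  }

  static List<String> read(Map<String, String> baseRecords, Map<String, String> logRecords) {
    Map<String, String> remainingLogs = new LinkedHashMap<>(logRecords);
    List<String> output = new ArrayList<>();
    // Path 1: iterate base records, merging with any log record for the same key.
    for (Map.Entry<String, String> base : baseRecords.entrySet()) {
      String logValue = remainingLogs.remove(base.getKey());
      output.add(logValue == null ? base.getValue() : finalMerge(base.getValue(), logValue));
    }
    // Path 2: log-only records are emitted without calling finalMerge,
    // so the CDC hook above never sees them.
    output.addAll(remainingLogs.values());
    return output;
  }

  public static void main(String[] args) {
    Map<String, String> base = new LinkedHashMap<>();
    base.put("k1", "b1");
    Map<String, String> logs = new LinkedHashMap<>();
    logs.put("k1", "l1");
    logs.put("k2", "l2");
    System.out.println(read(base, logs)); // [l1, l2]
    System.out.println(cdcEvents);        // [UPDATE b1 -> l1], nothing for k2
  }
}
```

Key `k2` is emitted as output but produces no CDC event, so a second hook would be needed on the log-only path.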
> In any case, please stop introducing row-level ramifications continuously if the `cdc` logging is deterministic per-query.
Can you explain what you mean by this please?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]