Frank Wong created HUDI-4347:
--------------------------------
Summary: Make a simple method "merge" in HoodieMerge instead of
"preCombine" and "combineAndGetUpdateValue"
Key: HUDI-4347
URL: https://issues.apache.org/jira/browse/HUDI-4347
Project: Apache Hudi
Issue Type: Improvement
Reporter: Frank Wong
Historically, record merging has been split across 2 different methods with
(potentially) different semantics:
* {{preCombine}} de-duplicates the input batch (before inserting it into
the table)
* {{combineAndGetUpdateValue}} merges the persisted version with the incoming
one (which could have been previously de-duplicated)
I don't see a reason for us to get hung up on this historical context. We
should unify these (potentially) divergent methods into one, providing a single
avenue for merging records, either inside a batch (when de-duplicating) or when
combining a persisted record with an incoming one.
The merge API should be an [associative
operation|https://en.wikipedia.org/wiki/Associative_property]: {{f(a, f(b, c))
= f(f(a, b), c)}}
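As a rough illustration only, here is a minimal sketch of how a single merge function could serve both batch de-duplication and persisted-vs-incoming merging. The {{Rec}} type, the {{merge}} and {{dedupe}} names, and the "highest ordering value wins, ties go to the newer record" policy are all assumptions made for this example, not Hudi's actual API.

```java
import java.util.Arrays;
import java.util.List;

class MergeSketch {
    /** Hypothetical record shape: key, ordering value (e.g. event time), payload. */
    static final class Rec {
        final String key;
        final long orderingVal;
        final String payload;

        Rec(String key, long orderingVal, String payload) {
            this.key = key;
            this.orderingVal = orderingVal;
            this.payload = payload;
        }
    }

    /**
     * Hypothetical unified merge: the record with the higher ordering value
     * wins, and ties go to the newer (right-hand) record. Picking the
     * rightmost maximum this way is associative.
     */
    static Rec merge(Rec older, Rec newer) {
        return newer.orderingVal >= older.orderingVal ? newer : older;
    }

    /** De-duplicates a non-empty batch of same-key records by folding merge over it. */
    static Rec dedupe(List<Rec> batch) {
        Rec acc = batch.get(0);
        for (Rec r : batch.subList(1, batch.size())) {
            acc = merge(acc, r);
        }
        return acc;
    }

    public static void main(String[] args) {
        Rec a = new Rec("k1", 1L, "a");
        Rec b = new Rec("k1", 3L, "b");
        Rec c = new Rec("k1", 2L, "c");

        // Associativity: both groupings select the same record.
        Rec left = merge(merge(a, b), c);
        Rec right = merge(a, merge(b, c));
        System.out.println(left.payload + " " + right.payload); // prints "b b"

        // The same function merges a persisted record with a de-duplicated batch.
        Rec persisted = a;
        Rec incoming = dedupe(Arrays.asList(b, c));
        System.out.println(merge(persisted, incoming).payload); // prints "b"
    }
}
```

Because the function is associative, the result does not depend on whether the incoming batch was de-duplicated before being merged with the persisted record, which is what allows one method to replace the two.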
--
This message was sent by Atlassian Jira
(v8.20.10#820010)