Frank Wong created HUDI-4347:
--------------------------------

             Summary: Make a simple method "merge" in HoodieMerge instead of 
"preCombine" and "combineAndGetUpdateValue"
                 Key: HUDI-4347
                 URL: https://issues.apache.org/jira/browse/HUDI-4347
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Frank Wong


Historically, these have been two different methods with (potentially) 
different semantics:
 * {{preCombine}} de-duplicates the input batch (before inserting it into 
the table)
 * {{combineAndGetUpdateValue}} merges the persisted version with the incoming 
one (which could have been previously de-duplicated)

I don't see a reason for us to get hung up on this historical context. We 
should unify these historically (potentially) divergent methods into one, 
providing a single avenue for merging records, either inside a batch (when 
de-duping) or when combining a persisted record with the incoming one.
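
As a rough illustration of the idea (not the proposed Hudi API), the sketch 
below uses one merge function for both paths. Every name in it 
({{UnifiedMergeSketch}}, {{Rec}}, {{orderingVal}}, {{dedupe}}, {{upsert}}) is 
hypothetical, and the "larger ordering value wins" rule is only an assumed 
example policy.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UnifiedMergeSketch {
    /** Hypothetical record shape: a key, an ordering value, and a payload. */
    static final class Rec {
        final String key;
        final long orderingVal;
        final String payload;

        Rec(String key, long orderingVal, String payload) {
            this.key = key;
            this.orderingVal = orderingVal;
            this.payload = payload;
        }
    }

    /**
     * Hypothetical single merge: keep the record with the larger ordering
     * value; on a tie, prefer the newer argument.
     */
    static Rec merge(Rec older, Rec newer) {
        return newer.orderingVal >= older.orderingVal ? newer : older;
    }

    /** De-duplicates a batch by folding records that share a key through merge. */
    static Map<String, Rec> dedupe(List<Rec> batch) {
        Map<String, Rec> byKey = new LinkedHashMap<>();
        for (Rec r : batch) {
            // Map.merge passes (existing value, new value) to the function.
            byKey.merge(r.key, r, UnifiedMergeSketch::merge);
        }
        return byKey;
    }

    /** Combines persisted records with an incoming batch using the same merge. */
    static Map<String, Rec> upsert(Map<String, Rec> persisted, List<Rec> batch) {
        Map<String, Rec> result = new LinkedHashMap<>(persisted);
        for (Rec r : dedupe(batch).values()) {
            result.merge(r.key, r, UnifiedMergeSketch::merge);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Rec> persisted = new LinkedHashMap<>();
        persisted.put("k1", new Rec("k1", 3L, "stored"));

        List<Rec> batch = List.of(
            new Rec("k1", 1L, "old"),
            new Rec("k1", 2L, "new"),
            new Rec("k2", 5L, "other"));

        for (Rec r : upsert(persisted, batch).values()) {
            System.out.println(r.key + " -> " + r.payload);
        }
    }
}
```

Both {{dedupe}} and {{upsert}} call the same {{merge}}, so there is only one 
place where record-combining semantics are defined.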

 

The merge API should be an [associative 
operation|https://en.wikipedia.org/wiki/Associative_property]: 
{{f(a, f(b, c)) = f(f(a, b), c)}}
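
One way to sanity-check this property for a candidate merge is to compare both 
groupings over a set of sample values. The sketch below is only an 
illustration: {{AssociativityCheck}}, {{isAssociativeOn}}, {{KEEP_LARGEST}} and 
{{AVERAGE}} are hypothetical names, and plain doubles stand in for records. 
Passing on samples does not prove associativity in general; a single failing 
triple does disprove it.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class AssociativityCheck {
    /** Stand-in merge that keeps the larger value (associative). */
    static final BinaryOperator<Double> KEEP_LARGEST = Math::max;

    /** Stand-in merge that averages the two values (not associative). */
    static final BinaryOperator<Double> AVERAGE = (x, y) -> (x + y) / 2;

    /**
     * Returns true if f(a, f(b, c)) equals f(f(a, b), c) for every triple
     * (a, b, c) drawn from the samples.
     */
    static <T> boolean isAssociativeOn(List<T> samples, BinaryOperator<T> f) {
        for (T a : samples) {
            for (T b : samples) {
                for (T c : samples) {
                    T rightGrouped = f.apply(a, f.apply(b, c));
                    T leftGrouped = f.apply(f.apply(a, b), c);
                    if (!leftGrouped.equals(rightGrouped)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Double> samples = List.of(1.0, 2.0, 3.0);
        System.out.println("keep largest: " + isAssociativeOn(samples, KEEP_LARGEST));
        System.out.println("average:      " + isAssociativeOn(samples, AVERAGE));
    }
}
```

Averaging fails because {{f(1, f(2, 3)) = 1.75}} while {{f(f(1, 2), 3) = 2.25}}, 
so the result would depend on how records happen to be grouped during 
de-duplication and later merging.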



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
