parisni commented on issue #6531:
URL: https://github.com/apache/hudi/issues/6531#issuecomment-1230429062

   `hoodie.merge.allow.duplicate.on.inserts=true` fixes the problem.
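
For reference, here is a minimal sketch of how that option might be passed with an insert write from PySpark. The helper `hudi_insert_options`, the table name `events`, the field names `event_id` and `ts`, and the output path are hypothetical placeholders, not taken from the issue. Only the `hoodie.*` keys are actual Hudi configs.

```python
def hudi_insert_options(table_name, record_key, precombine_field, allow_duplicates=True):
    """Build a dict of Hudi write options for an insert operation.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "insert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Hudi options are passed as strings, so emit "true" or "false".
        "hoodie.merge.allow.duplicate.on.inserts": str(allow_duplicates).lower(),
    }


# Placeholder table and field names (assumptions, adjust to your schema).
opts = hudi_insert_options("events", "event_id", "ts")

# With an existing Spark DataFrame `df`, the write would look roughly like:
# df.write.format("hudi").options(**opts).mode("append").save("/tmp/hudi/events")
```

Building the options in one place keeps the flag visible next to the `insert` operation it modifies.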
   
   BTW, I suggest updating the documentation:
   
   
   INSERT
   This operation is very similar to upsert in terms of heuristics/file sizing 
but completely skips the index lookup step. Thus, it can be a lot faster than 
upserts for use-cases like log de-duplication (in conjunction with options to 
filter duplicates mentioned below). **Still, with the default 
`hoodie.merge.allow.duplicate.on.inserts=false`, duplicates are usually merged.** 
This is also suitable for use-cases where the table can tolerate duplicates, but 
just needs the transactional writes/incremental pull/storage management capabilities of 
Hudi.
   