eshu commented on issue #8178:
URL: https://github.com/apache/hudi/issues/8178#issuecomment-2985912083

   @nsivabalan I have a similar problem with Hudi 0.13.1. The index type is 
plain BLOOM (not GLOBAL_BLOOM), the table type is MOR, and the operation type 
is UPSERT.
   
   I created a parallel job that writes the same data for this month, and the 
fresh parallel dataset does not have any duplicates. I think the old dataset is 
somehow corrupted, and that there is a bug in Hudi that produces duplicates.
   
   The old dataset is huge, so re-ingestion would be expensive and take too 
much time. Migrating to a newer version of Hudi is not possible at the moment.

