suryaprasanna commented on issue #5223:
URL: https://github.com/apache/hudi/issues/5223#issuecomment-1101721223

   @sharathkola 
   I do not have much context on whether AWS has any custom implementation 
on top of the Hudi code.
   
   In the Spark DAG, the filter block shows the output row count as 2. Does 
that mean duplicate rows are returned?
   If there are duplicates, one problem I can think of is that the completed 
replacecommit file is missing partitionToReplaceFileIds, so it may be 
considering all 198 files valid even though only 2 of them are.
   For further investigation, could you share with us the contents of the 
20220404094047.commit and 20220404094203.replacecommit files?
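   As a minimal sketch of what to look for: a replacecommit file in the 
`.hoodie` timeline is JSON, and a healthy one records the replaced file IDs 
per partition under `partitionToReplaceFileIds`. The payload below is a 
made-up illustration, not actual Hudi output; only the key name comes from 
Hudi's metadata format.

```python
import json

# Hypothetical replacecommit payload (illustrative, not real Hudi output).
# In practice, load the file from <table-base-path>/.hoodie/<ts>.replacecommit
sample = json.loads("""
{
  "partitionToReplaceFileIds": {
    "2022/04/04": ["file-id-1", "file-id-2"]
  }
}
""")

# If this map is empty or missing, the replaced file groups were never
# recorded, and readers may still treat the old files as valid.
replaced = sample.get("partitionToReplaceFileIds") or {}
if not replaced:
    print("partitionToReplaceFileIds is empty: old files may still be read as valid")
else:
    print(f"{sum(len(v) for v in replaced.values())} file IDs marked as replaced")
```

   If the map in your 20220404094203.replacecommit is empty, that would 
explain readers not filtering out the replaced file groups.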

