haripriyarhp commented on issue #6166:
URL: https://github.com/apache/hudi/issues/6166#issuecomment-1199226564

   @rmahindra123: Unfortunately, I am not able to share the .hoodie folder. 
Just to add, I tried it again yesterday. I sent messages to a topic in 
batches. Below are the steps I followed:
   1. Sent a batch of 100 records to Kafka and ran compaction. The number of 
messages in Kafka and the number of records in Athena matched.
   2. Sent another batch of 100 records to Kafka -> compaction -> number of 
messages in Kafka = number of records in Athena.
   3. Sent another batch of 100 records (this one contained some duplicates) -> 
compaction -> number of messages in Kafka = number of records in Athena.
   4. Sent another batch of 98 records (some were duplicates) -> compaction -> 
number of messages in Kafka != number of records in Athena. There were no 
more files to be compacted. About 24 records were missing.
   5. Sent another 100 records -> compaction -> the record count still did not 
match. The same 24 records were still missing.
   
   More or less, I followed the above steps several times before I raised the 
issue here. Each time, after a few runs, the record count stops matching, even 
after running compaction.
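
   To narrow down which records are missing, rather than only comparing 
totals, the check in the steps above could be done per record key. Below is a 
minimal sketch, assuming the keys sent to Kafka and the keys returned by an 
Athena query have already been exported to plain lists. The function name 
`diff_record_keys` and the sample keys are hypothetical and not part of Hudi, 
Kafka, or Athena. If duplicates share a record key and the writer upserts, 
they may collapse into one row, so the distinct key count may be the more 
meaningful number to compare.

```python
from collections import Counter


def diff_record_keys(sent_keys, table_keys):
    """Compare record keys sent to Kafka with keys read back from the table.

    Both arguments are plain iterables of keys (an assumption: how they are
    exported from Kafka and Athena is left out of this sketch).
    """
    sent = Counter(sent_keys)        # key -> number of messages carrying it
    table_list = list(table_keys)    # rows as returned by the query
    present = set(table_list)

    return {
        "messages_sent": sum(sent.values()),
        "distinct_keys_sent": len(sent),
        "records_in_table": len(table_list),
        # Keys that were sent at least once but never show up in the table.
        "missing_keys": sorted(k for k in sent if k not in present),
        # Keys that were sent more than once (the "duplicates" in a batch).
        "duplicated_keys": sorted(k for k, n in sent.items() if n > 1),
    }


# Made-up sample data, only to show the shape of the report.
sent = ["k1", "k2", "k3", "k2", "k4"]
table = ["k1", "k2"]
report = diff_record_keys(sent, table)
print(report)
```

   Running this after each batch and compaction would show whether the 
missing records are always the same keys and whether they are among the 
duplicated ones.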

