psendyk commented on issue #8890:
URL: https://github.com/apache/hudi/issues/8890#issuecomment-1654659545

   @ad1happy2go Our initial upgrade attempt only failed for one out of four of 
our tables; the other three have much lower incoming data volume so perhaps 
it's related to that. I just tried reproducing the error on another fresh table 
with less data -- I ingested a single micro-batch (which also created the 
table) using 0.12.1, and then continued the ingestion with 0.13.0. This time 
the 0.13.0 job continued to make progress for a couple of micro-batches until I 
killed it; it didn't run into that issue.
   Also, the exception only happens for some partitions in the micro-batch 
while others are written successfully. It may be related to partition 
cardinality or the file size distribution across partitions; each of the 
micro-batches in our job writes to ~12-15k partitions and the number of records 
per partition varies quite significantly, probably from a few records at the 
low end to ~10,000s at the high end. I 
haven't verified this but given that the issue seems to be "missing small 
files," I suspect this error might only happen to the partitions with less 
data/more small files. Perhaps you can attempt to reproduce it by modifying the 
partitioning schema in your snippet -- not sure how much data you're ingesting 
but perhaps file sizing is more uniform when only partitioning on `year`. Let 
me know if I can provide any other info that'd help with reproducing. Thanks!
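
   To make the suspected shape of the data concrete, here is a minimal sketch (plain Python, no Hudi or Spark calls) that draws a skewed records-per-partition distribution similar to what is described above and summarizes it. The helper names, the log-uniform distribution, and the `small_threshold` cutoff are all assumptions for illustration; they are not taken from our job or from Hudi itself.

```python
import random


def skewed_partition_counts(num_partitions=12_000, max_records=10_000, seed=0):
    """Hypothetical helper: assign each partition a record count drawn
    log-uniformly between 1 and max_records, so most partitions are small
    and a few are large."""
    rng = random.Random(seed)
    counts = {}
    for i in range(num_partitions):
        # max_records ** u with u in [0, 1) gives a log-uniform value in [1, max_records)
        n = int(round(max_records ** rng.random()))
        counts[f"p{i:05d}"] = max(1, n)
    return counts


def summarize(counts, small_threshold=100):
    """Summarize the distribution; small_threshold is an assumed cutoff for
    partitions likely to end up with small files."""
    values = sorted(counts.values())
    return {
        "partitions": len(values),
        "min": values[0],
        "median": values[len(values) // 2],
        "max": values[-1],
        "small_partitions": sum(1 for v in values if v < small_threshold),
    }


if __name__ == "__main__":
    print(summarize(skewed_partition_counts()))
```

   The per-partition counts could then drive a synthetic dataset for a repro attempt. Whether this kind of skew is what actually triggers the error is unverified.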


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
