bvaradar commented on issue #2393:
URL: https://github.com/apache/hudi/issues/2393#issuecomment-752338459


   Can you look at the log message "AvgRecordSize => " and check what value is
getting printed? My suspicion is that your second commit has a much larger
record size (in bytes) than the records in the next big batch. Hudi adapts to
changing record sizes by estimating the average record size from previous
commits and using that estimate to decide how many records to pack into each
file.
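A simplified sketch of why a previous commit with a few large records can shrink the files written for the next batch. It assumes a 120 MB target file size and that the writer divides that size by the average record size taken from the previous commit. The class, method names and numbers are hypothetical illustrations, not Hudi's API or code.

```java
// Hypothetical illustration (not Hudi code): an inflated average record size
// estimate reduces how many records get planned into each new file.
class RecordSizeEstimateDemo {

    // Assumed target file size: 120 MB (illustrative value).
    static final long MAX_FILE_SIZE_BYTES = 120L * 1024 * 1024;

    // Average bytes per record observed in a previous commit, rounded up.
    static long avgRecordSize(long totalBytesWritten, long totalRecordsWritten) {
        return (long) Math.ceil((double) totalBytesWritten / totalRecordsWritten);
    }

    // Number of records the writer would plan to put into one file.
    static long recordsPerFile(long avgRecordSizeBytes) {
        return MAX_FILE_SIZE_BYTES / avgRecordSizeBytes;
    }

    public static void main(String[] args) {
        // Previous commit: 3 unusually large records, 30,000 bytes in total.
        long estimate = avgRecordSize(30_000, 3);   // 10,000 bytes per record
        long planned = recordsPerFile(estimate);    // 12,582 records per file

        // Assume records in the next big batch are really ~200 bytes each.
        long actualFileSize = planned * 200;        // ~2.4 MB instead of ~120 MB

        System.out.println("estimated avg record size: " + estimate);
        System.out.println("planned records per file: " + planned);
        System.out.println("actual file size (bytes): " + actualFileSize);
    }
}
```

With these assumed numbers the estimate is 50x too large, so each file ends up roughly 50x smaller than intended.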
   
   In 0.6.0, we changed the estimation to only use previous commits that wrote
more data (in bytes) than the small file limit (the minimum size at which a
file is no longer treated as a small file). Could you try 0.6.0 and see whether
you have the same experience? This can still happen on 0.6.0 if the 3 records
you are writing are large enough to exceed the small file size limit.
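A rough sketch of the 0.6.0-style selection described above, assuming commits are scanned from newest to oldest, the first commit that wrote more bytes than the small file limit supplies the estimate, and a default estimate is used when none qualifies. `CommitStats`, `estimate`, the 100 MB limit and the 1024-byte default are hypothetical stand-ins chosen for illustration; Hudi's real implementation differs in detail.

```java
import java.util.List;

// Hypothetical sketch (not Hudi code): ignore commits too small to give a
// trustworthy average record size.
class AvgRecordSizeEstimator {

    // Hypothetical per-commit totals, standing in for commit metadata.
    static final class CommitStats {
        final long totalBytesWritten;
        final long totalRecordsWritten;

        CommitStats(long totalBytesWritten, long totalRecordsWritten) {
            this.totalBytesWritten = totalBytesWritten;
            this.totalRecordsWritten = totalRecordsWritten;
        }
    }

    // commitsNewestFirst must be ordered from the most recent commit to the oldest.
    static long estimate(List<CommitStats> commitsNewestFirst,
                         long smallFileLimitBytes, long defaultEstimateBytes) {
        for (CommitStats c : commitsNewestFirst) {
            // Only trust commits that wrote more data than the small file limit.
            if (c.totalBytesWritten > smallFileLimitBytes && c.totalRecordsWritten > 0) {
                return (long) Math.ceil((double) c.totalBytesWritten / c.totalRecordsWritten);
            }
        }
        // No commit qualified: fall back to the default estimate.
        return defaultEstimateBytes;
    }

    public static void main(String[] args) {
        long smallFileLimit = 100L * 1024 * 1024; // assumed 100 MB
        List<CommitStats> commits = List.of(
            new CommitStats(30_000, 3),                 // newest: 3 large records, skipped
            new CommitStats(200_000_000L, 1_000_000));  // older: ~200 bytes per record
        System.out.println(estimate(commits, smallFileLimit, 1024)); // prints 200
    }
}
```

The caveat above still applies in this sketch: if the 3 records together exceed `smallFileLimit` (say 150 MB in total), that commit qualifies and its large per-record size is used again.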
    
   I have filed a Jira to let users override this estimate:
https://issues.apache.org/jira/browse/HUDI-1499

