yanenze commented on issue #4855: URL: https://github.com/apache/hudi/issues/4855#issuecomment-1048506073
> Yes, the file size control is not accurate. How much does the actual file size exceed your desired threshold?

Hello, I configured the max parquet file size as 128 MB, but the file eventually grew to 8 GB, so I tried to find the reason. I have created a pull request in #4879.

I think the big file is generated because, when the bucketAssigner builds the small file list, it misses the files that are in pending compaction, so the total size is calculated only as (log file size * compression ratio (0.35)).
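To make the arithmetic concrete, here is a minimal sketch. It assumes that a file slice's estimated size is the base file size plus the log file sizes multiplied by the 0.35 ratio mentioned above. The function names and the 120 MB / 40 MB sizes are hypothetical and do not come from Hudi's code.

```python
MB = 1024 * 1024

# Assumed values: 0.35 is the log-to-parquet ratio quoted above,
# 128 MB is the configured max parquet file size.
LOG_TO_PARQUET_RATIO = 0.35
MAX_FILE_SIZE = 128 * MB


def estimated_slice_size(base_size: int, log_sizes: list[int],
                         ratio: float = LOG_TO_PARQUET_RATIO) -> float:
    """Hypothetical estimate: base file bytes plus log bytes scaled by the ratio."""
    return base_size + sum(log_sizes) * ratio


def is_small_file(estimated: float, max_size: int = MAX_FILE_SIZE) -> bool:
    """A slice below max_size is treated as a small file that can take more inserts."""
    return estimated < max_size


# Hypothetical file group: a 120 MB base file whose compaction is pending,
# plus 40 MB of log files written after the compaction was scheduled.
base_size = 120 * MB
log_sizes = [40 * MB]

# Counting the base file: 120 MB + 40 MB * 0.35 = 134 MB, over the limit.
correct = estimated_slice_size(base_size, log_sizes)
# Dropping the base file (the suspected bug): only 40 MB * 0.35 = 14 MB.
buggy = estimated_slice_size(0, log_sizes)

print(is_small_file(correct), is_small_file(buggy))  # False True
```

In this sketch, the underestimated slice still looks small, so new inserts would keep being routed to it. That would be consistent with a file growing far past the 128 MB limit.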
