HuangZhenQiu commented on PR #13409: URL: https://github.com/apache/hudi/pull/13409#issuecomment-3095105503
Small files are bad for query performance. But if we keep the whole Parquet file sorted, we lose data freshness: sort time increases a lot, which causes high back pressure in the Flink job. So we use the buffer size to control row-group-level ordering and the compression ratio. It is a trade-off that balances data freshness and storage size without requiring a file-level sort. We will rely on a table service to do the stitching later.
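
To make the trade-off concrete, here is a minimal sketch of the idea, not the code in this PR: records are buffered up to a fixed count, sorted only within that buffer, and flushed as one "row group". The class name `BufferedRowGroupSorter`, the `bufferSize` parameter, and the use of plain `long` keys are all hypothetical and chosen only for illustration; a real writer would bound the buffer by memory and hand each sorted batch to the Parquet writer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is not Hudi's or Flink's actual writer code.
final class BufferedRowGroupSorter {
    private final int bufferSize;                        // assumed knob: records buffered before a flush
    private final List<Long> buffer = new ArrayList<>();
    private final List<List<Long>> rowGroups = new ArrayList<>();

    BufferedRowGroupSorter(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.bufferSize = bufferSize;
    }

    // Buffer one record key; flush once the buffer is full.
    void write(long key) {
        buffer.add(key);
        if (buffer.size() >= bufferSize) {
            flush();
        }
    }

    // Sort only the buffered records and emit them as one row group.
    // Sort cost is bounded by bufferSize, not by the size of the file.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<Long> group = new ArrayList<>(buffer);
        Collections.sort(group);
        rowGroups.add(group);
        buffer.clear();
    }

    List<List<Long>> rowGroups() {
        return rowGroups;
    }

    public static void main(String[] args) {
        BufferedRowGroupSorter sorter = new BufferedRowGroupSorter(3);
        for (long k : new long[] {5, 1, 4, 2, 9, 3, 7}) {
            sorter.write(k);
        }
        sorter.flush(); // flush the partial last buffer, e.g. at checkpoint time
        System.out.println(sorter.rowGroups()); // prints [[1, 4, 5], [2, 3, 9], [7]]
    }
}
```

Each row group is sorted internally, but the groups overlap in key range, so the file as a whole is not sorted. A larger `bufferSize` gives longer sorted runs (and typically better compression) at the cost of more time spent sorting before each flush.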
