HuangZhenQiu commented on PR #13409:
URL: https://github.com/apache/hudi/pull/13409#issuecomment-3095105503

   Small files are bad for query performance. But if we sort the whole Parquet
file, we lose data freshness: sort time increases significantly, which causes
high back pressure in the Flink job. Thus, we use the buffer size to control
ordering at the row-group level, and with it the compression ratio. It is a
trade-off between data freshness and storage size that avoids a file-level
sort in Parquet. We will leverage a table service to do the stitching later.
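To make the trade-off concrete, here is a minimal Python sketch. It assumes a writer that sorts only within a fixed-size buffer and emits each sorted buffer as one row group. `BufferedSortWriter`, `buffer_size`, and `stitch` are hypothetical names for illustration, not Hudi, Flink, or Parquet APIs. The list of lists stands in for a Parquet file's row groups, and `stitch` models the later table-service pass as a plain k-way merge.

```python
import heapq
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")


class BufferedSortWriter(Generic[T]):
    """Hypothetical writer that sorts records only within a bounded buffer.

    Each flush produces one sorted "row group"; the file as a whole is
    not globally sorted.
    """

    def __init__(self, buffer_size: int, key: Callable[[T], object]) -> None:
        if buffer_size <= 0:
            raise ValueError("buffer_size must be positive")
        self.buffer_size = buffer_size
        self.key = key
        self._buffer: List[T] = []
        # Stands in for the row groups of a single Parquet file.
        self.row_groups: List[List[T]] = []

    def write(self, record: T) -> None:
        self._buffer.append(record)
        # Sort cost per flush is bounded by buffer_size, so data is
        # flushed (and becomes visible) without waiting on a file-wide sort.
        if len(self._buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        # Sort only the buffered records and emit them as one row group.
        if self._buffer:
            self.row_groups.append(sorted(self._buffer, key=self.key))
            self._buffer = []


def stitch(row_groups: Iterable[List[T]], key: Callable[[T], object]) -> List[T]:
    """Hypothetical later step: k-way merge of already-sorted row groups."""
    return list(heapq.merge(*row_groups, key=key))


if __name__ == "__main__":
    writer = BufferedSortWriter(buffer_size=3, key=lambda r: r)
    for value in [5, 1, 4, 9, 2, 7, 3]:
        writer.write(value)
    writer.flush()
    print(writer.row_groups)                        # [[1, 4, 5], [2, 7, 9], [3]]
    print(stitch(writer.row_groups, key=lambda r: r))  # [1, 2, 3, 4, 5, 7, 9]
```

In this sketch, a larger `buffer_size` yields fewer, longer sorted runs (better ordering and compression within each row group) at the cost of more sort work per flush. A smaller one keeps flushes cheap and data fresh but leaves the file less ordered until the later merge.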

