Re: [Discuss] Different file formats for ingestion and compaction

2024-10-25 Thread Péter Váry
I was playing around with Flink ingestion performance testing and found that the compression codec is also an important factor: zstd gives much higher write performance, while gzip gives a higher compression ratio. So I would argue that there are additional factors that could be optimized for the write path.
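
A minimal sketch of what switching the codec looks like at the table level, assuming an already-loaded Catalog; "write.parquet.compression-codec" is the Iceberg table property for Parquet writers, while the namespace and table names here are placeholders:

import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class CompressionCodecSketch {
  public static void useZstd(Catalog catalog) {
    Table table = catalog.loadTable(TableIdentifier.of("db", "events"));
    table.updateProperties()
        // zstd: faster writes in the ingestion tests mentioned above;
        // gzip: better compression ratio, so it may still suit compaction output.
        .set("write.parquet.compression-codec", "zstd")
        .commit();
  }
}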

Re: [Discuss] Different file formats for ingestion and compaction

2024-10-25 Thread Steven Wu
I agree with Ryan. Engines usually provide an override capability that allows users to choose a write format different from the table default if needed. There are many production use cases that write columnar formats (like Parquet) in streaming ingestion. I don't necessarily agree that it will be common …

Re: [Discuss] Different file formats for ingestion and compaction

2024-10-25 Thread rdb...@gmail.com
Gabor, the reason why the write format is a "default" is that I intended for it to be something that engines could override. For cases where it doesn't make sense to use the default because of memory pressure (as you might see in ingestion processes), you could choose to override it and use a format t…
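
One way this per-job override can look in a Flink ingestion job is sketched below; it keeps the table's "write.format.default" (e.g. Parquet, used later by compaction) untouched and only overrides the format for this writer. The builder option name and the surrounding setup are assumptions about a recent Iceberg Flink sink and may differ by version:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class AvroIngestionSketch {
  public static void buildSink(DataStream<RowData> rowStream, TableLoader tableLoader) {
    FlinkSink.forRowData(rowStream)
        .tableLoader(tableLoader)
        // Per-job override of the file format for this streaming writer only;
        // a row-oriented format like Avro keeps memory pressure low during ingestion.
        .set("write-format", "avro")
        .append();
  }
}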