Slick, thanks @Ryan Blue <b...@tabular.io>. We will add LZ4 to our mix and report back if we find anything different.
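
For anyone repeating the comparison, here is a minimal sketch of switching the codec per table through the *write.parquet.compression-codec* property mentioned in the thread below. It assumes Spark SQL with an Iceberg catalog configured; the table name `db.events` is hypothetical, and the value spelling `lz4` is an assumption to check against the Iceberg version in use.

```sql
-- Hypothetical table name; run once per codec under test, then rewrite the same data.
ALTER TABLE db.events
SET TBLPROPERTIES ('write.parquet.compression-codec' = 'lz4');

-- Swap in the other codecs for the comparison runs (assumed value spellings):
--   'write.parquet.compression-codec' = 'snappy'
--   'write.parquet.compression-codec' = 'gzip'
```

The property should only affect files written after the change, so each run needs a fresh write of the same dataset for the numbers to be comparable.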

On Thu, Jul 1, 2021 at 1:50 PM Ryan Blue <b...@tabular.io> wrote:

> The default should probably be LZ4. In our testing, LZ4 beat snappy for
> every dataset for read time, write time, and compression ratio. I believe
> it also typically got a better compression ratio than gzip. Gzip was the
> previous default because it does a better job on compression ratio than
> snappy.
>
> Ryan
>
> On Thu, Jul 1, 2021 at 1:48 PM Sreeram Garlapati <gsreeramku...@gmail.com>
> wrote:
>
>> Hello Iceberg devs!
>>
>> Do any of you folks use *Parquet + Snappy* as the underlying file format?
>>
>> Iceberg configures this by default as Parquet + gzip
>> (*write.parquet.compression-codec*).
>> *Is there any specific reason for this choice?*
>>
>> In our preliminary tests we found better numbers with *Parquet + Snappy*
>> than with *gzip*.
>> Operation = compress and write to local disk
>> File size = 524.3 MB (about the same with both compression codecs)
>> Row group size = 64 MB
>>
>> gzip     snappy
>> 8.304    5.478
>>
>> We are still in the process of our full benchmarking (for reads), but we
>> want to understand whether there is a whole different angle to this that
>> we are not thinking through.
>>
>> Truly appreciate any inputs,
>> Sreeram
>>
>
>
> --
> Ryan Blue
> Tabular
>