Slick, thanks @Ryan Blue <b...@tabular.io>. We will add LZ4 to our mix and
report back if we find anything different.
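
A minimal sketch of switching the codec between benchmark runs, assuming the table is managed through Spark SQL. The property name is the one quoted below; the table name `db.events` and the helper `set_codec_sql` are hypothetical, and the codec tuple lists only the codecs mentioned in this thread, not every value Iceberg accepts.

```python
# Property name quoted from the thread; the table name used below is hypothetical.
CODEC_PROPERTY = "write.parquet.compression-codec"

# Only the codecs mentioned in this thread; not an exhaustive list.
THREAD_CODECS = ("gzip", "snappy", "lz4")


def set_codec_sql(table: str, codec: str) -> str:
    """Build a Spark SQL statement that sets the Parquet codec on a table."""
    codec = codec.lower()
    if codec not in THREAD_CODECS:
        raise ValueError(f"unexpected codec: {codec!r}")
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('{CODEC_PROPERTY}' = '{codec}')"
    )


if __name__ == "__main__":
    # Print one statement per codec so each benchmark run can apply its own.
    for codec in THREAD_CODECS:
        print(set_codec_sql("db.events", codec))
```

Each statement would be run before the corresponding write benchmark, so that the only variable changing between runs is the codec.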

On Thu, Jul 1, 2021 at 1:50 PM Ryan Blue <b...@tabular.io> wrote:

> The default should probably be LZ4. In our testing, LZ4 beat snappy for
> every dataset for read time, write time, and compression ratio. I believe
> it also typically got a better compression ratio than gzip. Gzip was the
> previous default because it does a better job on compression ratio than
> snappy.
>
> Ryan
>
> On Thu, Jul 1, 2021 at 1:48 PM Sreeram Garlapati <gsreeramku...@gmail.com>
> wrote:
>
>> Hello Iceberg devs!
>>
>> Do any of you folks use *Parquet + Snappy* as the underlying file format?
>>
>> Iceberg defaults to Parquet + gzip (*write.parquet.compression-codec*).
>> *Is there any specific reason for this choice?*
>>
>> In our preliminary tests we found better numbers with *Parquet + Snappy*
>> than with *gzip*.
>> Operation = compress and write to local disk
>> File size = 524.3 MB (about the same with both compression codecs)
>> Row group size = 64 MB
>>
>> gzip     snappy
>> 8.304    5.478
>>
>>
>> We are still running our full benchmarks (for reads), but we want to
>> understand whether there is a whole different angle to this that we are
>> not thinking through.
>>
>> Truly appreciate any input,
>> Sreeram
>>
>
>
> --
> Ryan Blue
> Tabular
>
