I think they all have different names, and that's what I would be
whitelisting, so any table options or the like would be rejected as
invalid options.
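
A rough sketch of what that validation could look like (a minimal,
hypothetical sketch: the class and the option names here are illustrative,
not the actual Iceberg whitelist):

import java.util.Map;
import java.util.Set;

class SparkWriteOptionValidator {
  // Hypothetical whitelist of short-form Spark write option names.
  private static final Set<String> VALID_OPTIONS =
      Set.of("target-file-size-bytes", "check-nullability");

  static void validate(Map<String, String> options) {
    for (String key : options.keySet()) {
      if (!VALID_OPTIONS.contains(key)) {
        // Fail loudly instead of silently ignoring an unknown option;
        // a log warning would be the gentler alternative.
        throw new IllegalArgumentException("Unknown write option: " + key);
      }
    }
  }
}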

On Fri, Mar 5, 2021 at 10:54 AM Ryan Blue <rb...@netflix.com> wrote:

> Do we support any table options passed through here? I thought we had
> separate options defined that use shorter names (like target-size).
>
> On Fri, Mar 5, 2021 at 8:50 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> I think if we are going to have our write behavior work like that, we
>> should probably switch to whitelisting valid properties for Spark writes
>> so that we can warn folks when some options won't actually do anything.
>> I think the current behavior is a bit of a surprise; I also don't like
>> silent options :)
>>
>> On Mar 5, 2021, at 10:47 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>
>> Russell is right. The property you're trying to set is a table property
>> and needs to be set on the table.
>>
>> We don't currently support overriding arbitrary table properties in write
>> options, mainly because we want to encourage people to set their
>> configuration on the table instead of in jobs. That's a best practice that
>> I highly recommend so you don't need to configure every job that writes to
>> the table, and so you can make changes and have them automatically take
>> effect without recompiling your write job.
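>>
>> If you do want to set it programmatically, here is a minimal sketch with
>> the Iceberg Java API (the catalog setup and the db.events table name are
>> placeholders, not from this thread):
>>
>> import org.apache.iceberg.Table;
>> import org.apache.iceberg.catalog.Catalog;
>> import org.apache.iceberg.catalog.TableIdentifier;
>>
>> Catalog catalog = ...; // however your deployment loads its catalog
>> Table table = catalog.loadTable(TableIdentifier.of("db", "events"));
>> table.updateProperties()
>>     .set("write.parquet.compression-codec", "snappy")
>>     .commit(); // future writes pick this up without any job changes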
>>
>> On Fri, Mar 5, 2021 at 8:44 AM Russell Spitzer <russell.spit...@gmail.com>
>> wrote:
>>
>>> I believe those are currently only respected as table properties and not
>>> as "Spark write" properties, although there is a case to be made that we
>>> should accept them there as well. You can alter your table so that it
>>> contains those properties, and new files will be created with the
>>> compression you would like.
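>>>
>>> For example, a minimal sketch from a Spark session with the Iceberg
>>> catalog configured (the db.events table name is a placeholder):
>>>
>>> // Set the codec once as a table property; new files then use it.
>>> spark.sql("ALTER TABLE db.events SET TBLPROPERTIES ("
>>>     + "'write.parquet.compression-codec' = 'snappy')");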
>>>
>>> On Mar 5, 2021, at 7:15 AM, Javier Sanchez Beltran <
>>> jabelt...@expediagroup.com.INVALID> wrote:
>>>
>>> Hello Iceberg team!
>>>
>>> I have been researching Apache Iceberg to see how it would work in our
>>> environment. We are still trying things out. We would like to use the
>>> Parquet format with the SNAPPY compression type.
>>>
>>> I already tried changing these two properties to SNAPPY, but it didn’t
>>> work (https://iceberg.apache.org/configuration/):
>>>
>>>
>>> write.avro.compression-codec: Gzip -> SNAPPY
>>> write.parquet.compression-codec: Gzip -> SNAPPY
>>> Like this:
>>>
>>> dataset
>>>     .writeStream()
>>>     .format("iceberg")
>>>     .outputMode("append")
>>>     .option("write.parquet.compression-codec", "SNAPPY")
>>>     .option("write.avro.compression-codec", "SNAPPY")
>>>     …start()
>>>
>>>
>>> Did I do something wrong? Or do we need to take care of the SNAPPY
>>> compression implementation ourselves?
>>>
>>> Thank you in advance,
>>> Javier.
>>>
>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
