Do we support any table options passed through here? I thought we had
separate options defined that use shorter names (like target-size).
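
For anyone following along: the table-property route discussed below would look roughly like this in Spark SQL. This is a sketch; `db.events` is a placeholder table name, and the property names are from the Iceberg configuration page linked further down.

```sql
-- Set compression at the table level so every writer picks it up,
-- instead of configuring each job individually.
ALTER TABLE db.events SET TBLPROPERTIES (
  'write.parquet.compression-codec' = 'snappy',
  'write.avro.compression-codec'    = 'snappy'
);
```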

On Fri, Mar 5, 2021 at 8:50 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> I think if we are going to have our write behavior work like that we
> should probably switch to a whitelisting of valid properties for Spark
> writes, so we can warn folks that some options won't actually do anything.
> I think the current behavior is a bit of a surprise, I also don't like
> silent options :)
>
> On Mar 5, 2021, at 10:47 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>
> Russell is right. The property you're trying to set is a table property
> and needs to be set on the table.
>
> We don't currently support overriding arbitrary table properties in write
> options, mainly because we want to encourage people to set their
> configuration on the table instead of in jobs. That's a best practice that
> I highly recommend so you don't need to configure every job that writes to
> the table, and so you can make changes and have them automatically take
> effect without recompiling your write job.
>
> On Fri, Mar 5, 2021 at 8:44 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> I believe those are currently only respected as table properties and not
>> as "spark write" properties although there is a case to be made that we
>> should accept them there as well. You can alter your table so that it
>> contains those properties and new files will be created with the
>> compression you would like.
>>
>> On Mar 5, 2021, at 7:15 AM, Javier Sanchez Beltran <
>> jabelt...@expediagroup.com.INVALID> wrote:
>>
>> Hello Iceberg team!
>>
>> I have been researching Apache Iceberg to see how it would work in our
>> environment. We are still trying things out. We would like to use the
>> Parquet format with SNAPPY compression.
>>
>> I already tried changing these two properties to SNAPPY, but it didn't
>> work (https://iceberg.apache.org/configuration/):
>>
>>
>> write.avro.compression-codec: gzip -> snappy
>> write.parquet.compression-codec: gzip -> snappy
>>
>> Like this:
>>
>> dataset
>>     .writeStream()
>>     .format("iceberg")
>>     .outputMode("append")
>>     .option("write.parquet.compression-codec", "SNAPPY")
>>     .option("write.avro.compression-codec", "SNAPPY")
>>     …
>>     .start()
>>
>>
>> Did I do something wrong? Or do we need to handle the SNAPPY compression
>> implementation ourselves?
>>
>> Thank you in advance,
>> Javier.
>>
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>
>

-- 
Ryan Blue
Software Engineer
Netflix
