I think if we are going to have our write behavior work like that, we should 
probably switch to whitelisting valid properties for Spark writes, so we 
can warn folks that some options won't actually do anything. I think the 
current behavior is a bit of a surprise, and I also don't like silently 
ignored options :)
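
To spell out the table-level route described below, the properties can be set 
once with a DDL statement along these lines (a sketch; `db.table` stands in 
for your actual table identifier):

```sql
-- Set compression once on the table; every subsequent write picks it up,
-- with no per-job options needed.
ALTER TABLE db.table SET TBLPROPERTIES (
  'write.parquet.compression-codec' = 'snappy',
  'write.avro.compression-codec' = 'snappy'
);
```

After that, new Parquet and Avro data files are written with snappy 
compression regardless of which job produces them.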

> On Mar 5, 2021, at 10:47 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> 
> Russell is right. The property you're trying to set is a table property and 
> needs to be set on the table.
> 
> We don't currently support overriding arbitrary table properties in write 
> options, mainly because we want to encourage people to set their 
> configuration on the table instead of in jobs. That's a best practice that I 
> highly recommend so you don't need to configure every job that writes to the 
> table, and so you can make changes and have them automatically take effect 
> without recompiling your write job.
> 
> On Fri, Mar 5, 2021 at 8:44 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
> I believe those are currently only respected as table properties and not as 
> "spark write" properties, although there is a case to be made that we should 
> accept them there as well. You can alter your table so that it contains those 
> properties, and new files will be created with the compression you would like.
> 
>> On Mar 5, 2021, at 7:15 AM, Javier Sanchez Beltran 
>> <jabelt...@expediagroup.com.INVALID> wrote:
>> 
>> Hello Iceberg team!
>>  
>> I have been researching Apache Iceberg to see how it would work in our 
>> environment. We are still trying things out. We would like to use Parquet 
>> format with SNAPPY compression.
>>  
>> I already tried changing these two properties to SNAPPY, but it didn't work 
>> (https://iceberg.apache.org/configuration/):
>> 
>> 
>> write.avro.compression-codec: gzip -> SNAPPY
>> write.parquet.compression-codec: gzip -> SNAPPY
>> 
>> In this way:
>>  
>> dataset
>>     .writeStream()
>>     .format("iceberg")
>>     .outputMode("append")
>>     .option("write.parquet.compression-codec", "SNAPPY")
>>     .option("write.avro.compression-codec", "SNAPPY")
>>     …
>>     .start()
>>  
>>  
>> Did I do something wrong? Or do we need to handle the SNAPPY compression 
>> implementation ourselves?
>>  
>> Thank you in advance,
>> Javier.
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix