I think if we are going to keep our write behavior working like that, we should probably switch to a whitelist of valid properties for Spark writes, so we can warn folks that some options won't actually do anything. I think the current behavior is a bit of a surprise, and I also don't like silent options :)
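
To make the idea concrete, here is a rough sketch of what such a check could look like. Everything in it is hypothetical: `WriteOptionCheck` and `unrecognized` are made-up names, and the allow-list holds only a few illustrative write options, not the authoritative set Iceberg supports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Iceberg: finds write options that would be silently ignored.
class WriteOptionCheck {
    // Illustrative allow-list only (assumption); the real list would be maintained by Iceberg.
    static final Set<String> KNOWN_WRITE_OPTIONS = new HashSet<>(
        Arrays.asList("write-format", "target-file-size-bytes", "check-nullability"));

    // Returns the option keys that are not on the allow-list, sorted for stable output.
    static List<String> unrecognized(Map<String, String> options) {
        List<String> unknown = new ArrayList<>();
        for (String key : new TreeSet<>(options.keySet())) {
            if (!KNOWN_WRITE_OPTIONS.contains(key)) {
                unknown.add(key);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("write-format", "parquet");
        options.put("write.parquet.compression-codec", "SNAPPY");

        // Warn instead of failing, so existing jobs keep running.
        for (String key : unrecognized(options)) {
            System.err.println("Warning: write option '" + key + "' is not recognized and has no effect");
        }
    }
}
```

With the options from the original question, this would warn about `write.parquet.compression-codec` rather than dropping it silently.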

> On Mar 5, 2021, at 10:47 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>
> Russell is right. The property you're trying to set is a table property and
> needs to be set on the table.
>
> We don't currently support overriding arbitrary table properties in write
> options, mainly because we want to encourage people to set their
> configuration on the table instead of in jobs. That's a best practice that I
> highly recommend so you don't need to configure every job that writes to the
> table, and so you can make changes and have them automatically take effect
> without recompiling your write job.
>
> On Fri, Mar 5, 2021 at 8:44 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
> I believe those are currently only respected as table properties and not as
> "spark write" properties although there is a case to be made that we should
> accept them there as well. You can alter your table so that it contains those
> properties and new files will be created with the compression you would like.
>
>> On Mar 5, 2021, at 7:15 AM, Javier Sanchez Beltran
>> <jabelt...@expediagroup.com.INVALID> wrote:
>>
>> Hello Iceberg team!
>>
>> I have been researching Apache Iceberg to see how would work in our
>> environment. We are still trying out things. We would like to have Parquet
>> format with SNAPPY compression type.
>>
>> I already try changing these two properties to SNAPPY, but it didn’t work
>> (https://iceberg.apache.org/configuration/):
>>
>> write.avro.compression-codec
>> Gzip -> SNAPPY
>>
>> write.parquet.compression-codec
>> Gzip -> SNAPPY
>>
>> In this way:
>>
>> dataset
>>     .writeStream()
>>     .format("iceberg")
>>     .outputMode("append")
>>     .option("write.parquet.compression-codec", "SNAPPY")
>>     .option("write.avro.compression-codec", "SNAPPY")
>>     …start()
>>
>> Did I do something in a bad way? Or maybe we need to take care of the
>> implementation of this SNAPPY compression?
>>
>> Thank you in advance,
>> Javier.
>
> --
> Ryan Blue
> Software Engineer
> Netflix
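
For anyone finding this thread later, the table-property approach suggested in the quoted replies could look roughly like the sketch below. It assumes Spark 3 with an Iceberg catalog configured, so that `ALTER TABLE ... SET TBLPROPERTIES` is available; the table name `db.events` and the helper class `TablePropertySql` are hypothetical, and the helper does no escaping of its inputs.

```java
// Hypothetical helper: builds the Spark SQL statement that sets a single table property.
class TablePropertySql {
    // Inputs are inserted as-is (no escaping), so pass only trusted names and values.
    static String setProperty(String table, String key, String value) {
        return "ALTER TABLE " + table + " SET TBLPROPERTIES ('" + key + "'='" + value + "')";
    }

    public static void main(String[] args) {
        // db.events is a placeholder table name (assumption).
        String sql = setProperty("db.events", "write.parquet.compression-codec", "snappy");
        System.out.println(sql);
        // With an existing SparkSession named spark (assumed), this could be run as:
        //   spark.sql(sql);
    }
}
```

Once the property is set on the table, the streaming write from the original question should no longer need the two `.option(...)` calls, and new data files would pick up the codec from the table.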