Hello. I am currently reading this: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/committers.html and learning about the s3a committers.
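For context, this is roughly how I understand a Spark job has to be set up to use, say, the directory staging committer. This is just a sketch I pieced together from that page; the property names come from the hadoop-aws docs, but I'm not sure I have them (or the values) right, which is part of my confusion:

    from pyspark.sql import SparkSession

    # Sketch of what I think the S3A directory (staging) committer setup looks like;
    # property names are from the hadoop-aws committer docs, values are my guesses.
    spark = (
        SparkSession.builder
        .appName("s3a-committer-test")
        # delegate Spark's SQL/Parquet commit machinery to the Hadoop committer factory
        .config("spark.sql.sources.commitProtocolClass",
                "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
        .config("spark.sql.parquet.output.committer.class",
                "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
        # choose which S3A committer to use: "directory", "partitioned" or "magic"
        .config("spark.hadoop.fs.s3a.committer.name", "directory")
        # for the staging committers, how to resolve conflicts with existing data
        .config("spark.hadoop.fs.s3a.committer.staging.conflict-mode", "append")
        .getOrCreate()
    )

    df = spark.range(1000)
    df.write.mode("append").parquet("s3a://my-bucket/some/path/")  # placeholder bucket/path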
It's a bit confusing, and it seems like you need to be an expert to use these committers properly: you don't just write to an s3a path with the standard Spark configs, you also have to provide configs for the s3a committers, as in the sketch above.

I also saw this: https://github.com/rdblue/s3committer and it says that people should just use Iceberg. Does that mean that with Iceberg you just write to an s3a path, without specifying which committer (partitioned, directory, magic) to use, and everything works optimally? Does Iceberg have its own committers or something? I know that s3a's staging committers, for example, require big enough local storage plus a cluster filesystem such as HDFS, while s3a's magic committer doesn't, which makes me wonder whether Iceberg has requirements of its own.

Spark also has this guide: https://spark.apache.org/docs/latest/cloud-integration.html which recommends these settings for Parquet:

    spark.hadoop.parquet.enable.summary-metadata false
    spark.sql.parquet.mergeSchema false
    spark.sql.parquet.filterPushdown true
    spark.sql.hive.metastorePartitionPruning true

and these settings for ORC:

    spark.sql.orc.filterPushdown true
    spark.sql.orc.splits.include.file.footer true
    spark.sql.orc.cache.stripe.details.size 10000
    spark.sql.hive.metastorePartitionPruning true

Should I still specify these settings when using Parquet or ORC with Iceberg in Spark? Thank you.
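P.S. In case it helps to be concrete about the Iceberg part, this is the kind of write I have in mind. It's a minimal sketch; the "demo" catalog, warehouse path, and table name are placeholders I made up, not anything from the docs above:

    from pyspark.sql import SparkSession

    # Minimal sketch of the Iceberg write I mean; catalog name, warehouse path
    # and table name are placeholders.
    spark = (
        SparkSession.builder
        .appName("iceberg-write-test")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.demo.type", "hadoop")
        .config("spark.sql.catalog.demo.warehouse", "s3a://my-bucket/warehouse")
        .getOrCreate()
    )

    df = spark.range(1000)
    # Note there are no fs.s3a.committer.* settings here -- is that enough on its own,
    # or do the committer configs (and the Parquet/ORC settings above) still matter?
    df.writeTo("demo.db.events").using("iceberg").createOrReplace()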