Hi, please refer to this doc: https://iceberg.apache.org/docs/nightly/spark-writes/#overwriting-data
We do have some test cases for the same:
https://github.com/apache/iceberg/blob/91fbcaa62c25308aa815557dd2c0041f75530705/spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/PartitionedWritesTestBase.java#L153

- Ajantha

On Fri, Jun 28, 2024 at 1:00 AM Ha Cao <ha....@twosigma.com> wrote:
> Hello,
>
> I am experimenting with PySpark's DataFrameWriterV2 overwrite()
> <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameWriterV2.overwrite.html>
> on an Iceberg table with existing data in a target partition. My goal is
> that instead of overwriting the entire partition, it will overwrite only
> the specific rows that match the condition. However, I can't get it to
> work with any syntax; I keep getting "Column is not iterable". I have
> tried:
>
> df.writeTo(spark_table_path).using("iceberg").overwrite(df.tid)
> df.writeTo(spark_table_path).using("iceberg").overwrite(df.tid.isin(1))
> df.writeTo(spark_table_path).using("iceberg").overwrite(df.tid >= 1)
>
> All of these fail with "Column is not iterable".
>
> What is the correct syntax for this? It is also possible that the
> Iceberg-PySpark integration doesn't support overwrite, but I don't know
> how to confirm this.
>
> Thank you so much!
>
> Best,
> Ha
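For reference, here is a minimal PySpark sketch of the overwrite-by-filter pattern that the linked Iceberg docs and test exercise. It is an untested sketch, not a confirmed fix: the table name `demo.db.sample` and the `(tid, data)` schema are hypothetical, it assumes a SparkSession already configured with an Iceberg catalog and the Iceberg runtime jars, and the condition is built with `col()` from `pyspark.sql.functions` rather than a `df.tid` attribute reference, mirroring how the linked Java test constructs its filter.

```python
# Sketch only: assumes `spark` is a SparkSession with an Iceberg catalog
# configured, and "demo.db.sample" is a hypothetical existing Iceberg table.
from pyspark.sql.functions import col, lit

# Illustrative data to write; the schema (tid, data) is an assumption.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["tid", "data"])

# Overwrite only the rows matching the condition. The condition is a
# Column expression built with col()/lit(), per the Iceberg docs' pattern.
df.writeTo("demo.db.sample").overwrite(col("tid") >= lit(1))
```

Note also that in the Iceberg docs `.using("iceberg")` appears with `create()`/`createOrReplace()` rather than with `overwrite()` on an existing table; whether dropping it or switching to `col()` resolves the "Column is not iterable" error may depend on the Spark version in use.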