Hello,

I am experimenting with PySpark's DataFrameWriterV2 
overwrite()<https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameWriterV2.overwrite.html>
method, writing to an Iceberg table that already has data in the target 
partition. My goal is to overwrite only the rows that match a condition, 
rather than the entire partition. However, I cannot find a syntax that 
works. I have tried:

df.writeTo(spark_table_path).using("iceberg").overwrite(df.tid)
df.writeTo(spark_table_path).using("iceberg").overwrite(df.tid.isin(1))
df.writeTo(spark_table_path).using("iceberg").overwrite(df.tid >= 1)

and every one of these fails with "Column is not iterable".

What is the correct syntax for this? I also suspect that the 
PySpark-Iceberg integration may not support conditional overwrite at all, 
but I don't know how to confirm that.

Thank you so much!
Best,
Ha
