Hi,

Please refer to this doc:
https://iceberg.apache.org/docs/nightly/spark-writes/#overwriting-data

We also have test cases covering this behavior:
https://github.com/apache/iceberg/blob/91fbcaa62c25308aa815557dd2c0041f75530705/spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/PartitionedWritesTestBase.java#L153

- Ajantha
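
For reference, the condition passed to DataFrameWriterV2.overwrite() must be a Column *expression* (a boolean predicate), not a bare column reference like df.tid. A minimal sketch, assuming a Spark session with the Iceberg runtime and a catalog already configured, and using the table path and "tid" column from the thread (exact behavior depends on your Spark and Iceberg versions):

```python
# Sketch: condition-based overwrite with DataFrameWriterV2.
# Assumes an existing SparkSession configured with the Iceberg runtime jar
# and an Iceberg catalog; "catalog.db.tbl" and the "tid" column are
# placeholders following the thread.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["tid", "data"])

# Pass a boolean Column expression, e.g. col("tid") == 1 or col("tid") >= 1,
# rather than a bare Column such as df.tid. Only rows matching the condition
# are replaced; the rest of the partition is left alone.
df.writeTo("catalog.db.tbl").using("iceberg").overwrite(col("tid") == 1)
```

If a well-formed boolean expression still raises "Column is not iterable", it is worth checking the PySpark version in use, since the error comes from the PySpark side rather than from Iceberg.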

On Fri, Jun 28, 2024 at 1:00 AM Ha Cao <ha....@twosigma.com> wrote:

> Hello,
>
>
>
> I am experimenting with PySpark’s DataFrameWriterV2 overwrite()
> <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameWriterV2.overwrite.html>
> to an Iceberg table with existing data in a target partition. My goal is
> that instead of overwriting the entire partition, it will only overwrite
> specific rows that match the condition. However, I can’t get it to work
> with any syntax and I keep getting “Column is not iterable”. I have tried:
>
>
>
> df.writeTo(spark_table_path).using("iceberg").overwrite(df.tid)
>
> df.writeTo(spark_table_path).using("iceberg").overwrite(df.tid.isin(1))
>
> df.writeTo(spark_table_path).using("iceberg").overwrite(df.tid >= 1)
>
>
>
> and all of these syntaxes fail with “Column is not iterable”.
>
>
>
> What is the correct syntax for this? I also think that there is a
> possibility that Iceberg-PySpark integration doesn’t support overwrite, but
> I don’t know how to confirm this.
>
>
>
> Thank you so much!
>
> Best,
> Ha
>