Re: Iceberg - PySpark overwrite with a condition

2024-06-30 Thread Fokko Driesprong
temp_args.append(temp_arg) 1266 new_args.append(temp_arg) ...py4j/java_collections.py in convert(self, object, gateway_client) 508 ArrayList = JavaClass("java.util.ArrayList", gateway_client)

RE: Iceberg - PySpark overwrite with a condition

2024-06-28 Thread Ha Cao
62 def __iter__(self): --> 463 raise TypeError("Column is not iterable") 464 465 # string methods TypeError: Column is not iterable Thanks! Best, Ha From: Fokko Driesprong Sent: Friday, June 28, 2024 3:00 PM To: dev@iceberg.apache.org Subject: Re: Iceberg - PySpar

Re: Iceberg - PySpark overwrite with a condition

2024-06-28 Thread Fokko Driesprong
h).using("iceberg").overwrite(col("time").less(target_timestamp)) The only example I can find in the PySpark codebase is https://github.com/apache/spark/blob/master/python/pyspark/sql/tests/test_readwriter.py#L251 but even with this, it throws `Column is no

RE: Iceberg - PySpark overwrite with a condition

2024-06-28 Thread Ha Cao
n/pyspark/sql/tests/test_readwriter.py#L251 but even with this, it throws `Column is not iterable`. I cannot find any other test case that tests `overwrite()` as a method. Thank you! Best, Ha From: Ajantha Bhat Sent: Friday, June 28, 2024 3:52 AM To: dev@iceberg.apache.org Subject: Re: Icebe

Re: Iceberg - PySpark overwrite with a condition

2024-06-28 Thread Ajantha Bhat
Hi, Please refer to this doc: https://iceberg.apache.org/docs/nightly/spark-writes/#overwriting-data We do have some test cases for the same: https://github.com/apache/iceberg/blob/91fbcaa62c25308aa815557dd2c0041f75530705/spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/PartitionedWritesT

Iceberg - PySpark overwrite with a condition

2024-06-27 Thread Ha Cao
Hello, I am experimenting with PySpark's DataFrameWriterV2 overwrite() to an Iceberg table with existing data in a target partition. My goal is that instead of overwriting the