Hi Team, I'm seeing some weird behavior with a PySpark DataFrame (Databricks Delta, Spark 3.0.0).
I have tried the two options below to write the processed DataFrame into a Delta table, using the table's partition columns. With option 1, overwrite mode completely overwrites the whole table, and I couldn't figure out why the DataFrame fully overwrote it here. Also, I'm getting the following error while testing option 2:

    Predicate references non-partition column 'json_feeds_flatten_data'.
    Only the partition columns may be referenced: [table_name, y, m, d, h]

Could you please tell me why PySpark behaves like this? It would be very helpful to know the mistake here.

Sample partition column values:
-------------------------------
table_name='json_feeds_flatten_data'
y=2020
m=7
d=19
h=0

Option 1:

from pyspark.sql.functions import lit

partition_keys = ['table_name', 'y', 'm', 'd', 'h']

(final_df
 .withColumn('y', lit(y).cast('int'))
 .withColumn('m', lit(m).cast('int'))
 .withColumn('d', lit(d).cast('int'))
 .withColumn('h', lit(h).cast('int'))
 .write
 .partitionBy(partition_keys)
 .format("delta")
 .mode('overwrite')
 .saveAsTable(target_table)
)

Option 2:

rep_wh = 'table_name={} AND y={} AND m={} AND d={} AND h={}'.format(table_name, y, m, d, h)

(final_df
 .withColumn('y', lit(y).cast('int'))
 .withColumn('m', lit(m).cast('int'))
 .withColumn('d', lit(d).cast('int'))
 .withColumn('h', lit(h).cast('int'))
 .write
 .format("delta")
 .mode('overwrite')
 .option('replaceWhere', rep_wh)
 .saveAsTable(target_table)
)

Thanks
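P.S. In case it helps with diagnosing option 2: table_name holds a string value, so I suspect the replaceWhere predicate needs that value quoted. Without quotes, the predicate becomes table_name=json_feeds_flatten_data and json_feeds_flatten_data gets parsed as a column reference, which would match the error above. Below is a minimal sketch of the quoted variant I have in mind (same DataFrame and variables as above; not yet verified on my side):

from pyspark.sql.functions import lit

# Quote the string value so it is parsed as a SQL string literal,
# not as a column reference.
rep_wh = "table_name='{}' AND y={} AND m={} AND d={} AND h={}".format(
    table_name, y, m, d, h)

(final_df
 .withColumn('y', lit(y).cast('int'))
 .withColumn('m', lit(m).cast('int'))
 .withColumn('d', lit(d).cast('int'))
 .withColumn('h', lit(h).cast('int'))
 .write
 .format("delta")
 .mode('overwrite')
 # replaceWhere should limit the overwrite to rows matching the
 # predicate, leaving the rest of the table untouched.
 .option('replaceWhere', rep_wh)
 .saveAsTable(target_table)
)

And regarding option 1, my understanding is that mode('overwrite') without replaceWhere replaces the entire table by design; partitionBy only controls the on-disk layout, not which rows get replaced. Happy to be corrected on either point.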