Hi Team, I'm seeing some weird behavior with a PySpark DataFrame (Databricks Delta, Spark 3.0.0).
I have tried the two options below to write the processed DataFrame into a Delta table, using the table's partition columns. With option 1, overwrite mode completely overwrites the whole table, and I couldn't figure out why the DataFrame fully overwrote it here. Also, I'm getting the following error while testing option 2:

    Predicate references non-partition column 'json_feeds_flatten_data'.
    Only the partition columns may be referenced: [table_name, y, m, d, h]

Could you please tell me why PySpark behaves like this? It would be very helpful to know the mistake here.

Sample partition column values:
-------------------------------
table_name='json_feeds_flatten_data'
y=2020
m=7
d=19
h=0

Option 1:

from pyspark.sql.functions import lit

partition_keys = ['table_name', 'y', 'm', 'd', 'h']

(final_df
 .withColumn('y', lit(y).cast('int'))
 .withColumn('m', lit(m).cast('int'))
 .withColumn('d', lit(d).cast('int'))
 .withColumn('h', lit(h).cast('int'))
 .write
 .partitionBy(partition_keys)
 .format("delta")
 .mode('overwrite')
 .saveAsTable(target_table)
)

Option 2:

rep_wh = 'table_name={} AND y={} AND m={} AND d={} AND h={}'.format(table_name, y, m, d, h)

(final_df
 .withColumn('y', lit(y).cast('int'))
 .withColumn('m', lit(m).cast('int'))
 .withColumn('d', lit(d).cast('int'))
 .withColumn('h', lit(h).cast('int'))
 .write
 .format("delta")
 .mode('overwrite')
 .option('replaceWhere', rep_wh)
 .saveAsTable(target_table)
)

Thanks
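P.S. In case it helps with diagnosing option 2: table_name holds a string value, so I suspect the replaceWhere predicate needs that value quoted. Without quotes, the predicate becomes table_name=json_feeds_flatten_data and json_feeds_flatten_data gets parsed as a column reference, which would match the error above. Below is a minimal sketch of the quoted variant I have in mind (same DataFrame and variables as above; not yet verified on my side):

from pyspark.sql.functions import lit

# Quote the string value so it is parsed as a SQL string literal,
# not as a column reference.
rep_wh = "table_name='{}' AND y={} AND m={} AND d={} AND h={}".format(
    table_name, y, m, d, h)

(final_df
 .withColumn('y', lit(y).cast('int'))
 .withColumn('m', lit(m).cast('int'))
 .withColumn('d', lit(d).cast('int'))
 .withColumn('h', lit(h).cast('int'))
 .write
 .format("delta")
 .mode('overwrite')
 # replaceWhere should limit the overwrite to rows matching the
 # predicate, leaving the rest of the table untouched.
 .option('replaceWhere', rep_wh)
 .saveAsTable(target_table)
)

And regarding option 1, my understanding is that mode('overwrite') without replaceWhere replaces the entire table by design; partitionBy only controls the on-disk layout, not which rows get replaced. Happy to be corrected on either point.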