I think it is not happening because transient_lastDdlTime is a DDL timestamp, and an insert overwrite into an existing partition does not recreate the partition. It is just a DML statement.
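If you need the per-partition statistics refreshed after such an overwrite, one workaround you could try (just a sketch, I have not verified it on Spark 3.5) is to analyze that partition explicitly:

spark.sql("ANALYZE TABLE db1.table1 PARTITION (event_partition='2024-01-05') COMPUTE STATISTICS")

That should recompute the size and row-count statistics for the partition, but I am not sure whether it also writes the Hive-style COLUMN_STATS_ACCURATE flag or bumps transient_lastDdlTime, since that property is still only maintained by DDL.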
On Friday, May 2, 2025, 7:53 AM, Pradeep <m15prad...@gmail.com> wrote:

I have a partitioned Hive external table as below:

scala> spark.sql("describe extended db1.table1").show(100,false)
+----------------------------+--------------------------------------------------------------+
|col_name                    |data_type                                                     |
+----------------------------+--------------------------------------------------------------+
|name                        |string                                                        |
|event_partition             |string                                                        |
|# Partition Information     |                                                              |
|# col_name                  |data_type                                                     |
|event_partition             |string                                                        |
|                            |                                                              |
|# Detailed Table Information|                                                              |
|Catalog                     |spark_catalog                                                 |
|Database                    |db1                                                           |
|Table                       |table1                                                        |
|Owner                       |root                                                          |
|Created Time                |Tue Apr 15 15:30:00 UTC 2025                                  |
|Last Access                 |UNKNOWN                                                       |
|Created By                  |Spark 3.5.3                                                   |
|Type                        |EXTERNAL                                                      |
|Provider                    |hive                                                          |
|Table Properties            |[transient_lastDdlTime=1746110529]                            |
|Location                    |gs://my-bucket/db1/table1                                     |
|Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |
|InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |
|OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|
|Storage Properties          |[serialization.format=1]                                      |
|Partition Provider          |Catalog                                                       |
+----------------------------+--------------------------------------------------------------+

Below is my existing partition, created via Spark SQL:

scala> val catalogPartitions = spark.sharedState.externalCatalog.listPartitions("db1", "table1")

scala> val partitionValues = catalogPartitions.foreach(cp => {
     |   val partitionSpec = cp.spec
     |   println(partitionSpec + " Parameters:" + cp.parameters + " lastAccessTime:" + cp.lastAccessTime + " createTime:" + cp.createTime)
     | })
Map(event_partition -> 2024-01-04) Parameters:Map(**transient_lastDdlTime -> 1744731019**, totalSize -> 475, numFiles -> 1) lastAccessTime:0 createTime:1744731019000
partitionValues: Unit = ()

When I insert a new partition, transient_lastDdlTime is updated in the Hive metastore:

scala> spark.sql("insert overwrite table db1.table1 partition(event_partition) select 'A','2024-01-05'").show()
++
||
++
++

scala> val catalogPartitions = spark.sharedState.externalCatalog.listPartitions("db1", "table1")

scala> val partitionValues = catalogPartitions.foreach(cp => {
     |   val partitionSpec = cp.spec
     |   println(partitionSpec + " Parameters:" + cp.parameters + " lastAccessTime:" + cp.lastAccessTime + " createTime:" + cp.createTime)
     | })
Map(event_partition -> 2024-01-04) Parameters:Map(transient_lastDdlTime -> 1744731019, totalSize -> 475, numFiles -> 1) lastAccessTime:0 createTime:1744731019000
Map(event_partition -> 2024-01-05) Parameters:Map(**transient_lastDdlTime -> 1746112922**, totalSize -> 455, numFiles -> 1) lastAccessTime:0 createTime:1746112922000
partitionValues: Unit = ()

But when I insert overwrite the same partition, transient_lastDdlTime and the column stats are not updated.
This used to work in Spark 2.4:

scala> spark.sql("insert overwrite table db1.table1 partition(event_partition) select 'B','2024-01-05'").show()
++
||
++
++

scala> val catalogPartitions = spark.sharedState.externalCatalog.listPartitions("db1", "table1")

scala> val partitionValues = catalogPartitions.foreach(cp => {
     |   val partitionSpec = cp.spec
     |   println(partitionSpec + " Parameters:" + cp.parameters + " lastAccessTime:" + cp.lastAccessTime + " createTime:" + cp.createTime)
     | })
Map(event_partition -> 2024-01-04) Parameters:Map(transient_lastDdlTime -> 1744731019, totalSize -> 475, numFiles -> 1) lastAccessTime:0 createTime:1744731019000
Map(event_partition -> 2024-01-05) Parameters:Map(**transient_lastDdlTime -> 1746112922**, totalSize -> 455, numFiles -> 1) lastAccessTime:0 createTime:1746112922000
partitionValues: Unit = ()

When I perform the same insert overwrite in Hive for the same partition, it does update transient_lastDdlTime and the column stats:

hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert overwrite table db1.table1 partition(event_partition) select 'B','2024-01-05' union all select 'C','2024-01-06';
Query ID = pradeep_20250501152449_33c17f33-8084-4c55-a49b-6d0fe99b17e5
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1746108119609_0009)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........  container     SUCCEEDED      1          1        0        0       0       0
Reducer 3 ......  container     SUCCEEDED      1          1        0        0       0       0
Map 4 ..........  container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 18.37 s
----------------------------------------------------------------------------------------------
Loading data to table db1.table1 partition (event_partition=null)
Loaded : 2/2 partitions.
Time taken to load dynamic partitions: 1.763 seconds
Time taken for adding to write entity : 0.002 seconds
OK
Time taken: 90.182 seconds

scala> val catalogPartitions = spark.sharedState.externalCatalog.listPartitions("db1", "table1")

scala> val partitionValues = catalogPartitions.foreach(cp => {
     |   val partitionSpec = cp.spec
     |   println(partitionSpec + " Parameters:" + cp.parameters + " lastAccessTime:" + cp.lastAccessTime + " createTime:" + cp.createTime)
     | })
Map(event_partition -> 2024-01-04) Parameters:Map(transient_lastDdlTime -> 1744731019, totalSize -> 475, numFiles -> 1) lastAccessTime:0 createTime:1744731019000
Map(event_partition -> 2024-01-05) Parameters:Map(rawDataSize -> 1, numFiles -> 1, transient_lastDdlTime -> 1746113178, totalSize -> 316, **COLUMN_STATS_ACCURATE -> {"BASIC_STATS":"true","COLUMN_STATS":{"name":"true"}}, numRows -> 1**) lastAccessTime:0 createTime:1746112922000
Map(event_partition -> 2024-01-06) Parameters:Map(rawDataSize -> 1, numFiles -> 1, transient_lastDdlTime -> 1746113178, totalSize -> 316, COLUMN_STATS_ACCURATE -> {"BASIC_STATS":"true","COLUMN_STATS":{"name":"true"}}, numRows -> 1) lastAccessTime:0 createTime:0
partitionValues: Unit = ()

Below is what I implemented in Spark 2.4, and it used to work; after upgrading to Spark 3.x this functionality is broken: get all the new partitions that are written to the Hive metastore by Spark.

Regards,
Pradeep
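On the "get all the new partitions" part, below is a rough sketch of one way to detect new or rewritten partitions without relying on transient_lastDdlTime, by comparing file modification times under each partition location. It is untested here; lastRunMillis is a hypothetical checkpoint you would persist between runs, and the sketch only inspects the top level of each partition directory:

import org.apache.hadoop.fs.Path

val lastRunMillis = 0L  // hypothetical checkpoint persisted between runs
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Keep partitions whose newest data file is more recent than the checkpoint,
// judged by file modification time rather than transient_lastDdlTime.
val changedPartitions =
  spark.sharedState.externalCatalog.listPartitions("db1", "table1").filter { cp =>
    cp.storage.locationUri.exists { uri =>
      val dir = new Path(uri)
      val fs = dir.getFileSystem(hadoopConf)
      val files = fs.listStatus(dir).filter(_.isFile)
      files.nonEmpty && files.map(_.getModificationTime).max > lastRunMillis
    }
  }

changedPartitions.foreach(cp => println(cp.spec))

Brand-new partitions can still be identified from createTime in listPartitions, as in the output above; the file modification times are mainly needed for the overwrite-existing-partition case, where the metastore parameters no longer change.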