I have a partitioned Hive external table, shown below:
scala> spark.sql("describe extended db1.table1").show(100,false)
+----------------------------+--------------------------------------------------------------+
|col_name                    |data_type                                                     |
+----------------------------+--------------------------------------------------------------+
|name                        |string                                                        |
|event_partition             |string                                                        |
|# Partition Information     |                                                              |
|# col_name                  |data_type                                                     |
|event_partition             |string                                                        |
|                            |                                                              |
|# Detailed Table Information|                                                              |
|Catalog                     |spark_catalog                                                 |
|Database                    |db1                                                           |
|Table                       |table1                                                        |
|Owner                       |root                                                          |
|Created Time                |Tue Apr 15 15:30:00 UTC 2025                                  |
|Last Access                 |UNKNOWN                                                       |
|Created By                  |Spark 3.5.3                                                   |
|Type                        |EXTERNAL                                                      |
|Provider                    |hive                                                          |
|Table Properties            |[transient_lastDdlTime=1746110529]                            |
|Location                    |gs://my-bucket/db1/table1                                     |
|Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |
|InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |
|OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|
|Storage Properties          |[serialization.format=1]                                      |
|Partition Provider          |Catalog                                                       |
+----------------------------+--------------------------------------------------------------+
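(For reference, the table-level transient_lastDdlTime above can also be read programmatically, via the same externalCatalog API I use below for partitions; a minimal sketch. Note this is the table property, not the per-partition one my logic depends on.)

// Sketch: read the table-level transient_lastDdlTime programmatically.
scala> val tableMeta = spark.sharedState.externalCatalog.getTable("db1", "table1")

scala> println(tableMeta.properties.get("transient_lastDdlTime"))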
Below is my existing partition, created via Spark SQL:
scala> val catalogPartitions = spark.sharedState.externalCatalog.listPartitions("db1", "table1")

scala> val partitionValues = catalogPartitions.foreach(cp => {
     |   val partitionSpec = cp.spec
     |   println(partitionSpec + " Parameters:" + cp.parameters + " lastAccessTime:" + cp.lastAccessTime + " createTime:" + cp.createTime)
     | })
Map(event_partition -> 2024-01-04) Parameters:Map(**transient_lastDdlTime -> 1744731019**, totalSize -> 475, numFiles -> 1) lastAccessTime:0 createTime:1744731019000
partitionValues: Unit = ()
When I insert a new partition, transient_lastDdlTime is updated in the Hive metastore:
scala> spark.sql("insert overwrite table db1.table1
partition(event_partition) select 'A','2024-01-05'").show()
++
||
++
++
scala> val catalogPartitions = spark.sharedState.externalCatalog.listPartitions("db1", "table1")

scala> val partitionValues = catalogPartitions.foreach(cp => {
     |   val partitionSpec = cp.spec
     |   println(partitionSpec + " Parameters:" + cp.parameters + " lastAccessTime:" + cp.lastAccessTime + " createTime:" + cp.createTime)
     | })
Map(event_partition -> 2024-01-04) Parameters:Map(transient_lastDdlTime -> 1744731019, totalSize -> 475, numFiles -> 1) lastAccessTime:0 createTime:1744731019000
Map(event_partition -> 2024-01-05) Parameters:Map(**transient_lastDdlTime -> 1746112922**, totalSize -> 455, numFiles -> 1) lastAccessTime:0 createTime:1746112922000
partitionValues: Unit = ()
When I insert overwrite the same partition, transient_lastDdlTime and the column stats are not updated. This used to work in Spark 2.4:
scala> spark.sql("insert overwrite table db1.table1
partition(event_partition) select 'B','2024-01-05'").show()
++
||
++
++
scala> val catalogPartitions = spark.sharedState.externalCatalog.listPartitions("db1", "table1")

scala> val partitionValues = catalogPartitions.foreach(cp => {
     |   val partitionSpec = cp.spec
     |   println(partitionSpec + " Parameters:" + cp.parameters + " lastAccessTime:" + cp.lastAccessTime + " createTime:" + cp.createTime)
     | })
Map(event_partition -> 2024-01-04) Parameters:Map(transient_lastDdlTime -> 1744731019, totalSize -> 475, numFiles -> 1) lastAccessTime:0 createTime:1744731019000
Map(event_partition -> 2024-01-05) Parameters:Map(**transient_lastDdlTime -> 1746112922**, totalSize -> 455, numFiles -> 1) lastAccessTime:0 createTime:1746112922000
partitionValues: Unit = ()
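One workaround I'm considering is bumping the timestamp myself after each overwrite. A rough, untested sketch; it assumes the metastore persists the altered parameter exactly as passed:

import org.apache.spark.sql.catalyst.catalog.CatalogTablePartition

// Untested workaround sketch: after the insert overwrite, rewrite the
// partition parameters so transient_lastDdlTime reflects the overwrite.
val catalog = spark.sharedState.externalCatalog
val nowSecs = (System.currentTimeMillis() / 1000).toString
val touched = catalog
  .listPartitions("db1", "table1", Some(Map("event_partition" -> "2024-01-05")))
  .map(cp => cp.copy(parameters = cp.parameters + ("transient_lastDdlTime" -> nowSecs)))
catalog.alterPartitions("db1", "table1", touched)

That obviously adds an extra metastore round trip per overwritten partition, so I'd much rather have Spark update the parameter the way 2.4 did.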
When I perform the same insert overwrite in Hive for the same partition, it does update transient_lastDdlTime and the column stats:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert overwrite table db1.table1 partition(event_partition)
      select 'B','2024-01-05' union all
      select 'C','2024-01-06';
Query ID = pradeep_20250501152449_33c17f33-8084-4c55-a49b-6d0fe99b17e5
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1746108119609_0009)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      container     SUCCEEDED      1          1        0        0       0       0
Reducer 3 ......      container     SUCCEEDED      1          1        0        0       0       0
Map 4 ..........      container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 18.37 s
----------------------------------------------------------------------------------------------
Loading data to table db1.table1 partition (event_partition=null)
Loaded : 2/2 partitions.
Time taken to load dynamic partitions: 1.763 seconds
Time taken for adding to write entity : 0.002 seconds
OK
Time taken: 90.182 seconds
scala> val catalogPartitions = spark.sharedState.externalCatalog.listPartitions("db1", "table1")

scala> val partitionValues = catalogPartitions.foreach(cp => {
     |   val partitionSpec = cp.spec
     |   println(partitionSpec + " Parameters:" + cp.parameters + " lastAccessTime:" + cp.lastAccessTime + " createTime:" + cp.createTime)
     | })
Map(event_partition -> 2024-01-04) Parameters:Map(transient_lastDdlTime -> 1744731019, totalSize -> 475, numFiles -> 1) lastAccessTime:0 createTime:1744731019000
Map(event_partition -> 2024-01-05) Parameters:Map(rawDataSize -> 1, numFiles -> 1, **transient_lastDdlTime -> 1746113178**, totalSize -> 316, **COLUMN_STATS_ACCURATE -> {"BASIC_STATS":"true","COLUMN_STATS":{"name":"true"}}, numRows -> 1**) lastAccessTime:0 createTime:1746112922000
Map(event_partition -> 2024-01-06) Parameters:Map(rawDataSize -> 1, numFiles -> 1, transient_lastDdlTime -> 1746113178, totalSize -> 316, COLUMN_STATS_ACCURATE -> {"BASIC_STATS":"true","COLUMN_STATS":{"name":"true"}}, numRows -> 1) lastAccessTime:0 createTime:0
partitionValues: Unit = ()
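The closest I've found to making Spark refresh the stats itself is to run an explicit ANALYZE per overwritten partition after the insert. A sketch; I haven't verified whether it restores the exact Hive-style COLUMN_STATS_ACCURATE entry shown above or writes Spark's own stats parameters instead:

// Sketch: recompute basic stats for the overwritten partition after the
// insert. Still to verify: whether this matches what Hive writes above.
spark.sql("ANALYZE TABLE db1.table1 PARTITION (event_partition = '2024-01-05') COMPUTE STATISTICS")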
Below is what I implemented in Spark 2.4, and it used to work; after upgrading to Spark 3.x, this functionality is broken (see the sketch after the link): "Get all the new partitions that are written to Hive metastore by Spark"
<https://stackoverflow.com/questions/57202917/get-all-the-new-partitions-that-are-written-to-hive-metastore-by-spark>
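For context, the detection logic is roughly the following (simplified; lastRunEpoch stands in for a checkpoint timestamp in seconds that I persist between runs):

// Simplified version of my Spark 2.4-era logic: pick up every partition
// whose transient_lastDdlTime moved past the saved checkpoint.
val lastRunEpoch = 1744731019L // placeholder for the persisted checkpoint
val changed = spark.sharedState.externalCatalog
  .listPartitions("db1", "table1")
  .filter(_.parameters.get("transient_lastDdlTime").exists(_.toLong > lastRunEpoch))
changed.foreach(cp => println(cp.spec))

Because Spark 3.x no longer bumps transient_lastDdlTime on an insert overwrite of an existing partition, this filter silently misses overwritten partitions.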
Regards,
Pradeep