Thanks Sungwoo,

Regarding performance testing, am I correct to assume that the "original" Hive 
table is an external one? 

Since Iceberg supports row-level deletes, it might be worth comparing it against
Hive ACID. We could update 10-20% of the rows in each copy and measure the
read-performance overhead.
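A rough HiveQL sketch of what I have in mind, assuming hypothetical table names
(`lineitem_ext`, `lineitem_iceberg`, `lineitem_acid`) and a modulo predicate to
touch roughly 10% of the rows; exact DDL will depend on the Hive version and
cluster settings:

```sql
-- Two copies of the same source data (hypothetical names).
-- Iceberg table:
CREATE TABLE lineitem_iceberg STORED BY ICEBERG
AS SELECT * FROM lineitem_ext;
-- Hive ACID table:
CREATE TABLE lineitem_acid STORED AS ORC
TBLPROPERTIES ('transactional'='true')
AS SELECT * FROM lineitem_ext;

-- Touch ~10% of the rows in each copy.
UPDATE lineitem_iceberg SET l_comment = 'updated' WHERE l_orderkey % 10 = 0;
UPDATE lineitem_acid    SET l_comment = 'updated' WHERE l_orderkey % 10 = 0;

-- Read overhead: a full scan now has to merge the delete/delta files.
SELECT count(*) FROM lineitem_iceberg;
SELECT count(*) FROM lineitem_acid;
```

Timing the same scan before and after the updates should isolate the
merge-on-read cost on each format.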

Additionally, there's the 1 Billion Row Challenge [1] and its 1 Trillion Row 
scale-up [2] that we could try, extending them with delete operations (see the 
Impala talk at Iceberg Summit 2025).

In any case, it would be helpful to create a roadmap or a Jira EPIC for the 
Default Table Format migration and populate it with the key tasks we think are 
essential before making the switch.

1. https://www.morling.dev/blog/one-billion-row-challenge/ 
2. https://medium.com/dbsql-sme-engineering/1-trillion-row-challenge-on-databricks-sql-41a82fac5bed

Regards,
Denys
