I ran the 10TB TPC-DS benchmark with Iceberg, and preliminary results show the execution time increasing by about 20% (total execution time from 4900s to 5800s, geo-mean from 19s to 21s). However, please note that the result is not conclusive because 1) I used the build from last November instead of the latest build, and 2) I had some problems running Hive on Tez with Iceberg, so I used MR3 in the experiment.
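
For reference, the geo-mean above is the geometric mean of the per-query execution times over the 99 TPC-DS queries, which is less sensitive to a few long-running queries than the total. A minimal sketch of the computation in Python, using hypothetical query times:

    import math

    def summarize(query_times_sec):
        """Return (total, geometric mean) of per-query execution times in seconds."""
        total = sum(query_times_sec)
        geo_mean = math.exp(sum(math.log(t) for t in query_times_sec) / len(query_times_sec))
        return total, geo_mean

    # Hypothetical times for three queries; the real run covers all 99 TPC-DS queries.
    total, geo_mean = summarize([12.0, 19.5, 31.0])
    print(f"total = {total:.0f}s, geo-mean = {geo_mean:.1f}s")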
When we discuss the release of the next version of Hive, let me repeat the experiment by loading a fresh 10TB dataset.

---
Sungwoo

On Sat, Apr 12, 2025 at 6:28 PM Denys Kuzmenko <dkuzme...@apache.org> wrote:

> Thanks Sungwoo,
>
> Regarding performance testing, am I correct to assume that the "original"
> Hive table is an external one?
>
> Since Iceberg supports deletes, it might be worth comparing it against
> Hive ACID. We could generate updates for 10-20% of the data and measure
> the read performance overhead.
>
> Additionally, there's a 1 Trillion Row Challenge [1], [2] that we could
> try, extending it with delete operations (see the Impala talk at Iceberg
> Summit 2025).
>
> In any case, it would be helpful to create a roadmap or a Jira EPIC for
> the Default Table Format migration and populate it with the key tasks we
> think are essential before making the switch.
>
> 1. https://www.morling.dev/blog/one-billion-row-challenge/
> 2. https://medium.com/dbsql-sme-engineering/1-trillion-row-challenge-on-databricks-sql-41a82fac5bed
>
> Regards,
> Denys