Re: [DISCUSS] Changing Default Table Format to Iceberg in Upcoming Releases

2025-04-24 Thread Sungwoo Park
I ran the 10TB TPC-DS benchmark with Iceberg, and preliminary results show the execution time increasing by about 20% (total execution time from 4900s to 5800s, geo-mean from 19s to 21s). However, please note that the result is not conclusive because 1) I used the build from last November, instead of th
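
For readers who want to check the quoted figures, here is a minimal sketch that computes the relative slowdown from the four numbers given in the message above. The helper name `pct_increase` is made up for illustration; nothing here comes from the benchmark harness itself.

```python
def pct_increase(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures quoted in the message: total execution time and geo-mean, in seconds.
total = pct_increase(4900, 5800)
geo_mean = pct_increase(19, 21)

print(f"total execution time: +{total:.1f}%")  # prints "total execution time: +18.4%"
print(f"geo-mean: +{geo_mean:.1f}%")           # prints "geo-mean: +10.5%"
```

On these numbers the "about 20%" figure matches the total execution time, while the geo-mean grows by roughly half as much, which would suggest the slowdown is concentrated in the longer-running queries.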

Re: [DISCUSS] Changing Default Table Format to Iceberg in Upcoming Releases

2025-04-14 Thread Shohei Okumiya
Hi, I'm thrilled to see various opinions in this thread! I respect Ayush for initiating the discussion with the brave proposal and am proud of all the community members here. I am also aware of one interesting point of this thread: we believe in the potential coverage of Apache Hive. Although the

Re: [DISCUSS] Changing Default Table Format to Iceberg in Upcoming Releases

2025-04-12 Thread Stamatis Zampetakis
Iceberg is gaining a lot of traction and its integration with Hive is becoming more and more mature, so it makes sense to start the discussion about making it the default choice. However, I feel that it may be a bit too soon to make the switch right now. Apart from performance numbers our Iceberg test cover

Re: [DISCUSS] Changing Default Table Format to Iceberg in Upcoming Releases

2025-04-12 Thread Denys Kuzmenko
Thanks Sungwoo, Regarding performance testing, am I correct to assume that the "original" Hive table is an external one? Since Iceberg supports deletes, it might be worth comparing it against Hive ACID. We could generate 10-20% of the updates and measure the read performance overhead. Additi

Re: [DISCUSS] Changing Default Table Format to Iceberg in Upcoming Releases

2025-04-12 Thread Sungwoo Park
Hi, In my opinion, another major issue to address before switching to Iceberg as the default is Iceberg catalog support, e.g., HIVE-28658 (Iceberg REST Catalog Support) and HIVE-28879 (Federated Catalog support). My guess is that potential new users would be quite surprised to find no support for the I

Re: [DISCUSS] Changing Default Table Format to Iceberg in Upcoming Releases

2025-04-09 Thread Denys Kuzmenko
Hi, I'm a bit hesitant about switching to Iceberg as the default at the moment. I lean more toward setting the default table format at the database level instead. Hive Iceberg currently lacks automatic table maintenance, comprehensive support for partition-level statistics, and various partition-aware optimiza

Re: [DISCUSS] Changing Default Table Format to Iceberg in Upcoming Releases

2025-04-08 Thread Butao Zhang
I'm definitely +1 on this proposal. Apache Hive has deeply integrated the Iceberg table format for several years now. Switching the default table format to Iceberg would send a strong and positive signal to the community: that Iceberg has become a first-class citizen in the Hive engine, and it'

Re: [DISCUSS] Changing Default Table Format to Iceberg in Upcoming Releases

2025-04-07 Thread Attila Turoczy
Hi, I strongly support this proposal. Hive would be one of the first engines globally to make a clear and public commitment to Apache Iceberg, which is a significant and forward-looking step. From my perspective, the majority of recent development efforts have been focused on Iceberg, and it makes

Re: [DISCUSS] Changing Default Table Format to Iceberg in Upcoming Releases

2025-04-07 Thread Shohei Okumiya
Hi Ayush, Thanks for initiating this interesting discussion. In my personal opinion, it is likely a good idea. Apache Iceberg is competitive and open. I can't immediately think of significant drawbacks for users who use Iceberg tables instead. As a community member, I'm interested in data and facts t