I ran the 10TB TPC-DS benchmark with Iceberg, and in preliminary results
the execution time increased by about 20% (total execution time from 4900s to
5800s, geo-mean from 19s to 21s). However, please note that the result is
not conclusive because 1) I used the build from last November, instead of
th
Hi,
I'm thrilled to see various opinions in this thread! I respect Ayush
for initiating the discussion with the brave proposal and am proud of
all the community members here.
I have also noticed one interesting point in this thread: we believe in
the potential coverage of Apache Hive. Although the
Iceberg is gaining a lot of traction, and its integration with Hive is becoming
more and more mature, so it makes sense to start the discussion about making it
the default choice.
However, I feel that it may be a bit too soon to make the switch right now.
Apart from performance numbers, our Iceberg test cover
Thanks Sungwoo,
Regarding performance testing, am I correct to assume that the "original" Hive
table is an external one?
Since Iceberg supports deletes, it might be worth comparing it against Hive
ACID. We could generate 10-20% of the updates and measure the read performance
overhead.
Additi
Hi,
In my opinion, another major issue to address before switching to Iceberg
as the default is Iceberg catalog support, e.g.:
HIVE-28658: Iceberg REST Catalog Support
HIVE-28879: Federated Catalog support
My guess is that potential new users would be quite surprised to find no
support for the I
Hi,
I'm a bit hesitant about switching to Iceberg as the default atm. I lean more toward
setting the default table format at the database level instead.
Hive Iceberg currently lacks automatic table maintenance, comprehensive support
for partition-level statistics, and various partition-aware optimiza
I'm definitely +1 on this proposal.
Apache Hive has deeply integrated the Iceberg table format for several years
now. Switching the default table format to Iceberg would send a strong and
positive signal to the community: that Iceberg has become a first-class citizen
in the Hive engine, and it'
Hi,
I strongly support this proposal. Hive would be one of the first engines
globally to make a clear and public commitment to Apache Iceberg, which is
a significant and forward-looking step. From my perspective, the majority
of recent development efforts have been focused on Iceberg, and it makes
Hi Ayush,
Thanks for initiating the interesting discussion.
In my personal opinion, it is likely a good idea. Apache Iceberg is
competitive and open. I can't immediately think of any significant
drawbacks for users who use Iceberg tables instead.
As a community member, I'm interested in data and facts t