Hi,

In my opinion, another major issue to address before switching to Iceberg
as the default is Iceberg catalog support, e.g.:

HIVE-28658: Iceberg REST Catalog Support
HIVE-28879: Federated Catalog support

My guess is that potential new users would be quite surprised to find no
support for the Iceberg REST catalog, especially if Hive switches to
Iceberg as the default table format.

The work in HIVE-27473 is part of this effort, as it is a preliminary step
to implementing either HIVE-28658 or HIVE-28879.

HIVE-27473: Make SessionHiveMetaStoreClient and
HiveMetaStoreClientWithLocalCache composable

HIVE-27473 is optional, as we can implement HIVE-28658/HIVE-28879 without
it. Therefore, a decision should be made on whether or not to adopt the
proposed plan in HIVE-27473. (While there are pros and cons, we believe
the benefits outweigh the drawbacks.)

As for the performance of Hive with Iceberg, we ran a small experiment last
August using 1TB TPC-DS. TPC-DS may not be the best benchmark for
showcasing the strengths of Iceberg, and the dataset is small. Even so,
Hive with Iceberg performed better than Hive with its default table format.

Hive original: 2119 seconds, geo-mean: 12.88 seconds
Hive + Iceberg: 1971 seconds, geo-mean: 11.15 seconds
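
To put these numbers in relative terms, here is a small Python sketch that
computes the reduction in running time and the speedup from the two
reported figures only (no per-query data). The helper name
relative_change is just for illustration. By this calculation, Hive +
Iceberg took roughly 7% less total time and had about a 13% lower geo-mean.

```python
# Reported 1TB TPC-DS results from the experiment above, in seconds.
results = {
    "Hive original": {"total": 2119.0, "geo_mean": 12.88},
    "Hive + Iceberg": {"total": 1971.0, "geo_mean": 11.15},
}


def relative_change(baseline: float, candidate: float) -> tuple[float, float]:
    """Illustrative helper: return (percent reduction in time, speedup factor)
    of the candidate run relative to the baseline run."""
    reduction = (baseline - candidate) / baseline * 100.0
    speedup = baseline / candidate
    return reduction, speedup


base = results["Hive original"]
ice = results["Hive + Iceberg"]
for metric in ("total", "geo_mean"):
    reduction, speedup = relative_change(base[metric], ice[metric])
    print(f"{metric}: {reduction:.1f}% less time, {speedup:.3f}x speedup")
```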

I'll run a 10TB TPC-DS experiment using Hive-Tez + Iceberg sometime and
report the results here.

Regards,

--- Sungwoo


On Wed, Apr 9, 2025 at 11:08 PM Denys Kuzmenko <dkuzme...@apache.org> wrote:

> Hi,
>
> I'm a bit hesitant about switching to Iceberg as the default at the moment.
> I lean more toward setting the default table format at the database level instead.
>
> Hive Iceberg currently lacks automatic table maintenance, comprehensive
> support for partition-level statistics, and various partition-aware
> optimizations (see HIVE-28410).
>
> Moreover, we haven't conducted any performance testing so far. It would be
> helpful to first assess where we currently stand before making a final
> decision.
>
> Regards,
> Denys
>
>