Iceberg is gaining a lot of traction and its integration with Hive is
becoming more and more mature, so it makes sense to start the discussion
about making it the default choice.

However, I feel that it may be a bit too soon to make the switch right now.
Apart from the lack of performance numbers, our Iceberg test coverage in
Hive is currently rather limited. The vast majority of tests run against
other formats, so before making Iceberg the default maybe we should first
try to migrate the tests to use it.

Moreover, the choice of a default format is tricky and varies from one use
case to another, so I am not sure there is one that beats the rest in
every aspect. For instance, many people believed that adopting ACID
tables for everything was a good idea, but soon after users started
migrating their workloads to ACID we started hitting many performance and
scalability challenges. The same reasoning applies to the choices/debates
between ORC, Parquet, etc.

All in all, I wouldn't push very hard for one particular format and I would
prefer to leave the choice to the end user, who knows their use case best.
Having said that, I am willing to follow and support the decision of the
community, especially of those who contributed significantly to the
Iceberg integration.

Best,
Stamatis

On Wed, Apr 9, 2025, 4:08 PM Denys Kuzmenko <dkuzme...@apache.org> wrote:

> Hi,
>
> I'm a bit hesitant switching to Iceberg as the default atm. I lean more
> toward setting the default table format at the database level instead.
>
> Hive Iceberg currently lacks automatic table maintenance, comprehensive
> support for partition-level statistics, and various partition-aware
> optimizations (see HIVE-28410)
>
> Moreover, we haven't conducted any performance testing so far. It would be
> helpful to first assess where we currently stand before making a final
> decision.
>
> Regards,
> Denys
>
>
