Re: [DISCUSS] Changing Default Table Format to Iceberg in Upcoming Releases

Shohei Okumiya Mon, 14 Apr 2025 05:07:41 -0700

Hi,

I'm thrilled to see various opinions in this thread! I respect Ayush
for initiating the discussion with the brave proposal and am proud of
all the community members here.

I also noticed one interesting point in this thread: we all believe in
how much ground Apache Hive can cover. Although the original proposal
is very simple, some people mentioned the new features of Hive
Metastore, some were concerned about the lack of some maintenance
features, some wanted to know the performance of Iceberg tables, and
some pushed for integration with external catalogs. This range of
comments seems unique to Hive.

As the discussion inspired me, I also tried to sketch a vision. My
question: Can Hive be an Operating System or DBMS for the Open Table
Format or Data Lakehouse?
https://docs.google.com/document/d/1tKFmsjYeGlMQjvJ7QQDNcS5wvqrcHRvsDjbJlcHb7Gk/edit?usp=sharing

The above document also summarizes diverse topics in this thread.
Please read it if you're interested, and please feel free to comment
on it.

Lastly, I apologize for throwing in one more divergent reply.

Regards,
Okumin

On Sun, Apr 13, 2025 at 1:32 AM Stamatis Zampetakis <[email protected]> wrote:
>
> Iceberg is getting a lot of traction and its integration with Hive is
> becoming more and more mature, so it makes sense to start the discussion
> about making it the default choice.
>
> However, I feel that it may be a bit too soon to make the switch right now.
> Apart from the question of performance numbers, our Iceberg test coverage
> in Hive is currently rather limited. The vast majority of tests run using
> other formats, so before making Iceberg the default maybe we should first
> try to migrate the tests to use it.
>
> Moreover, the choice of a default format is tricky and varies from one use
> case to another, so I am not sure there is one that beats the rest in
> every aspect. For instance, many people believed that adopting ACID
> tables for everything was a good idea, but soon after users started
> migrating their workloads to ACID we started hitting many performance and
> scalability challenges. The same reasoning applies to choices/debates
> between ORC, Parquet, etc.
>
> All in all, I wouldn't push very hard for one particular format and I would
> prefer to leave the choice to the end user, who knows their use case best.
> Having said that, I am willing to follow and support the decision of the
> community, especially those people who contributed significantly to the
> Iceberg integration.
>
> Best,
> Stamatis
>
>
> On Wed, Apr 9, 2025, 4:08 PM Denys Kuzmenko <[email protected]> wrote:
>>
>> Hi,
>>
>> I'm a bit hesitant about switching to Iceberg as the default atm. I lean
>> more toward setting the default table format at the database level instead.
>>
>> Hive Iceberg currently lacks automatic table maintenance, comprehensive
>> support for partition-level statistics, and various partition-aware
>> optimizations (see HIVE-28410).
>>
>> Moreover, we haven't conducted any performance testing so far. It would be
>> helpful to first assess where we currently stand before making a final
>> decision.
>>
>> Regards,
>> Denys
>>
