I'm definitely +1 on this proposal.

Apache Hive has deeply integrated the Iceberg table format for several years 
now. Switching the default table format to Iceberg would send a strong and 
positive signal to the community: that Iceberg has become a first-class citizen 
in the Hive engine, and it's fully capable of supporting large-scale production 
use cases.

Let me also share a few potential usage considerations. Iceberg, like Hive's 
ACID tables, is designed to support efficient DML operations. However, such 
operations can often lead to an explosion of metadata files and an excessive 
number of snapshots. I know some community members are already working on 
automatic compaction and cleanup features for Iceberg tables, which is great. 
But I believe there's still room for further optimization, and more real-world 
user feedback would certainly help drive improvements in this area.

IMHO, our recent work on the Iceberg REST catalog server and client is a great 
start: it enhances Hive’s ability to interoperate with other open-source 
systems. But it's not the end goal. If we can further push forward efforts like 
HIVE-28879 to support multi-catalog and federated catalog capabilities, I 
believe it would bring Hive to the next level. I hope we can all work together 
on this. :)

Finally, while Iceberg is currently the most popular table format, formats like 
Apache Paimon and Apache Hudi are also developing rapidly. I'm wondering if we 
should consider abstracting some of the table format-related code to avoid 
tightly coupling with Iceberg. This could make it easier to deeply integrate 
other formats in the future. Of course, for now, Iceberg remains our main 
focus. I'm just sharing some random thoughts; feel free to challenge them, haha.


Thanks,
Butao Zhang

On 2025/04/07 12:12:05 Attila Turoczy wrote:
> Hi,
> 
> I strongly support this proposal. Hive would be one of the first engines
> globally to make a clear and public commitment to Apache Iceberg, which is
> a significant and forward-looking step. From my perspective, the majority
> of recent development efforts have been focused on Iceberg, and it makes
> perfect sense to communicate this direction transparently to the community.
> 
> I don’t see any major downsides to this transition. While it’s
> true that we currently lack comprehensive performance benchmarks comparing
> Hive-supported table formats, users will still have the flexibility to
> choose the format that best suits their needs. That said, as the default,
> Iceberg represents our OSS strategic direction and the majority
> of our ongoing investment.
> 
> Regarding Hive ACID, it’s worth noting that recent contributions have
> primarily been limited to bug fixes, with minimal active development.
> Therefore, from both a technical and strategic perspective, I’m fully in
> favor of adopting Iceberg as the default table format.
> 
> +2 from me. I’ll also seek support to gain a deeper understanding of the
> performance implications across formats, which we could share via a blog
> post.
> -Attila
> 
> On Mon, Apr 7, 2025 at 1:45 PM Shohei Okumiya <oku...@apache.org> wrote:
> 
> > Hi Ayush,
> >
> > Thanks for initiating the interesting discussion.
> >
> > In my personal opinion, it is likely a good idea. Apache Iceberg is
> > competitive and open. I can't immediately think of any significant
> > drawbacks when users use Iceberg tables instead.
> >
> > As a community member, I'm interested in data and facts that support
> > the option. For example, I don't know how many users have adopted
> > Apache Iceberg or other open table formats. I agree Apache Iceberg is
> > the strongest candidate when we consider the next default format. It
> > is one of the formats I frequently hear about, and it is integrated with
> > Apache Hive very well.
> >
> > Regards,
> > Okumin
> >
> > On Mon, Apr 7, 2025 at 6:40 PM Ayush Saxena <ayush...@gmail.com> wrote:
> > >
> > > Hi All,
> > >
> > > I’d like to initiate a discussion around changing the default table
> > > format to Iceberg in the upcoming releases.
> > >
> > > Since we began development, Hive-Iceberg integration has matured
> > > significantly. Given the increasing market traction and growing user
> > > interest, I believe it might be the right time to consider making
> > > Iceberg the default table format.
> > >
> > > I’d love to hear what others think — whether this is a good idea, any
> > > potential challenges we might face, or if there are specific
> > > prerequisites we should address before moving in this direction.
> > >
> > > Personally, I’m quite inclined towards this change, but of course,
> > > open to hearing different perspectives.
> > >
> > > Looking forward to your thoughts!
> > >
> > > -Ayush
> >
> 
