Hi Peter,

> I think we should make sure that the Iceberg Hive version is independent
> from the version used by Spark

I'm afraid that is not how it works currently. When Spark is deployed with Hive libraries (I suppose this is common), the iceberg-spark runtime must be compatible with them. Otherwise, we need to ask users to exclude the Hive libraries from Spark and ship the iceberg-spark runtime with Iceberg's Hive dependencies.

Regards,
Manu

On Wed, Dec 18, 2024 at 9:08 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> @Manu: What will be the end result? Do we have to use the same Hive
> version in Iceberg as it is defined by Spark? I think we should make sure
> that the Iceberg Hive version is independent from the version used by Spark
>
> On Mon, Dec 16, 2024, 21:58 rdb...@gmail.com <rdb...@gmail.com> wrote:
>
>> > I'm not sure there's an upgrade path before Spark 4.0. Any ideas?
>>
>> We can at least separate the concerns. We can remove the runtime modules
>> that are the main issue. If we compile against an older version of the Hive
>> metastore module (leaving it unchanged) that at least has a dramatically
>> reduced surface area for Java version issues. As long as the API is
>> compatible (and we haven't heard complaints that it is not) then I think
>> users can override the version in their environments.
>>
>> Ryan
>>
>> On Sun, Dec 15, 2024 at 5:55 PM Manu Zhang <owenzhang1...@gmail.com>
>> wrote:
>>
>>> Hi Daniel,
>>> I'll start a vote once I get the PR ready.
>>>
>>> Hi Ryan,
>>> Sorry, I wasn't clear in the last email that the consensus is to upgrade
>>> Hive metastore support.
>>>
>>> Well, I was too optimistic about the upgrade. Spark has only added hive
>>> 4.0 metastore support recently for Spark 4.0[1] and there will be conflicts
>>> between Spark's hive 2.3.9 and our hive 4.0 dependencies.
>>> I'm not sure there's an upgrade path before Spark 4.0. Any ideas?
>>>
>>> 1. https://issues.apache.org/jira/browse/SPARK-45265
>>>
>>> Thanks,
>>> Manu
>>>
>>>
>>> On Sat, Dec 14, 2024 at 4:31 AM rdb...@gmail.com <rdb...@gmail.com>
>>> wrote:
>>>
>>>> Oh, I think I see.
>>>> The upgrade to Hive 4 is just for the Hive metastore
>>>> support? When I read the thread, I thought that we weren't going to change
>>>> the metastore. That seems reasonable to me. Sorry for the confusion.
>>>>
>>>> On Fri, Dec 13, 2024 at 10:24 AM rdb...@gmail.com <rdb...@gmail.com>
>>>> wrote:
>>>>
>>>>> Sorry, I must have missed something. I don't think that we should
>>>>> upgrade anything in Iceberg to Hive 4. Why not simply remove the Hive
>>>>> support entirely? Why would anyone need Hive 4 support from Iceberg when
>>>>> it is built into Hive 4?
>>>>>
>>>>> On Thu, Dec 12, 2024 at 11:03 AM Daniel Weeks <dwe...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hey Manu,
>>>>>>
>>>>>> I agree with the direction here, but we should probably hold a quick
>>>>>> procedural vote just to confirm since this is a significant change in
>>>>>> support for Hive.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>> On Wed, Dec 11, 2024 at 5:19 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks all for sharing your thoughts. It looks like there's a consensus
>>>>>>> on upgrading to Hive 4 and dropping hive-runtime.
>>>>>>> I've submitted a PR[1] as the first step. Please help review.
>>>>>>>
>>>>>>> 1. https://github.com/apache/iceberg/pull/11750
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manu
>>>>>>>
>>>>>>> On Thu, Nov 28, 2024 at 11:26 PM Shohei Okumiya <oku...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I also prefer option 1. I have some initiatives[1] to improve
>>>>>>>> integrations between Hive and Iceberg. The current style allows us to
>>>>>>>> develop both Hive's core and HiveIcebergStorageHandler simultaneously.
>>>>>>>> That would help us enhance integrations.
>>>>>>>>
>>>>>>>> - [1] https://issues.apache.org/jira/browse/HIVE-28410
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Okumin
>>>>>>>>
>>>>>>>> On Thu, Nov 28, 2024 at 4:17 AM Fokko Driesprong <fo...@apache.org>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > Hey Cheng,
>>>>>>>> >
>>>>>>>> > Thanks for the suggestion. The nightly snapshots are available:
>>>>>>>> > https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-core/,
>>>>>>>> > which might help when working on features that are not released yet (e.g.
>>>>>>>> > nanosecond timestamps). Besides that, we should run RCs against Hive to
>>>>>>>> > check if everything works as expected.
>>>>>>>> >
>>>>>>>> > I'm leaning toward removing Hive 2 and 3 as well.
>>>>>>>> >
>>>>>>>> > Kind regards,
>>>>>>>> > Fokko
>>>>>>>> >
>>>>>>>> > On Wed, Nov 27, 2024 at 20:05, rdb...@gmail.com <rdb...@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >> I think that we should remove Hive 2 and Hive 3. We already
>>>>>>>> >> agreed to remove Hive 2, but Hive 3 is not compatible with the project
>>>>>>>> >> anymore and is already EOL and will not see a release to update it so
>>>>>>>> >> that it can be compatible. Anyone using the existing Hive 3 support should
>>>>>>>> >> be able to continue using older releases.
>>>>>>>> >>
>>>>>>>> >> In general, I think it's a good idea to let people use older
>>>>>>>> >> releases when these situations happen. It is difficult for the project
>>>>>>>> >> to continue to support libraries that are EOL and I don't think there's a
>>>>>>>> >> great justification for it, considering Iceberg support in Hive 4 is
>>>>>>>> >> native and much better!
>>>>>>>> >>
>>>>>>>> >> On Wed, Nov 27, 2024 at 7:12 AM Cheng Pan <pan3...@gmail.com> wrote:
>>>>>>>> >>>
>>>>>>>> >>> That said, it would be helpful if they continue running
>>>>>>>> >>> tests against the latest stable Hive releases to ensure that any
>>>>>>>> >>> changes don’t unintentionally break something for Hive, which would be
>>>>>>>> >>> beyond our control.
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> I believe we should continue maintaining a Hive Iceberg runtime
>>>>>>>> >>> test suite with the latest version of Hive in the Iceberg repository.
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> I think we can keep some basic Hive 4 tests in the Iceberg repo
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> Instead of running basic tests in the Iceberg repo, maybe it makes more
>>>>>>>> >>> sense to let Iceberg publish daily snapshot jars to Nexus and have a daily
>>>>>>>> >>> CI job in Hive consume those jars and run the full Iceberg tests?
>>>>>>>> >>>
>>>>>>>> >>> Thanks,
>>>>>>>> >>> Cheng Pan
>>>>>>>> >>>
>>>>>>>>
>>>>>>>