@Manu: What will be the end result? Do we have to use the same Hive version in Iceberg as the one defined by Spark? I think we should make sure that the Iceberg Hive version is independent of the version used by Spark.
On Mon, Dec 16, 2024, 21:58 rdb...@gmail.com <rdb...@gmail.com> wrote:

> > I'm not sure there's an upgrade path before Spark 4.0. Any ideas?
>
> We can at least separate the concerns. We can remove the runtime modules
> that are the main issue. If we compile against an older version of the Hive
> metastore module (leaving it unchanged), that at least dramatically reduces
> the surface area for Java version issues. As long as the API is
> compatible (and we haven't heard complaints that it is not), I think
> users can override the version in their environments.
>
> Ryan
>
> On Sun, Dec 15, 2024 at 5:55 PM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> Hi Daniel,
>> I'll start a vote once I get the PR ready.
>>
>> Hi Ryan,
>> Sorry, I wasn't clear in the last email that the consensus is to upgrade
>> the Hive metastore support.
>>
>> Well, I was too optimistic about the upgrade. Spark only recently added Hive
>> 4.0 metastore support, for Spark 4.0 [1], and there will be conflicts
>> between Spark's Hive 2.3.9 and our Hive 4.0 dependencies.
>> I'm not sure there's an upgrade path before Spark 4.0. Any ideas?
>>
>> 1. https://issues.apache.org/jira/browse/SPARK-45265
>>
>> Thanks,
>> Manu
>>
>>
>> On Sat, Dec 14, 2024 at 4:31 AM rdb...@gmail.com <rdb...@gmail.com>
>> wrote:
>>
>>> Oh, I think I see. The upgrade to Hive 4 is just for the Hive metastore
>>> support? When I read the thread, I thought that we weren't going to change
>>> the metastore. That seems reasonable to me. Sorry for the confusion.
>>>
>>> On Fri, Dec 13, 2024 at 10:24 AM rdb...@gmail.com <rdb...@gmail.com>
>>> wrote:
>>>
>>>> Sorry, I must have missed something. I don't think that we should
>>>> upgrade anything in Iceberg to Hive 4. Why not simply remove the Hive
>>>> support entirely? Why would anyone need Hive 4 support from Iceberg when it
>>>> is built into Hive 4?
>>>>
>>>> On Thu, Dec 12, 2024 at 11:03 AM Daniel Weeks <dwe...@apache.org>
>>>> wrote:
>>>>
>>>>> Hey Manu,
>>>>>
>>>>> I agree with the direction here, but we should probably hold a quick
>>>>> procedural vote just to confirm, since this is a significant change in
>>>>> support for Hive.
>>>>>
>>>>> -Dan
>>>>>
>>>>> On Wed, Dec 11, 2024 at 5:19 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks all for sharing your thoughts. It looks like there's a consensus on
>>>>>> upgrading to Hive 4 and dropping hive-runtime.
>>>>>> I've submitted a PR [1] as the first step. Please help review.
>>>>>>
>>>>>> 1. https://github.com/apache/iceberg/pull/11750
>>>>>>
>>>>>> Thanks,
>>>>>> Manu
>>>>>>
>>>>>> On Thu, Nov 28, 2024 at 11:26 PM Shohei Okumiya <oku...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I also prefer option 1. I have some initiatives [1] to improve
>>>>>>> integration between Hive and Iceberg. The current setup allows us to
>>>>>>> develop Hive's core and HiveIcebergStorageHandler simultaneously.
>>>>>>> That would help us improve the integration.
>>>>>>>
>>>>>>> - [1] https://issues.apache.org/jira/browse/HIVE-28410
>>>>>>>
>>>>>>> Regards,
>>>>>>> Okumin
>>>>>>>
>>>>>>> On Thu, Nov 28, 2024 at 4:17 AM Fokko Driesprong <fo...@apache.org>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Hey Cheng,
>>>>>>> >
>>>>>>> > Thanks for the suggestion. The nightly snapshots are available at
>>>>>>> > https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-core/,
>>>>>>> > which might help when working on features that are not released yet (e.g.
>>>>>>> > nanosecond timestamps). Besides that, we should run RCs against Hive to
>>>>>>> > check that everything works as expected.
>>>>>>> >
>>>>>>> > I'm leaning toward removing Hive 2 and 3 as well.
>>>>>>> >
>>>>>>> > Kind regards,
>>>>>>> > Fokko
>>>>>>> >
>>>>>>> > On Wed, Nov 27, 2024 at 20:05 rdb...@gmail.com <rdb...@gmail.com>
>>>>>>> > wrote:
>>>>>>> >>
>>>>>>> >> I think that we should remove Hive 2 and Hive 3. We already
>>>>>>> >> agreed to remove Hive 2, but Hive 3 is no longer compatible with the
>>>>>>> >> project, is already EOL, and will not see a release that makes it
>>>>>>> >> compatible again. Anyone using the existing Hive 3 support should be
>>>>>>> >> able to continue using older releases.
>>>>>>> >>
>>>>>>> >> In general, I think it's a good idea to let people use older
>>>>>>> >> releases when these situations happen. It is difficult for the project
>>>>>>> >> to keep supporting libraries that are EOL, and I don't think there's a
>>>>>>> >> strong justification for it, considering that Iceberg support in Hive 4
>>>>>>> >> is native and much better!
>>>>>>> >>
>>>>>>> >> On Wed, Nov 27, 2024 at 7:12 AM Cheng Pan <pan3...@gmail.com>
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> That said, it would be helpful if they continue running
>>>>>>> >>> tests against the latest stable Hive releases to ensure that any
>>>>>>> >>> changes don't unintentionally break something for Hive, which would be
>>>>>>> >>> beyond our control.
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> I believe we should continue maintaining a Hive Iceberg runtime
>>>>>>> >>> test suite with the latest version of Hive in the Iceberg repository.
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> I think we can keep some basic Hive 4 tests in the Iceberg repo.
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Instead of running basic tests in the Iceberg repo, wouldn't it make
>>>>>>> >>> more sense to have Iceberg publish daily snapshot jars to Nexus, and
>>>>>>> >>> have a daily CI job in Hive consume those jars and run the full
>>>>>>> >>> Iceberg tests?
>>>>>>> >>>
>>>>>>> >>> Thanks,
>>>>>>> >>> Cheng Pan