Hi Peter,

> I think we should make sure that the Iceberg Hive version is independent
> from the version used by Spark

I'm afraid that is not how it works currently. When Spark is deployed with
Hive libraries (I suppose this is common), the iceberg-spark runtime must be
compatible with them.
Otherwise, we need to ask users to exclude the Hive libraries from Spark and
ship the iceberg-spark runtime with Iceberg's Hive dependencies.

Regards,
Manu

On Wed, Dec 18, 2024 at 9:08 PM Péter Váry <peter.vary.apa...@gmail.com>
wrote:

> @Manu: What will be the end result? Do we have to use the same Hive
> version in Iceberg as it is defined by Spark? I think we should make sure
> that the Iceberg Hive version is independent from the version used by Spark
>
> On Mon, Dec 16, 2024, 21:58 rdb...@gmail.com <rdb...@gmail.com> wrote:
>
>> > I'm not sure there's an upgrade path before Spark 4.0. Any ideas?
>>
>> We can at least separate the concerns. We can remove the runtime modules
>> that are the main issue. If we compile against an older version of the Hive
>> metastore module (leaving it unchanged) that at least has a dramatically
>> reduced surface area for Java version issues. As long as the API is
>> compatible (and we haven't heard complaints that it is not) then I think
>> users can override the version in their environments.
>>
>> Ryan
>>
>> On Sun, Dec 15, 2024 at 5:55 PM Manu Zhang <owenzhang1...@gmail.com>
>> wrote:
>>
>>> Hi Daniel,
>>> I'll start a vote once I get the PR ready.
>>>
>>> Hi Ryan,
>>> Sorry, I wasn't clear in the last email that the consensus is to upgrade
>>> Hive metastore support.
>>>
>>> Well, I was too optimistic about the upgrade. Spark has only added hive
>>> 4.0 metastore support recently for Spark 4.0[1] and there will be conflicts
>>> between Spark's hive 2.3.9 and our hive 4.0 dependencies.
>>> I'm not sure there's an upgrade path before Spark 4.0. Any ideas?
>>>
>>> 1. https://issues.apache.org/jira/browse/SPARK-45265
>>>
>>> Thanks,
>>> Manu
>>>
>>>
>>> On Sat, Dec 14, 2024 at 4:31 AM rdb...@gmail.com <rdb...@gmail.com>
>>> wrote:
>>>
>>>> Oh, I think I see. The upgrade to Hive 4 is just for the Hive metastore
>>>> support? When I read the thread, I thought that we weren't going to change
>>>> the metastore. That seems reasonable to me. Sorry for the confusion.
>>>>
>>>> On Fri, Dec 13, 2024 at 10:24 AM rdb...@gmail.com <rdb...@gmail.com>
>>>> wrote:
>>>>
>>>>> Sorry, I must have missed something. I don't think that we should
>>>>> upgrade anything in Iceberg to Hive 4. Why not simply remove the Hive
>>>>> support entirely? Why would anyone need Hive 4 support from Iceberg when
>>>>> it is built into Hive 4?
>>>>>
>>>>> On Thu, Dec 12, 2024 at 11:03 AM Daniel Weeks <dwe...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hey Manu,
>>>>>>
>>>>>> I agree with the direction here, but we should probably hold a quick
>>>>>> procedural vote just to confirm since this is a significant change in
>>>>>> support for Hive.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>> On Wed, Dec 11, 2024 at 5:19 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks all for sharing your thoughts. It looks like there's a consensus
>>>>>>> on upgrading to Hive 4 and dropping hive-runtime.
>>>>>>> I've submitted a PR[1] as the first step. Please help review.
>>>>>>>
>>>>>>> 1. https://github.com/apache/iceberg/pull/11750
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manu
>>>>>>>
>>>>>>> On Thu, Nov 28, 2024 at 11:26 PM Shohei Okumiya <oku...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I also prefer option 1. I have some initiatives[1] to improve
>>>>>>>> integrations between Hive and Iceberg. The current style allows us to
>>>>>>>> develop both Hive's core and HiveIcebergStorageHandler simultaneously.
>>>>>>>> That would help us enhance integrations.
>>>>>>>>
>>>>>>>> - [1] https://issues.apache.org/jira/browse/HIVE-28410
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Okumin
>>>>>>>>
>>>>>>>> On Thu, Nov 28, 2024 at 4:17 AM Fokko Driesprong <fo...@apache.org>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > Hey Cheng,
>>>>>>>> >
>>>>>>>> > Thanks for the suggestion. The nightly snapshots are available:
>>>>>>>> > https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-core/,
>>>>>>>> > which might help when working on features that are not released yet
>>>>>>>> > (e.g. nanosecond timestamps). Besides that, we should run RCs against
>>>>>>>> > Hive to check if everything works as expected.
>>>>>>>> >
>>>>>>>> > I'm leaning toward removing Hive 2 and 3 as well.
>>>>>>>> >
>>>>>>>> > Kind regards,
>>>>>>>> > Fokko
>>>>>>>> >
>>>>>>>> > On Wed, Nov 27, 2024 at 20:05, rdb...@gmail.com <rdb...@gmail.com>
>>>>>>>> > wrote:
>>>>>>>> >>
>>>>>>>> >> I think that we should remove Hive 2 and Hive 3. We already agreed
>>>>>>>> >> to remove Hive 2, but Hive 3 is not compatible with the project
>>>>>>>> >> anymore, is already EOL, and will not see a release that makes it
>>>>>>>> >> compatible. Anyone using the existing Hive 3 support should be able
>>>>>>>> >> to continue using older releases.
>>>>>>>> >>
>>>>>>>> >> In general, I think it's a good idea to let people use older
>>>>>>>> >> releases when these situations happen. It is difficult for the
>>>>>>>> >> project to continue to support libraries that are EOL, and I don't
>>>>>>>> >> think there's a great justification for it, considering Iceberg
>>>>>>>> >> support in Hive 4 is native and much better!
>>>>>>>> >>
>>>>>>>> >> On Wed, Nov 27, 2024 at 7:12 AM Cheng Pan <pan3...@gmail.com>
>>>>>>>> >> wrote:
>>>>>>>> >>>
>>>>>>>> >>> That said, it would be helpful if they continue running
>>>>>>>> >>> tests against the latest stable Hive releases to ensure that any
>>>>>>>> >>> changes don’t unintentionally break something for Hive, which
>>>>>>>> >>> would be beyond our control.
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> I believe we should continue maintaining a Hive Iceberg runtime
>>>>>>>> >>> test suite with the latest version of Hive in the Iceberg repository.
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> I think we can keep some basic Hive 4 tests in the Iceberg repo.
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> Instead of running basic tests in the Iceberg repo, would it make
>>>>>>>> >>> more sense to have Iceberg publish daily snapshot jars to Nexus and
>>>>>>>> >>> have a daily CI job in Hive consume those jars and run the full
>>>>>>>> >>> Iceberg tests?
>>>>>>>> >>>
>>>>>>>> >>> Thanks,
>>>>>>>> >>> Cheng Pan
>>>>>>>> >>>
>>>>>>>>
>>>>>>>
