Re: [DISCUSS] Hive Support

2025-01-04 Thread Manu Zhang
>
> This basically means that we need to support every exact Hive version
> which are used by Spark, and we need to exclude our own Hive version from
> the Spark runtime.


Firstly, upgrading from Hive 2 to Hive 4 is a huge change, and I expect
compatibility to be much better once Iceberg and Spark are both on Hive 4.

Secondly, the coupling can be loosened if we move toward the REST
catalog.

On Fri, Jan 3, 2025 at 7:26 PM Péter Váry 
wrote:

> That sounds really interesting in a bad way :) :(
>
> This basically means that we need to support every exact Hive version
> which are used by Spark, and we need to exclude our own Hive version from
> the Spark runtime.
>
> On Thu, Dec 19, 2024, 04:00 Manu Zhang  wrote:
>
>> Hi Peter,
>>
>>> I think we should make sure that the Iceberg Hive version is independent
>>> from the version used by Spark
>>
>> I'm afraid that is not how it works currently. When Spark is deployed
>> with Hive libraries (I suppose this is common), the iceberg-spark runtime
>> must be compatible with them.
>> Otherwise, we need to ask users to exclude the Hive libraries from Spark
>> and ship the iceberg-spark runtime with Iceberg's Hive dependencies.
>>
>> Regards,
>> Manu
>>
>> On Wed, Dec 18, 2024 at 9:08 PM Péter Váry 
>> wrote:
>>
>>> @Manu: What will be the end result? Do we have to use the same Hive
>>> version in Iceberg as the one defined by Spark? I think we should make sure
>>> that the Iceberg Hive version is independent from the version used by Spark.
>>>
>>> On Mon, Dec 16, 2024, 21:58 rdb...@gmail.com  wrote:
>>>
 > I'm not sure there's an upgrade path before Spark 4.0. Any ideas?

 We can at least separate the concerns. We can remove the runtime
 modules that are the main issue. If we compile against an older version of
 the Hive metastore module (leaving it unchanged), that at least
 dramatically reduces the surface area for Java version issues. As long as
 the API is compatible (and we haven't heard complaints that it is not), I
 think users can override the version in their environments.

 Ryan

 On Sun, Dec 15, 2024 at 5:55 PM Manu Zhang 
 wrote:

> Hi Daniel,
> I'll start a vote once I get the PR ready.
>
> Hi Ryan,
> Sorry, I wasn't clear in my last email that the consensus is to
> upgrade the Hive metastore support.
>
> Well, I was too optimistic about the upgrade. Spark only recently added
> Hive 4.0 metastore support, in Spark 4.0[1], and there will be conflicts
> between Spark's Hive 2.3.9 and our Hive 4.0 dependencies.
> I'm not sure there's an upgrade path before Spark 4.0. Any ideas?
>
> 1. https://issues.apache.org/jira/browse/SPARK-45265
>
> Thanks,
> Manu
>
>
> On Sat, Dec 14, 2024 at 4:31 AM rdb...@gmail.com 
> wrote:
>
>> Oh, I think I see. The upgrade to Hive 4 is just for the Hive
>> metastore support? When I read the thread, I thought that we weren't going
>> to change the metastore. That seems reasonable to me. Sorry for
>> the confusion.
>>
>> On Fri, Dec 13, 2024 at 10:24 AM rdb...@gmail.com 
>> wrote:
>>
>>> Sorry, I must have missed something. I don't think that we should
>>> upgrade anything in Iceberg to Hive 4. Why not simply remove the Hive
>>> support entirely? Why would anyone need Hive 4 support from Iceberg when it
>>> is built into Hive 4?
>>>
>>> On Thu, Dec 12, 2024 at 11:03 AM Daniel Weeks 
>>> wrote:
>>>
 Hey Manu,

 I agree with the direction here, but we should probably hold a
 quick procedural vote just to confirm, since this is a significant change in
 support for Hive.

 -Dan

 On Wed, Dec 11, 2024 at 5:19 PM Manu Zhang 
 wrote:

> Thanks all for sharing your thoughts. It looks like there's a consensus
> on upgrading to Hive 4 and dropping hive-runtime.
> I've submitted a PR[1] as the first step. Please help review.
>
> 1. https://github.com/apache/iceberg/pull/11750
>
> Thanks,
> Manu
>
> On Thu, Nov 28, 2024 at 11:26 PM Shohei Okumiya 
> wrote:
>
>> Hi all,
>>
>> I also prefer option 1. I have some initiatives[1] to improve
>> integrations between Hive and Iceberg. The current style allows us to
>> develop both Hive's core and HiveIcebergStorageHandler simultaneously.
>> That would help us enhance integrations.
>>
>> - [1] https://issues.apache.org/jira/browse/HIVE-28410
>>
>> Regards,
>> Okumin
>>
>> On Thu, Nov 28, 2024 at 4:17 AM Fokko Driesprong <
>> fo...@apache.org> wrote:
>> >
>> > Hey Cheng,
>> >
>> > Thanks for the suggestion. The nightly snapshots are 

Re: [VOTE] Drop Hive runtime

2025-01-04 Thread Manu Zhang
Happy New Year everyone! Wish you all the best!
Please kindly review the PR and cast your votes here.

Thanks,
Manu

On Sat, Dec 21, 2024 at 7:03 AM Daniel Weeks  wrote:

> +1
>
> On Wed, Dec 18, 2024 at 10:41 PM Jean-Baptiste Onofré 
> wrote:
>
>> +1 (non binding)
>>
>> I did a pass on the PRs and they look good to me.
>>
>> Thanks, Manu!
>> Regards
>> JB
>>
>> On Wed, Dec 18, 2024 at 2:59 AM Manu Zhang 
>> wrote:
>> >
>> > Hi all,
>> >
>> > Thanks for sharing your ideas in the discussion of Hive support[1]. We
>> > have a consensus to drop the Hive runtime and upgrade the Hive metastore
>> > connector to Hive 4. However, it looks like we can't upgrade metastore
>> > support until Spark 4[2]. Hence, I went on to create a separate PR to
>> > remove the Hive runtime first[3], as suggested by Ryan.
>> >
>> > I'd like to raise a vote to confirm whether the community is OK with
>> > the change.
>> >
>> >
>> > 1. https://lists.apache.org/thread/jfcqfw9vhq4j7h0kwnlf338jgyzcq8s4
>> > 2. https://github.com/apache/iceberg/pull/11750
>> > 3. https://github.com/apache/iceberg/pull/11801
>> >
>> > Thanks,
>> > Manu
>> >
>>
>


Re: [VOTE] Drop Hive runtime

2025-01-04 Thread Matt Topol
+1 (non-binding)

On Sat, Jan 4, 2025, 11:20 PM Manu Zhang  wrote:

> Happy New Year everyone! Wish you all the best!
> Please kindly review the PR and cast your votes here.
>
> Thanks,
> Manu