Let's separate out the discussion of the 2 modules:
- hive-metastore - we definitely need the implementation and the tests
here, as we want to be able to progress with features like views without
waiting for a Hive release. So we need to move forward to Hive 4 now, and
keep the code in place
- hive-runtime - I think we should reconsider what to do here

When we moved the runtime code to the Hive repo, we did it because we
didn't have new Hive releases to build on in the Iceberg connector, and we
needed to unblock the integration development. We kept the old code here so
that new Iceberg core changes don't break the integration, and we agreed
to revisit this situation later.

The old blocker is removed now as we have a stable Hive 4 out, and new Hive
releases are planned/done continuously.
Iceberg, Hive, and the Iceberg Hive connector are all quite stable now, so
we can decide what would be the best place to store the integration code.

I see 3 possibilities for the Hive connector:

   - Hive repo - remove all connector code from Iceberg, and manage
   everything in Hive
      - Simplifies the Iceberg code
      - Formalizes the current status quo
      - Core Iceberg changes are only tested against Hive when Hive
      upgrades to the new Iceberg version - we lose a fast feedback loop
      for our changes
      - The Iceberg Hive connector version is strictly tied to the Hive
      version
   - Iceberg repo - remove all connector code from Hive and manage
   everything in Iceberg
      - Needs to be accepted by the Hive folks as they are the owner of
      most of the code now
      - Seems like a serious amount of work
      - We need a bigger pool of reviewers - the lack of reviewers was
      always a bottleneck with the Iceberg Hive connector development in the
      Iceberg repo
   - Separate repo - have a dedicated Iceberg Hive connector repo for the
   code
      - Needs to be accepted by the Hive folks as they are the owner of
      most of the code now
      - Seems like a serious amount of work
      - The connector repo could have a different pool of reviewers than
      either the Hive or the Iceberg project
      - The Iceberg Hive connector could be released independently from
      Hive and Iceberg too

I think the easiest solution would be to remove hive-runtime from the
Iceberg repo. We would lose the assurance that Iceberg works with at least
one SQL engine, but moving the whole Iceberg-Hive integration code back to
Iceberg would be a serious effort, and I don't see that we have enough
committers/reviewers to manage it and ensure that the Hive connector
development is not blocked on missing reviews. (I can do some reviews, but
we need multiple eyes on every module to be effective.)

I would also wait a bit for someone from the Hive team to chime in with
what they think.

Thanks,
Peter

On Fri, Nov 22, 2024, 15:21 Manu Zhang <owenzhang1...@gmail.com> wrote:

> Hi Peter and Fokko,
>
> What about Cheng Pan's point that there will be duplicated
> implementations in Hive and Iceberg if we upgrade iceberg-hive3 to
> iceberg-hive4?
>
> On Fri, Nov 22, 2024 at 5:18 PM Fokko Driesprong <fo...@apache.org> wrote:
>
>> I agree with Péter, that sounds like the right approach to me as well.
>>
>> Kind regards,
>> Fokko
>>
>> On Fri, Nov 22, 2024 at 07:38, Péter Váry <
>> peter.vary.apa...@gmail.com> wrote:
>>
>>> I would prefer B, and only revert to A if we find that B becomes too
>>> complicated.
>>>
>>> On Fri, Nov 22, 2024, 04:26 Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>
>>>> Hi Peter,
>>>>
>>>> Could you be more specific about which option above you prefer?
>>>>
>>>> Thanks,
>>>> Manu
>>>>
>>>> On Thu, Nov 21, 2024 at 10:07 PM Péter Váry <
>>>> peter.vary.apa...@gmail.com> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> Just to clarify. Hive 3 officially doesn't support Java 11, and there
>>>>> are no plans to release a new Hive 3 version with support.
>>>>> By "accident" the Hive Metastore tests are running with Hive 3 with
>>>>> Java 11, but the Hive runtime tests are not running (Starting the
>>>>> HiveServer fails, so no tests are running)
>>>>> Currently we don't know how Hive 4 is working from the Iceberg repo
>>>>> (we know that the Hive community is using Iceberg 1.6.1, so this shouldn't
>>>>> be a big issue)
>>>>>
>>>>> Since Hive 3 is not officially supported, I also suggest moving
>>>>> forward, and start using Hive 4. But we need to run our tests with Hive 4
>>>>> first before we change the documentation.
>>>>>
>>>>> Thanks,
>>>>> Peter
>>>>>
>>>>> On Thu, Nov 21, 2024 at 14:21, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>> wrote:
>>>>>
>>>>>> Hi Manu
>>>>>>
>>>>>> It sounds like a plan. I think it makes sense to drop Hive 2 & 3 and
>>>>>> encourage use of Hive 4 (mostly documentation task).
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On Wed, Nov 20, 2024 at 7:19 AM Manu Zhang <owenzhang1...@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Okay, let me add this option
>>>>>> >
>>>>>> > D. Drop Hive 2 & 3 support and suggest using the built-in Iceberg
>>>>>> support in Hive 4
>>>>>> >
>>>>>> > On Wed, Nov 20, 2024 at 2:00 PM Cheng Pan <pan3...@gmail.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Hive 4 brings built-in support for the Iceberg format, so duplicated
>>>>>> implementations on both sides look redundant.
>>>>>> >>
>>>>>> >> As Hive 2 and 3 do not support Java 11+, and Iceberg 1.8 requires
>>>>>> Java 11+, the combination is invalid. How about simply dropping support
>>>>>> for Hive 2 & 3 and suggesting that Hive users upgrade to Hive 4 to gain
>>>>>> the built-in Iceberg support?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> Cheng Pan
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Nov 20, 2024, at 12:47, Manu Zhang <owenzhang1...@gmail.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Hi all,
>>>>>> >>
>>>>>> >> We previously reached consensus[1] to deprecate Hive 2 in 1.7 and
>>>>>> drop it in 1.8. However, when working on the removal PR[2], multiple tests
>>>>>> failed in Hive 3 because it does not support JDK 11[3]. The fix has been
>>>>>> back-ported to branch-3.1[4] but not released yet. As announced on the Hive
>>>>>> website, Hive 3.x is declared End of Life, so there will be no more Hive 3
>>>>>> releases. Peter (@pvary) suggested upgrading to Hive 4 instead. On the
>>>>>> other hand, iceberg-hive3 tests have already been broken since we dropped
>>>>>> JDK 8 support. This was not caught earlier because the tests were not running[6].
>>>>>> >>
>>>>>> >> Based on the current situation, here are the options I can think
>>>>>> of to move forward
>>>>>> >>
>>>>>> >> A. Continue to remove Hive 2 in the current PR and upgrade to Hive
>>>>>> 4 in a separate PR.
>>>>>> >> B. Hold on removing Hive 2 until we upgrade to Hive 4
>>>>>> >> C. Add source dependency[7] on Hive branch-3.1 or make a Hive 3.1
>>>>>> release from a forked repo.
>>>>>> >>
>>>>>> >> 1.
>>>>>> https://lists.apache.org/thread/zg14b8cor4lnbyd3t4n1297y2bwb1fsg
>>>>>> >> 2. https://github.com/apache/iceberg/pull/10996
>>>>>> >> 3. https://issues.apache.org/jira/browse/HIVE-21584
>>>>>> >> 4. https://github.com/apache/hive/commits/branch-3.1/
>>>>>> >> 5. https://hive.apache.org/general/downloads/
>>>>>> >> 6. https://github.com/apache/iceberg/pull/11584
>>>>>> >> 7. https://blog.gradle.org/introducing-source-dependencies
>>>>>> >>
>>>>>> >> Which option do you prefer? Any better alternative?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> Manu
>>>>>> >>
>>>>>> >>
>>>>>>
>>>>>
