Hi Manu,
My hope is that the Hive 4 problem is "only" a test issue. Since similar
tests are running (or were running when I last saw them in the Hive
codebase), there should be a working version of TestHiveMetastore that
runs these tests. We might be able to incorporate similar code into our
Thanks Wing Yew for filling in the missing part.
>
> The built-in version is also used for other things that Spark may use from
> Hive (aside from interaction with HMS), such as Hive SerDes.
AFAIK, this is blocking Spark itself from upgrading the built-in version to
Hive 4.
Thanks, Peter, for the recap.
Hi Peter,
Re
"Hive would provide a HMS client jar which only contains java code which is
needed to connect and communicate using Thrift with a HMS instance (no internal
HMS server code etc). We could use it as a dependency for our
iceberg-hive-metastore module. Either setting a minimal version
Thanks, Wing Yew,
We should remove the Iceberg Hive Runtime module, but make sure that the
Iceberg Hive Metastore module tests are running against the supported(?)
Hive 2.3.10/3.1.3/4.0.1 versions. Other tests could run against
whatever Hive version they prefer.
In detail:
--
Let me r
FYI --
It looks like the built-in Hive version in the master branch of Apache
Spark is 2.3.10 (https://issues.apache.org/jira/browse/SPARK-47018), and
https://issues.apache.org/jira/browse/SPARK-44114 (upgrade built-in Hive to
3+) is an open issue.
On Mon, Jan 6, 2025 at 1:07 PM Wing Yew Poon wrote:
Hi Peter,
In Spark, you can specify the Hive version of the metastore that you want
to use. There is a configuration, spark.sql.hive.metastore.version, which
currently (as of Spark 3.5) defaults to 2.3.9, and the jars supporting this
default version are shipped with Spark as built-in. You can speci
Hi Manu,
> Spark has only added hive 4.0 metastore support recently for Spark 4.0[1]
> and there will be conflicts
Does this mean that Spark 4.0 will always use Hive 4 code? Or will it use
Hive 2 when it is present on the classpath, but if older Hive versions are
not on the classpath then it will u
>
> This basically means that we need to support every exact Hive version
> which are used by Spark, and we need to exclude our own Hive version from
> the Spark runtime.
Firstly, upgrading from Hive 2 to Hive 4 is a huge change, and I expect
compatibility to be much better once Iceberg and Spark
That sounds really interesting in a bad way :) :(
This basically means that we need to support every exact Hive version
which are used by Spark, and we need to exclude our own Hive version from
the Spark runtime.
On Thu, Dec 19, 2024, 04:00 Manu Zhang wrote:
Hi Peter,
> I think we should make sure that the Iceberg Hive version is independent
> from the version used by Spark
I'm afraid that is not how it works currently. When Spark is deployed with
Hive libraries (I suppose this is common), the iceberg-spark runtime must be
compatible with them.
Otherwise
@Manu: What will be the end result? Do we have to use the same Hive version
in Iceberg as the one defined by Spark? I think we should make sure that the
Iceberg Hive version is independent from the version used by Spark.
On Mon, Dec 16, 2024, 21:58 rdb...@gmail.com wrote:
> I'm not sure there's an upgrade path before Spark 4.0. Any ideas?
We can at least separate the concerns. We can remove the runtime modules
that are the main issue. If we compile against an older version of the Hive
metastore module (leaving it unchanged) that at least has a dramatically
reduced
Hi Daniel,
I'll start a vote once I get the PR ready.
Hi Ryan,
Sorry, I wasn't clear in my last email: the consensus is to upgrade the
Hive metastore support.
Well, I was too optimistic about the upgrade. Spark has only added hive 4.0
metastore support recently for Spark 4.0[1] and there will be
Oh, I think I see. The upgrade to Hive 4 is just for the Hive metastore
support? When I read the thread, I thought that we weren't going to change
the metastore. That seems reasonable to me. Sorry for the confusion.
On Fri, Dec 13, 2024 at 10:24 AM rdb...@gmail.com wrote:
Sorry, I must have missed something. I don't think that we should upgrade
anything in Iceberg to Hive 4. Why not simply remove the Hive support
entirely? Why would anyone need Hive 4 support from Iceberg when it is
built into Hive 4?
On Thu, Dec 12, 2024 at 11:03 AM Daniel Weeks wrote:
Hey Manu,
I agree with the direction here, but we should probably hold a quick
procedural vote just to confirm since this is a significant change in
support for Hive.
-Dan
On Wed, Dec 11, 2024 at 5:19 PM Manu Zhang wrote:
Thanks all for sharing your thoughts. It looks like there's a consensus on
upgrading to Hive 4 and dropping hive-runtime.
I've submitted a PR[1] as the first step. Please help review.
1. https://github.com/apache/iceberg/pull/11750
Thanks,
Manu
On Thu, Nov 28, 2024 at 11:26 PM Shohei Okumiya wrote:
Hi all,
I also prefer option 1. I have some initiatives[1] to improve
integrations between Hive and Iceberg. The current style allows us to
develop both Hive's core and HiveIcebergStorageHandler simultaneously.
That would help us enhance integrations.
- [1] https://issues.apache.org/jira/browse/H
Given that the Hive folks are also leaning towards keeping the hive-runtime
code in the Hive repo, I think we should move forward as Cheng Pan
suggested:
- Upgrade to Hive 4
- Remove hive-runtime code and tests
- Make sure that a nightly build is available, so Hive folks could run
integration tests, an
+1 to remove support for both Hive 2 and Hive 3 in the latest Iceberg release,
as they have reached EOL.
Hive 4 natively manages its Iceberg integration, similar to how Trino
handles its Iceberg integration. Therefore, in my opinion, it would be
better for engines to manage the integration aspect, allowing
Hey Cheng,
Thanks for the suggestion. The nightly snapshots are available:
https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-core/,
which might help when working on features that are not released yet (e.g.,
nanosecond timestamps). Besides that, we should run RCs against
I think that we should remove Hive 2 and Hive 3. We already agreed to
remove Hive 2, but Hive 3 is no longer compatible with the project, is
already EOL, and will not see a release that makes it compatible. Anyone
using the existing Hive 3 support should be able to continue using
> That said, it would be helpful if they continue running
> tests against the latest stable Hive releases to ensure that any
> changes don’t unintentionally break something for Hive, which would be
> beyond our control.
> I believe we should continue maintaining a Hive Iceberg runtime test suite
> Let me know if the above doesn't make any sense, though!
To be honest, it doesn’t. The email feels accusatory, unfairly blaming
the Hive community for wrongdoing while portraying the Iceberg folks
as "worse" and insinuating misconduct on their part. This kind of tone
does nothing to foster conse
time. I have seen that the Trino repo did some Spark integration testing
(https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java).
Maybe we can consider this approach.
Thanks,
Butao Zhang
Replied Message
| From | Wing Yew Poon |
| Date | 11/26/2024 05:50 |
| To | |
| Cc | |
| Subject | Re: [DISCUSS] Hive Support |
For the Hive runtime, would it be feasible for the Hive community to
contribute a suite of tests to the Iceberg repo that can be run with
dependencies from the latest Hive release (Hive 4.x), and then update them
from time to time as appropriate? The purpose of this suite would be to
test integrati
Hi Peter,
Thanks for bringing this to our attention.
From my side, I only have a say in the code that resides in the Hive
repository. I am okay with the first approach, as we are already
following it for the most part. Whether Iceberg keeps or drops the
code shouldn’t have much impact on us. (I
Let's separate out the discussion of the 2 modules:
- hive-metastore - we definitely need the implementation and the tests
here, as we want to be able to progress with features like views without
waiting for a Hive release. So we need to move forward to Hive 4 now, and
keep the code in place
- hive
Hi Peter and Fokko,
What about Cheng Pan's point that there will be duplicated
implementations in Hive and Iceberg if we upgrade iceberg-hive3 to
iceberg-hive4?
On Fri, Nov 22, 2024 at 5:18 PM Fokko Driesprong wrote:
I agree with Péter, that sounds like the right approach to me as well.
Kind regards,
Fokko
On Fri, Nov 22, 2024 at 07:38, Péter Váry wrote:
I would prefer B, and only revert to A if we find that B becomes too
complicated.
On Fri, Nov 22, 2024, 04:26 Manu Zhang wrote:
Hi Peter,
Could you be more specific about which of the options above you prefer?
Thanks,
Manu
On Thu, Nov 21, 2024 at 10:07 PM Péter Váry
wrote:
Hi Team,
Just to clarify: Hive 3 officially doesn't support Java 11, and there are
no plans to release a new Hive 3 version with that support.
By "accident", the Hive Metastore tests run on Hive 3 with Java
11, but the Hive runtime tests do not (starting the HiveServer
fails, so no te
Hi Manu,
It sounds like a plan. I think it makes sense to drop Hive 2 & 3 and
encourage the use of Hive 4 (mostly a documentation task).
Regards
JB
On Wed, Nov 20, 2024 at 7:19 AM Manu Zhang wrote:
>
> Okay, let me add this option
>
> D. Drop Hive 2 & 3 support and suggest to use built-in Iceberg supp
>
> It is my understanding that removing the hive-metastore module is NOT
> under consideration; is that correct?
>
Correct. The modules under discussion here are specifically iceberg-mr and
iceberg-hive3. I don't see any problem with upgrading the other modules to
Hive 3.
On Thu, Nov 21, 2024 at 4:
Also to clarify --
It is my understanding that removing the hive-metastore module is NOT under
consideration; is that correct?
We still need a Hive version to depend on for the hive-metastore module. In
https://github.com/apache/iceberg/pull/10996, this is Hive 3. Does this
present any problem?
O
To clarify, the changes discussed here don't affect the Hive connectors in
engines, which either use the built-in Hive version (Spark) or can be
upgraded to Hive 3 (Flink).
On Wed, Nov 20, 2024 at 2:19 PM Manu Zhang wrote:
Okay, let me add this option
D. Drop Hive 2 & 3 support and suggest using the built-in Iceberg support of
Hive 4
On Wed, Nov 20, 2024 at 2:00 PM Cheng Pan wrote:
Hive 4 brings built-in support for the Iceberg format, so duplicating the
implementation on both sides looks redundant.
As Hive 2 and 3 do not support Java 11+, and Iceberg 1.8 requires Java 11+, the
combination is invalid. How about simply dropping support for Hive 2 & 3 and
suggesting the Hive user
Hi all,
We previously reached consensus[1] to deprecate Hive 2 in 1.7 and drop it in
1.8. However, when working on the removal PR[2], multiple tests failed in
Hive 3 because it does not support JDK 11[3]. The fix has been backported to
branch-3.1[4] but not released yet. As announced on the Hive website, Hive