Re: [DISCUSS] Hive Support

2025-01-08 Thread Péter Váry
Hi Manu, My hope is that the Hive 4 problem is "only" a test issue. Since similar tests are running (or were running when I last saw them in the Hive codebase), there should be a working version of TestHiveMetastore which runs these tests. We might be able to incorporate similar code into ou

Re: [DISCUSS] Hive Support

2025-01-07 Thread Manu Zhang
Thanks Wing Yew for filling in the missing part. > > The built-in version is also used for other things that Spark may use from > Hive (aside from interaction with HMS), such as Hive SerDes. AFAIK, this is blocking Spark itself from upgrading the built-in version to Hive 4. Thanks Peter for the recap.

Re: [DISCUSS] Hive Support

2025-01-07 Thread Denys Kuzmenko
Hi Peter, Re "Hive would provide a HMS client jar which only contains java code which is needed to connect and communicate using Thrift with a HMS instance (no internal HMS server code etc). We could use it as a dependency for our iceberg-hive-metastore module. Either setting a minimal version

Re: [DISCUSS] Hive Support

2025-01-07 Thread Péter Váry
Thanks Wing Yew, We should remove the Iceberg Hive Runtime module, but make sure that the Iceberg Hive Metastore module tests are running against the supported(?) Hive 2.3.10/3.1.3/4.0.1 versions. Other tests could run against whatever Hive version they prefer. In detail: -- Let me r

Re: [DISCUSS] Hive Support

2025-01-06 Thread Wing Yew Poon
FYI -- It looks like the built-in Hive version in the master branch of Apache Spark is 2.3.10 (https://issues.apache.org/jira/browse/SPARK-47018), and https://issues.apache.org/jira/browse/SPARK-44114 (upgrade built-in Hive to 3+) is an open issue. On Mon, Jan 6, 2025 at 1:07 PM Wing Yew Poon wr

Re: [DISCUSS] Hive Support

2025-01-06 Thread Wing Yew Poon
Hi Peter, In Spark, you can specify the Hive version of the metastore that you want to use. There is a configuration, spark.sql.hive.metastore.version, which currently (as of Spark 3.5) defaults to 2.3.9, and the jars supporting this default version are shipped with Spark as built-in. You can speci
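
A minimal sketch of the configuration described above, assuming the metastore client is selected through spark.sql.hive.metastore.version together with Spark's companion setting spark.sql.hive.metastore.jars (with spark.sql.hive.metastore.jars.path when the jars are supplied from a path). The helper names hive_metastore_conf and to_submit_args are hypothetical and only assemble spark-submit arguments; they do not start Spark, and whether a given version is accepted depends on the Spark release in use.

```python
def hive_metastore_conf(version, jars="builtin", jars_path=None):
    """Build Spark conf entries that select the Hive metastore client version.

    Hypothetical helper for illustration. "builtin" only makes sense when
    `version` matches the Hive version shipped with Spark; other values of
    `jars` ("maven", "path") tell Spark to load the client jars from elsewhere.
    """
    conf = {
        "spark.sql.hive.metastore.version": version,
        "spark.sql.hive.metastore.jars": jars,
    }
    if jars == "path":
        if not jars_path:
            raise ValueError("jars_path is required when jars='path'")
        conf["spark.sql.hive.metastore.jars.path"] = jars_path
    return conf


def to_submit_args(conf):
    """Turn a conf dict into a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Example: ask Spark to talk to a Hive 3.1.3 metastore, with the client
    # jars resolved from Maven instead of the built-in ones.
    print(" ".join(to_submit_args(hive_metastore_conf("3.1.3", jars="maven"))))
```

This illustrates the point made in the message: the metastore client version is a deployment-time choice in Spark, separate from the built-in Hive version that Spark ships for other purposes.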

Re: [DISCUSS] Hive Support

2025-01-06 Thread Péter Váry
Hi Manu, > Spark has only added hive 4.0 metastore support recently for Spark 4.0[1] and there will be conflicts Does this mean that Spark 4.0 will always use Hive 4 code? Or it will use Hive 2 when it is present on the classpath, but if older Hive versions are not on the classpath then it will u

Re: [DISCUSS] Hive Support

2025-01-04 Thread Manu Zhang
> > This basically means that we need to support every exact Hive versions > which are used by Spark, and we need to exclude our own Hive version from > the Spark runtime. Firstly, upgrading from Hive 2 to Hive 4 is a huge change, and I expect compatibility to be much better once Iceberg and Spar

Re: [DISCUSS] Hive Support

2025-01-03 Thread Péter Váry
That sounds really interesting in a bad way :) :( This basically means that we need to support every exact Hive version which is used by Spark, and we need to exclude our own Hive version from the Spark runtime. On Thu, Dec 19, 2024, 04:00 Manu Zhang wrote: > Hi Peter, > >> I think we should

Re: [DISCUSS] Hive Support

2024-12-18 Thread Manu Zhang
Hi Peter, > I think we should make sure that the Iceberg Hive version is independent > from the version used by Spark I'm afraid that is not how it works currently. When Spark is deployed with hive libraries (I suppose this is common), iceberg-spark runtime must be compatible with them. Otherwis

Re: [DISCUSS] Hive Support

2024-12-18 Thread Péter Váry
@Manu: What will be the end result? Do we have to use the same Hive version in Iceberg as it is defined by Spark? I think we should make sure that the Iceberg Hive version is independent from the version used by Spark On Mon, Dec 16, 2024, 21:58 rdb...@gmail.com wrote: > > I'm not sure there's a

Re: [DISCUSS] Hive Support

2024-12-16 Thread rdb...@gmail.com
> I'm not sure there's an upgrade path before Spark 4.0. Any ideas? We can at least separate the concerns. We can remove the runtime modules that are the main issue. If we compile against an older version of the Hive metastore module (leaving it unchanged) that at least has a dramatically reduced

Re: [DISCUSS] Hive Support

2024-12-15 Thread Manu Zhang
Hi Daniel, I'll start a vote once I get the PR ready. Hi Ryan, Sorry, I wasn't clear in the last email that the consensus is to upgrade Hive metastore support. Well, I was too optimistic about the upgrade. Spark has only added hive 4.0 metastore support recently for Spark 4.0[1] and there will be

Re: [DISCUSS] Hive Support

2024-12-13 Thread rdb...@gmail.com
Oh, I think I see. The upgrade to Hive 4 is just for the Hive metastore support? When I read the thread, I thought that we weren't going to change the metastore. That seems reasonable to me. Sorry for the confusion. On Fri, Dec 13, 2024 at 10:24 AM rdb...@gmail.com wrote: > Sorry, I must have mi

Re: [DISCUSS] Hive Support

2024-12-13 Thread rdb...@gmail.com
Sorry, I must have missed something. I don't think that we should upgrade anything in Iceberg to Hive 4. Why not simply remove the Hive support entirely? Why would anyone need Hive 4 support from Iceberg when it is built into Hive 4? On Thu, Dec 12, 2024 at 11:03 AM Daniel Weeks wrote: > Hey Man

Re: [DISCUSS] Hive Support

2024-12-12 Thread Daniel Weeks
Hey Manu, I agree with the direction here, but we should probably hold a quick procedural vote just to confirm since this is a significant change in support for Hive. -Dan On Wed, Dec 11, 2024 at 5:19 PM Manu Zhang wrote: > Thanks all for sharing your thoughts. It looks there's a consensus on

Re: [DISCUSS] Hive Support

2024-12-11 Thread Manu Zhang
Thanks all for sharing your thoughts. It looks like there's a consensus on upgrading to Hive 4 and dropping hive-runtime. I've submitted a PR[1] as the first step. Please help review. 1. https://github.com/apache/iceberg/pull/11750 Thanks, Manu On Thu, Nov 28, 2024 at 11:26 PM Shohei Okumiya wrote:

Re: [DISCUSS] Hive Support

2024-11-28 Thread Shohei Okumiya
Hi all, I also prefer option 1. I have some initiatives[1] to improve integrations between Hive and Iceberg. The current style allows us to develop both Hive's core and HiveIcebergStorageHandler simultaneously. That would help us enhance integrations. - [1] https://issues.apache.org/jira/browse/H

Re: [DISCUSS] Hive Support

2024-11-27 Thread Péter Váry
Given that the Hive folks are also leaning towards keeping the hive-runtime code in the Hive repo, I think we should move forward as Cheng Pan suggested: - Upgrade to Hive 4 - Remove hive-runtime code and tests - Make sure that a nightly build is available, so Hive folks could run integration tests, an

Re: [DISCUSS] Hive Support

2024-11-27 Thread Ajantha Bhat
+1 to removing support for both Hive 2 and Hive 3 in the latest Iceberg release, as they have reached EOL. Hive 4 natively manages its Iceberg integration, similar to how Trino handles its Iceberg integration. Therefore, in my opinion, it would be better for engines to manage the integration aspect, allowi

Re: [DISCUSS] Hive Support

2024-11-27 Thread Fokko Driesprong
Hey Cheng, Thanks for the suggestion. The nightly snapshots are available: https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-core/, which might help when working on features that are not released yet (eg Nanosecond timestamps). Besides that, we should run RCs agains

Re: [DISCUSS] Hive Support

2024-11-27 Thread rdb...@gmail.com
I think that we should remove Hive 2 and Hive 3. We already agreed to remove Hive 2, but Hive 3 is not compatible with the project anymore and is already EOL and will not see a release to update it so that it can be compatible. Anyone using the existing Hive 3 support should be able to continue usi

Re: [DISCUSS] Hive Support

2024-11-27 Thread Cheng Pan
> That said, it would be helpful if they continue running > tests against the latest stable Hive releases to ensure that any > changes don’t unintentionally break something for Hive, which would be > beyond our control. > I believe we should continue maintaining a Hive Iceberg runtime test suite

Re: [DISCUSS] Hive Support

2024-11-27 Thread Ayush Saxena
> Let me know if the above doesn't make any sense, though! To be honest, it doesn’t. The email feels accusatory, unfairly blaming the Hive community for wrongdoing while portraying the Iceberg folks as "worse" and insinuating misconduct on their part. This kind of tone does nothing to foster conse

Re: [DISCUSS] Hive Support

2024-11-27 Thread Gabor Kaszab
time. I have seen that >> Trino repo did some Spark integration testing( >> https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java) >> . Maybe we can consider this way. >> >> >

Re: [DISCUSS] Hive Support

2024-11-26 Thread Simhadri G
trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java) > . Maybe we can consider this way. > > > > Thanks, > Butao Zhang > ---- Replied Message > From Wing Yew Poon > Date 11/26/2024 05:50 > To > Cc > Subject Re: [DI

Re: [DISCUSS] Hive Support

2024-11-25 Thread Butao Zhang
Thanks, Butao Zhang. Replied message from Wing Yew Poon (11/26/2024 05:50), subject "Re: [DISCUSS] Hive Support": For the Hive runtime, would it be feasible for the Hive community to contribute a suite of tests to the Iceberg repo that can be run with dependencie

Re: [DISCUSS] Hive Support

2024-11-25 Thread Wing Yew Poon
For the Hive runtime, would it be feasible for the Hive community to contribute a suite of tests to the Iceberg repo that can be run with dependencies from the latest Hive release (Hive 4.x), and then update them from time to time as appropriate? The purpose of this suite would be to test integrati

Re: [DISCUSS] Hive Support

2024-11-25 Thread Ayush Saxena
Hi Peter, Thanks for bringing this to our attention. From my side, I have a say only on the code that resides in the Hive repository. I am okay with the first approach, as we are already following it for the most part. Whether Iceberg keeps or drops the code shouldn’t have much impact on us. (I

Re: [DISCUSS] Hive Support

2024-11-25 Thread Péter Váry
Let's separate out the discussion of the 2 modules: - hive-metastore - we definitely need the implementation and the tests here, as we want to be able to progress with features like views without waiting for a Hive release. So we need to move forward to Hive 4 now, and keep the code in place - hive

Re: [DISCUSS] Hive Support

2024-11-22 Thread Manu Zhang
Hi Peter and Fokko, What about Cheng Pan's point that there will be duplicated implementations in Hive and Iceberg if we upgrade iceberg-hive3 to iceberg-hive4? On Fri, Nov 22, 2024 at 5:18 PM Fokko Driesprong wrote: > I agree with Péter, that sounds like the right approach to me as well. > > K

Re: [DISCUSS] Hive Support

2024-11-22 Thread Fokko Driesprong
I agree with Péter, that sounds like the right approach to me as well. Kind regards, Fokko Op vr 22 nov 2024 om 07:38 schreef Péter Váry : > I would prefer B, and only revert to A if we find that B becomes too > complicated. > > On Fri, Nov 22, 2024, 04:26 Manu Zhang wrote: > >> Hi Peter, >> >>

Re: [DISCUSS] Hive Support

2024-11-21 Thread Péter Váry
I would prefer B, and only revert to A if we find that B becomes too complicated. On Fri, Nov 22, 2024, 04:26 Manu Zhang wrote: > Hi Peter, > > Would you be more specific on which option above do you prefer? > > Thanks, > Manu > > On Thu, Nov 21, 2024 at 10:07 PM Péter Váry > wrote: > >> Hi Tea

Re: [DISCUSS] Hive Support

2024-11-21 Thread Manu Zhang
Hi Peter, Could you be more specific about which of the options above you prefer? Thanks, Manu On Thu, Nov 21, 2024 at 10:07 PM Péter Váry wrote: > Hi Team, > > Just to clarify. Hive 3 officially doesn't support Java 11, and there are > no plans to release a new Hive 3 version with support. > By "acci

Re: [DISCUSS] Hive Support

2024-11-21 Thread Péter Váry
Hi Team, Just to clarify. Hive 3 officially doesn't support Java 11, and there are no plans to release a new Hive 3 version with support. By "accident" the Hive Metastore tests are running with Hive 3 with Java 11, but the Hive runtime tests are not running (Starting the HiveServer fails, so no te

Re: [DISCUSS] Hive Support

2024-11-21 Thread Jean-Baptiste Onofré
Hi Manu, It sounds like a plan. I think it makes sense to drop Hive 2 & 3 and encourage use of Hive 4 (mostly a documentation task). Regards JB On Wed, Nov 20, 2024 at 7:19 AM Manu Zhang wrote: > > Okay, let me add this option > > D. Drop Hive 2 & 3 support and suggest to use built-in Iceberg supp

Re: [DISCUSS] Hive Support

2024-11-20 Thread Manu Zhang
> > It is my understanding that removing the hive-metastore module is NOT > under consideration; is that correct? > Correct, the modules under discussion here are specifically iceberg-mr and iceberg-hive3. I don't see any problems for other modules to upgrade to Hive 3. On Thu, Nov 21, 2024 at 4:

Re: [DISCUSS] Hive Support

2024-11-20 Thread Wing Yew Poon
Also to clarify -- It is my understanding that removing the hive-metastore module is NOT under consideration; is that correct? We still need a Hive version to depend on for the hive-metastore module. In https://github.com/apache/iceberg/pull/10996, this is Hive 3. Does this present any problem? O

Re: [DISCUSS] Hive Support

2024-11-20 Thread Manu Zhang
To clarify, the changes discussed here don't affect Hive connectors in engines, which either use the built-in Hive version (Spark) or can be upgraded to Hive 3 (Flink). On Wed, Nov 20, 2024 at 2:19 PM Manu Zhang wrote: > Okay, let me add this option > > D. Drop Hive 2 & 3 support and suggest to

Re: [DISCUSS] Hive Support

2024-11-19 Thread Manu Zhang
Okay, let me add this option: D. Drop Hive 2 & 3 support and suggest using the built-in Iceberg support in Hive 4. On Wed, Nov 20, 2024 at 2:00 PM Cheng Pan wrote: > Hive 4 brings built-in support for Iceberg format, duplicated > implementation in both sides look a redundant stuff. > > As Hive 2 and

Re: [DISCUSS] Hive Support

2024-11-19 Thread Cheng Pan
Hive 4 brings built-in support for the Iceberg format, so duplicate implementations on both sides look redundant. As Hive 2 and 3 do not support Java 11+, and Iceberg 1.8 requires Java 11+, the combination is invalid. How about simply dropping support for Hive 2&3 and suggesting the Hive user

[DISCUSS] Hive Support

2024-11-19 Thread Manu Zhang
Hi all, We previously reached consensus[1] to deprecate Hive 2 in 1.7 and drop it in 1.8. However, when working on the removal PR[2], multiple tests failed in Hive 3 because it does not support JDK 11[3]. The fix has been back-ported to branch-3.1[4] but not released yet. As announced on the Hive website, Hive