Re: [DISCUSS] Hive Support

2025-01-07 Thread Péter Váry
Thanks Wing Yew, We should remove the Iceberg Hive Runtime module, but make sure that the Iceberg Hive Metastore module tests run against the supported(?) Hive 2.3.10/3.1.3/4.0.1 versions. Other tests could run against whatever Hive version they prefer. In detail: -- Let me r

Re: [DISCUSS] Hive Support

2025-01-06 Thread Wing Yew Poon
FYI -- It looks like the built-in Hive version in the master branch of Apache Spark is 2.3.10 (https://issues.apache.org/jira/browse/SPARK-47018), and https://issues.apache.org/jira/browse/SPARK-44114 (upgrade built-in Hive to 3+) is an open issue. On Mon, Jan 6, 2025 at 1:07 PM Wing Yew Poon wr

Re: [DISCUSS] Hive Support

2025-01-06 Thread Wing Yew Poon
Hi Peter, In Spark, you can specify the Hive version of the metastore that you want to use. There is a configuration, spark.sql.hive.metastore.version, which currently (as of Spark 3.5) defaults to 2.3.9, and the jars supporting this default version are shipped with Spark as built-in. You can speci
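
A minimal sketch of the configuration described above, assuming Spark's companion setting spark.sql.hive.metastore.jars is used to say where the metastore client jars come from. The helper name hive_metastore_conf and the example values are hypothetical and not taken from the thread; only the property spark.sql.hive.metastore.version and its 2.3.9 default in Spark 3.5 come from the message.

```python
# Sketch only: build spark-submit flags that pin the Hive metastore client.
# hive_metastore_conf is a hypothetical helper, not part of Spark or Iceberg.

def hive_metastore_conf(version: str = "2.3.9", jars: str = "builtin") -> list[str]:
    """Return spark-submit arguments selecting the Hive metastore client version.

    version: value for spark.sql.hive.metastore.version (2.3.9 is the
             Spark 3.5 default mentioned in the thread).
    jars:    value for spark.sql.hive.metastore.jars; "builtin" uses the
             jars shipped with Spark (assumed here as the default).
    """
    settings = {
        "spark.sql.hive.metastore.version": version,
        "spark.sql.hive.metastore.jars": jars,
    }
    args: list[str] = []
    for key, value in settings.items():
        # Each setting becomes a separate "--conf key=value" pair.
        args += ["--conf", f"{key}={value}"]
    return args


# Example: ask Spark to talk to a Hive 3.1.3 metastore, resolving jars from Maven.
print(" ".join(hive_metastore_conf("3.1.3", "maven")))
```

The built-in jars only match the built-in version, so a non-default version would normally be paired with a non-builtin jars setting, as in the example call.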

Re: [DISCUSS] Hive Support

2025-01-06 Thread Péter Váry
Hi Manu, > Spark has only added hive 4.0 metastore support recently for Spark 4.0[1] and there will be conflicts Does this mean that Spark 4.0 will always use Hive 4 code? Or will it use Hive 2 when it is present on the classpath, but if older Hive versions are not on the classpath then it will u

Re: [DISCUSS] Hive Support

2025-01-04 Thread Manu Zhang
> > This basically means that we need to support every exact Hive version > which is used by Spark, and we need to exclude our own Hive version from > the Spark runtime. Firstly, upgrading from Hive 2 to Hive 4 is a huge change, and I expect compatibility to be much better once Iceberg and Spar

Re: [DISCUSS] Hive Support

2025-01-03 Thread Péter Váry
That sounds really interesting in a bad way :) :( This basically means that we need to support every exact Hive version which is used by Spark, and we need to exclude our own Hive version from the Spark runtime. On Thu, Dec 19, 2024, 04:00 Manu Zhang wrote: > Hi Peter, > >> I think we should

Re: [DISCUSS] Hive Support

2024-12-18 Thread Manu Zhang
Hi Peter, > I think we should make sure that the Iceberg Hive version is independent > from the version used by Spark I'm afraid that is not how it works currently. When Spark is deployed with hive libraries (I suppose this is common), iceberg-spark runtime must be compatible with them. Otherwis

Re: [DISCUSS] Hive Support

2024-12-18 Thread Péter Váry
@Manu: What will be the end result? Do we have to use the same Hive version in Iceberg as it is defined by Spark? I think we should make sure that the Iceberg Hive version is independent from the version used by Spark On Mon, Dec 16, 2024, 21:58 rdb...@gmail.com wrote: > > I'm not sure there's a

Re: [DISCUSS] Hive Support

2024-12-16 Thread rdb...@gmail.com
> I'm not sure there's an upgrade path before Spark 4.0. Any ideas? We can at least separate the concerns. We can remove the runtime modules that are the main issue. If we compile against an older version of the Hive metastore module (leaving it unchanged) that at least has a dramatically reduced

Re: [DISCUSS] Hive Support

2024-12-15 Thread Manu Zhang
Hi Daniel, I'll start a vote once I get the PR ready. Hi Ryan, Sorry, I wasn't clear in the last email that the consensus is to upgrade Hive metastore support. Well, I was too optimistic about the upgrade. Spark has only added hive 4.0 metastore support recently for Spark 4.0[1] and there will be

Re: [DISCUSS] Hive Support

2024-12-13 Thread rdb...@gmail.com
Oh, I think I see. The upgrade to Hive 4 is just for the Hive metastore support? When I read the thread, I thought that we weren't going to change the metastore. That seems reasonable to me. Sorry for the confusion. On Fri, Dec 13, 2024 at 10:24 AM rdb...@gmail.com wrote: > Sorry, I must have mi

Re: [DISCUSS] Hive Support

2024-12-13 Thread rdb...@gmail.com
Sorry, I must have missed something. I don't think that we should upgrade anything in Iceberg to Hive 4. Why not simply remove the Hive support entirely? Why would anyone need Hive 4 support from Iceberg when it is built into Hive 4? On Thu, Dec 12, 2024 at 11:03 AM Daniel Weeks wrote: > Hey Man

Re: [DISCUSS] Hive Support

2024-12-12 Thread Daniel Weeks
Hey Manu, I agree with the direction here, but we should probably hold a quick procedural vote just to confirm since this is a significant change in support for Hive. -Dan On Wed, Dec 11, 2024 at 5:19 PM Manu Zhang wrote: > Thanks all for sharing your thoughts. It looks like there's a consensus on

Re: [DISCUSS] Hive Support

2024-12-11 Thread Manu Zhang
Thanks all for sharing your thoughts. It looks like there's a consensus on upgrading to Hive 4 and dropping hive-runtime. I've submitted a PR[1] as the first step. Please help review. 1. https://github.com/apache/iceberg/pull/11750 Thanks, Manu On Thu, Nov 28, 2024 at 11:26 PM Shohei Okumiya wrote:

Re: [DISCUSS] Hive Support

2024-11-28 Thread Shohei Okumiya
Hi all, I also prefer option 1. I have some initiatives[1] to improve integrations between Hive and Iceberg. The current style allows us to develop both Hive's core and HiveIcebergStorageHandler simultaneously. That would help us enhance integrations. - [1] https://issues.apache.org/jira/browse/H

Re: [DISCUSS] Hive Support

2024-11-27 Thread Fokko Driesprong
Hey Cheng, Thanks for the suggestion. The nightly snapshots are available: https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-core/, which might help when working on features that are not released yet (e.g., nanosecond timestamps). Besides that, we should run RCs agains

Re: [DISCUSS] Hive Support

2024-11-27 Thread rdb...@gmail.com
I think that we should remove Hive 2 and Hive 3. We already agreed to remove Hive 2, but Hive 3 is not compatible with the project anymore and is already EOL and will not see a release to update it so that it can be compatible. Anyone using the existing Hive 3 support should be able to continue usi

Re: [DISCUSS] Hive Support

2024-11-27 Thread Cheng Pan
> That said, it would be helpful if they continue running > tests against the latest stable Hive releases to ensure that any > changes don’t unintentionally break something for Hive, which would be > beyond our control. > I believe we should continue maintaining a Hive Iceberg runtime test suite

Re: [DISCUSS] Hive Support

2024-11-27 Thread Ayush Saxena
> Let me know if the above doesn't make any sense, though! To be honest, it doesn’t. The email feels accusatory, unfairly blaming the Hive community for wrongdoing while portraying the Iceberg folks as "worse" and insinuating misconduct on their part. This kind of tone does nothing to foster conse

Re: [DISCUSS] Hive Support

2024-11-27 Thread Denys Kuzmenko
Hi Gabor, It's a bit odd to get the following feedback from the Impala folks: "I'd like to understand the motivation why this whole replication of code happened between Iceberg and Hive." when you know exactly why. FYI, we've raised our concerns multiple times to the iceberg community, for ex

Re: [DISCUSS] Hive Support

2024-11-27 Thread Gabor Kaszab
time. I have seen that >> Trino repo did some Spark integration testing( >> https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java) >> . Maybe we can consider this way. >> >> >

Re: [DISCUSS] Hive Support

2024-11-26 Thread Simhadri G
trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java) > . Maybe we can consider this way. > > > > Thanks, > Butao Zhang > ---- Replied Message > From Wing Yew Poon > Date 11/26/2024 05:50 > To > Cc > Subject Re: [DI

Re: [DISCUSS] Hive Support

2024-11-25 Thread Butao Zhang
Thanks, Butao Zhang ---- Replied Message ---- From: Wing Yew Poon | Date: 11/26/2024 05:50 | Subject: Re: [DISCUSS] Hive Support | For the Hive runtime, would it be feasible for the Hive community to contribute a suite of tests to the Iceberg repo that can be run with dependencie

Re: [DISCUSS] Hive Support

2024-11-25 Thread Wing Yew Poon
For the Hive runtime, would it be feasible for the Hive community to contribute a suite of tests to the Iceberg repo that can be run with dependencies from the latest Hive release (Hive 4.x), and then update them from time to time as appropriate? The purpose of this suite would be to test integrati

Re: [DISCUSS] Hive Support

2024-11-25 Thread Ayush Saxena
Hi Peter, Thanks for bringing this to our attention. From my side, I have a say only on the code that resides in the Hive repository. I am okay with the first approach, as we are already following it for the most part. Whether Iceberg keeps or drops the code shouldn’t have much impact on us. (I

Re: [DISCUSS] Hive Support

2024-11-25 Thread Denys Kuzmenko
Hi Peter, Thanks for bringing it up! I think that option 1 is the only viable solution here (remove the hive-runtime from the iceberg repo). Main reason: lack of reviewers for things other than Spark. Note: need to double check, but I am pretty sure there is no difference between Hive `iceb

Re: [DISCUSS] Hive Support

2024-11-25 Thread Péter Váry
Let's separate out the discussion of the 2 modules: - hive-metastore - we definitely need the implementation and the tests here, as we want to be able to progress with features like views without waiting for a Hive release. So we need to move forward to Hive 4 now, and keep the code in place - hive