Thanks Wing Yew,
We should remove the Iceberg Hive Runtime module, but make sure that the
Iceberg Hive Metastore module tests run against the supported(?)
Hive 2.3.10/3.1.3/4.0.1 versions. Other tests could run against
whatever Hive version they prefer.
In detail:
--
Let me r
FYI --
It looks like the built-in Hive version in the master branch of Apache
Spark is 2.3.10 (https://issues.apache.org/jira/browse/SPARK-47018), and
https://issues.apache.org/jira/browse/SPARK-44114 (upgrade built-in Hive to
3+) is an open issue.
On Mon, Jan 6, 2025 at 1:07 PM Wing Yew Poon wrote:
Hi Peter,
In Spark, you can specify the Hive version of the metastore that you want
to use. There is a configuration, spark.sql.hive.metastore.version, which
currently (as of Spark 3.5) defaults to 2.3.9, and the jars supporting this
default version are shipped with Spark as built-in. You can speci
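As a rough illustration of the setting Wing Yew describes, the Python sketch below builds the `--conf` flags one might pass to `spark-submit` to pin the Hive metastore client version. The keys `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` are Spark's documented settings; the helper names `metastore_confs` and `to_submit_args` are hypothetical. Which versions Spark accepts depends on the Spark release (as noted elsewhere in this thread, Hive 4.0 metastore support only arrives with Spark 4.0).

```python
"""Sketch: build spark-submit --conf flags that pin the Hive metastore client version.

Assumption: metastore_confs and to_submit_args are hypothetical helpers; only the
two configuration keys are real Spark settings.
"""
from typing import Dict, List

# Per the thread, Spark 3.5 ships the Hive 2.3.9 metastore jars as "builtin".
SPARK_35_BUILTIN_HIVE = "2.3.9"


def metastore_confs(version: str, jars: str = "maven",
                    builtin_version: str = SPARK_35_BUILTIN_HIVE) -> Dict[str, str]:
    """Return the Spark confs needed to talk to a Hive metastore of `version`.

    `jars` is "builtin" (use the jars bundled with Spark), "maven"
    (download matching jars at runtime), or a classpath string.
    """
    if jars == "builtin" and version != builtin_version:
        # The bundled jars only implement the built-in Hive version.
        raise ValueError(
            f"builtin jars only support Hive {builtin_version}, not {version}")
    return {
        "spark.sql.hive.metastore.version": version,
        "spark.sql.hive.metastore.jars": jars,
    }


def to_submit_args(confs: Dict[str, str]) -> List[str]:
    """Flatten a conf dict into spark-submit style ["--conf", "key=value", ...]."""
    args: List[str] = []
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Prints the flags for a Hive 3.1.3 metastore client fetched via Maven.
    print(" ".join(to_submit_args(metastore_confs("3.1.3"))))
```

The "maven" option is convenient for experiments, but the Spark docs do not recommend it for production; a local classpath of Hive jars is the more typical choice there.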
Hi Manu,
> Spark has only added Hive 4.0 metastore support recently for Spark 4.0[1]
> and there will be conflicts
Does this mean that Spark 4.0 will always use Hive 4 code? Or will it use
Hive 2 when it is present on the classpath, but if older Hive versions are
not on the classpath then it will u
>
> This basically means that we need to support every exact Hive version
> that is used by Spark, and we need to exclude our own Hive version from
> the Spark runtime.
Firstly, upgrading from Hive 2 to Hive 4 is a huge change, and I expect
compatibility to be much better once Iceberg and Spar
That sounds really interesting in a bad way :) :(
This basically means that we need to support every exact Hive version
that is used by Spark, and we need to exclude our own Hive version from
the Spark runtime.
On Thu, Dec 19, 2024, 04:00 Manu Zhang wrote:
Hi Peter,
> I think we should make sure that the Iceberg Hive version is independent
> from the version used by Spark
I'm afraid that is not how it works currently. When Spark is deployed with
Hive libraries (I suppose this is common), the iceberg-spark runtime must be
compatible with them.
Otherwis
@Manu: What will be the end result? Do we have to use the same Hive version
in Iceberg as the one defined by Spark? I think we should make sure that the
Iceberg Hive version is independent from the version used by Spark.
On Mon, Dec 16, 2024, 21:58 rdb...@gmail.com wrote:
> I'm not sure there's an upgrade path before Spark 4.0. Any ideas?
We can at least separate the concerns. We can remove the runtime modules
that are the main issue. If we compile against an older version of the Hive
metastore module (leaving it unchanged) that at least has a dramatically
reduced
Hi Daniel,
I'll start a vote once I get the PR ready.
Hi Ryan,
Sorry, I wasn't clear in the last email that the consensus is to upgrade
Hive metastore support.
Well, I was too optimistic about the upgrade. Spark has only added Hive 4.0
metastore support recently for Spark 4.0[1] and there will be
Oh, I think I see. The upgrade to Hive 4 is just for the Hive metastore
support? When I read the thread, I thought that we weren't going to change
the metastore. That seems reasonable to me. Sorry for the confusion.
On Fri, Dec 13, 2024 at 10:24 AM rdb...@gmail.com wrote:
Sorry, I must have missed something. I don't think that we should upgrade
anything in Iceberg to Hive 4. Why not simply remove the Hive support
entirely? Why would anyone need Hive 4 support from Iceberg when it is
built into Hive 4?
On Thu, Dec 12, 2024 at 11:03 AM Daniel Weeks wrote:
Hey Manu,
I agree with the direction here, but we should probably hold a quick
procedural vote just to confirm since this is a significant change in
support for Hive.
-Dan
On Wed, Dec 11, 2024 at 5:19 PM Manu Zhang wrote:
Thanks all for sharing your thoughts. It looks like there's a consensus on
upgrading to Hive 4 and dropping hive-runtime.
I've submitted a PR[1] as the first step. Please help review.
1. https://github.com/apache/iceberg/pull/11750
Thanks,
Manu
On Thu, Nov 28, 2024 at 11:26 PM Shohei Okumiya wrote:
Hi all,
I also prefer option 1. I have some initiatives[1] to improve
integrations between Hive and Iceberg. The current style allows us to
develop both Hive's core and HiveIcebergStorageHandler simultaneously.
That would help us enhance integrations.
- [1] https://issues.apache.org/jira/browse/H
Hey Cheng,
Thanks for the suggestion. The nightly snapshots are available:
https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-core/,
which might help when working on features that are not released yet (e.g.,
nanosecond timestamps). Besides that, we should run RCs agains
I think that we should remove Hive 2 and Hive 3. We already agreed to
remove Hive 2, but Hive 3 is no longer compatible with the project, is
already EOL, and will not see a release that would make it compatible.
Anyone using the existing Hive 3 support should be able to
continue usi
> That said, it would be helpful if they continue running
> tests against the latest stable Hive releases to ensure that any
> changes don’t unintentionally break something for Hive, which would be
> beyond our control.
> I believe we should continue maintaining a Hive Iceberg runtime test suite
> Let me know if the above doesn't make any sense, though!
To be honest, it doesn’t. The email feels accusatory, unfairly blaming
the Hive community for wrongdoing while portraying the Iceberg folks
as "worse" and insinuating misconduct on their part. This kind of tone
does nothing to foster conse
Hi Gabor,
It's a bit odd to get the following feedback from the Impala folks:
"I'd like to understand the motivation why this whole replication of code
happened between Iceberg and Hive."
when you know exactly why.
FYI, we've raised our concerns multiple times to the Iceberg community, for
ex
time. I have seen that
>> The Trino repo did some Spark integration testing
>> (https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java).
>> Maybe we can consider this approach.
>
> Thanks,
> Butao Zhang
> ---- Replied Message
> From Wing Yew Poon
> Date 11/26/2024 05:50
> Subject Re: [DISCUSS] Hive Support
For the Hive runtime, would it be feasible for the Hive community to
contribute a suite of tests to the Iceberg repo that can be run with
dependencies from the latest Hive release (Hive 4.x), and then update them
from time to time as appropriate? The purpose of this suite would be to
test integrati
Hi Peter,
Thanks for bringing this to our attention.
From my side, I have a say only on the code that resides in the Hive
repository. I am okay with the first approach, as we are already
following it for the most part. Whether Iceberg keeps or drops the
code shouldn’t have much impact on us. (I
Hi Peter,
Thanks for bringing it up!
I think that option 1 is the only viable solution here (remove the hive-runtime
from the Iceberg repo). Main reason: lack of reviewers for things other than
Spark.
Note: I need to double-check, but I am pretty sure there is no difference between
Hive `iceb
Let's separate out the discussion of the 2 modules:
- hive-metastore: we definitely need the implementation and the tests
here, as we want to be able to progress with features like views without
waiting for a Hive release. So we need to move forward to Hive 4 now, and
keep the code in place.
- hive