Hi folks,
First of all, thanks Peter for bringing it up! I also think option 1 is
the more reasonable solution right now, as we have developed lots of advanced
Iceberg features in the Hive repo, such as MoR, CoW, compaction, etc., and these
features are coupled with the Hive core code base. The Hive runtime/connector
in the Iceberg repo cannot easily support these advanced features. So in the
long term, dropping the Hive runtime from the Iceberg repo and maintaining it
in the Hive repo is more sensible.
BTW, I have done some work on upgrading Iceberg in the Hive repo,
such as HIVE-28495. We often backport Hive-Iceberg related commits from the
Iceberg repo to the Hive repo. What I noticed is that the iceberg-catalog
module in the Hive repo (equivalent to hive-metastore in the Iceberg repo)
rarely changes. As Denys said, we could potentially drop it from the Hive repo
and maybe rename it to `hive-catalog` in Iceberg. I think it makes more sense
to keep the Hive catalog in one place. But I am not sure whether the Hive
catalog will become coupled with Hive core code when upcoming advanced
features are developed. If it does, it would be better for it to stay in the
Hive repo. Folks who know more about catalogs can give more context.
About Hive (Hive 4) test integration in the Iceberg repo: in general,
I think we can keep some basic Hive 4 tests in the Iceberg repo, as this not
only makes Iceberg core more stable, but also ensures that Hive 4's Iceberg
runtime is not broken over time. I have seen that the Trino repo does some
Spark integration testing
(https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java).
Maybe we can consider this approach.
Thanks,
Butao Zhang
---- Replied Message ----
| From | Wing Yew Poon<[email protected]> |
| Date | 11/26/2024 05:50 |
| To | <[email protected]> |
| Cc | <[email protected]> |
| Subject | Re: [DISCUSS] Hive Support |
For the Hive runtime, would it be feasible for the Hive community to contribute
a suite of tests to the Iceberg repo that can be run with dependencies from the
latest Hive release (Hive 4.x), and then update them from time to time as
appropriate? The purpose of this suite would be to test integration of Iceberg
core with the Hive runtime. Perhaps the existing tests in the mr and hive3
modules could be a starting point, or you might decide on different tests
altogether.
The development of the Hive runtime would then continue as now in the Hive
repo, but you gain better assurance of compatibility with ongoing Iceberg
development, with a relatively small maintenance burden in Iceberg.
On Mon, Nov 25, 2024 at 11:56 AM Ayush Saxena <[email protected]> wrote:
Hi Peter,
Thanks for bringing this to our attention.
From my side, I have a say only on the code that resides in the Hive
repository. I am okay with the first approach, as we are already
following it for the most part. Whether Iceberg keeps or drops the
code shouldn’t have much impact on us. (I don't think I have a say on
that either.) That said, it would be helpful if they continue running
tests against the latest stable Hive releases to ensure that any
changes don’t unintentionally break something for Hive, which would be
beyond our control.
Regarding having a separate code repository for the connectors, I
believe the challenges would outweigh the benefits. As mentioned, the
initial workload would be significant, but more importantly,
maintaining a regular cadence of releases would be even more
difficult. I don’t see a large pool of contributors specifically
focused on this area who could take ownership and drive releases for a
single repository. Additionally, the ASF doesn’t officially allow
repo-level committers or PMC members who could be recruited solely to
manage one repository. Given these constraints, I suggest dropping
this idea for now.
Best,
Ayush
On Tue, 26 Nov 2024 at 01:05, Denys Kuzmenko <[email protected]> wrote:
>
> Hi Peter,
>
> Thanks for bringing it up!
>
> I think that option 1 is the only viable solution here (remove the
> hive-runtime from the iceberg repo). Main reason: lack of reviewers for
> things other than Spark.
>
> Note: need to double check, but I am pretty sure there is no difference
> between Hive `iceberg-catalog` and iceberg's `hive-metastore`, so we could
> potentially drop it from Hive repo and maybe rename to `hive-catalog` in
> iceberg?
>
> Supporting one more connector repo seems like an overhead: we would need to
> set up infra and CI, and have active contributors/release managers. The
> latter is probably the reason why we still haven't moved HMS into a separate
> repo.
>
> Having the Iceberg connector in Hive gives us more flexibility and ownership
> of that component, and doesn't block active development.
> We try to be up-to-date with the latest Iceberg, but it usually takes some time.
>
> I'd be glad to hear other opinions.
>
> Thanks,
> Denys