Re: [DISCUSS] Hive Support

Butao Zhang Mon, 25 Nov 2024 18:20:40 -0800

Hi folks,


First, thanks Peter for bringing this up! I also think option 1 is the more
reasonable solution right now, as we have developed many advanced Iceberg
features in the Hive repo, such as merge-on-read (MoR), copy-on-write (CoW),
and compaction, and these features are coupled with the Hive core code base.
A Hive runtime/connector in the Iceberg repo cannot easily support these
advanced features. So in the long term, dropping the Hive runtime from the
Iceberg repo and maintaining it in the Hive repo is more sensible.


BTW, I have done some work on upgrading Iceberg in the Hive repo, such as
HIVE-28495. We often backport hive-iceberg related commits from the Iceberg
repo to the Hive repo. What I noticed is that the iceberg-catalog module in
the Hive repo (equivalent to hive-metastore in the Iceberg repo) rarely
changes. As Denys said earlier in the thread, we could potentially drop it
from the Hive repo and maybe rename it to `hive-catalog` in Iceberg. I think
it makes more sense to keep the Hive catalog in one place. But I am not sure
whether the Hive catalog will become coupled with Hive core code when
upcoming advanced features are developed. If it does, it would be better for
it to stay in the Hive repo. Folks who know more about catalogs can perhaps
give more context.


Regarding Hive (Hive 4) test integration in the Iceberg repo: in general, I
think we can keep some basic Hive 4 tests in the Iceberg repo, as this not
only makes Iceberg core more stable but also ensures that Hive 4's Iceberg
runtime is not broken over time. I have seen that the Trino repo does some
Spark integration testing
(https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java).
Maybe we can consider this approach.
            




Thanks,
Butao Zhang
---- Replied Message ----
| From | Wing Yew Poon<wyp...@cloudera.com.INVALID> |
| Date | 11/26/2024 05:50 |
| To | <dev@iceberg.apache.org> |
| Cc | <d...@hive.apache.org> |
| Subject | Re: [DISCUSS] Hive Support |
For the Hive runtime, would it be feasible for the Hive community to contribute 
a suite of tests to the Iceberg repo that can be run with dependencies from the 
latest Hive release (Hive 4.x), and then update them from time to time as 
appropriate? The purpose of this suite would be to test integration of Iceberg 
core with the Hive runtime. Perhaps the existing tests in the mr and hive3 
modules could be a starting point, or you might decide on different tests 
altogether.
The development of the Hive runtime would then continue as now in the Hive 
repo, but you gain better assurance of compatibility with ongoing Iceberg 
development, with a relatively small maintenance burden in Iceberg.






On Mon, Nov 25, 2024 at 11:56 AM Ayush Saxena <ayush...@gmail.com> wrote:

Hi Peter,

Thanks for bringing this to our attention.

From my side, I have a say only on the code that resides in the Hive
repository. I am okay with the first approach, as we are already
following it for the most part. Whether Iceberg keeps or drops the
code shouldn’t have much impact on us. (I don't think I have a say on
that either) That said, it would be helpful if they continue running
tests against the latest stable Hive releases to ensure that any
changes don’t unintentionally break something for Hive, which would be
beyond our control.

Regarding having a separate code repository for the connectors, I
believe the challenges would outweigh the benefits. As mentioned, the
initial workload would be significant, but more importantly,
maintaining a regular cadence of releases would be even more
difficult. I don’t see a large pool of contributors specifically
focused on this area who could take ownership and drive releases for a
single repository. Additionally, the ASF doesn’t officially allow
repo-level committers or PMC members who could be recruited solely to
manage one repository. Given these constraints, I suggest dropping
this idea for now.

Best,
Ayush

On Tue, 26 Nov 2024 at 01:05, Denys Kuzmenko <dkuzme...@apache.org> wrote:
>
> Hi Peter,
>
> Thanks for bringing it up!
>
> I think that option 1 is the only viable solution here (remove the 
> hive-runtime from the iceberg repo). Main reason: lack of reviewers for 
> things other than Spark.
>
> Note: need to double check, but I am pretty sure there is no difference 
> between Hive `iceberg-catalog` and iceberg's `hive-metastore`, so we could 
> potentially drop it from Hive repo and maybe rename to `hive-catalog` in 
> iceberg?
>
> Supporting one more connector repo seems like an overhead: we would need to
> set up infra and CI, and have active contributors/release managers. The
> latter is probably the reason why we still haven't moved HMS into a
> separate repo.
>
> Having the Iceberg connector in Hive gives us more flexibility and ownership
> of that component, and doesn't block active development.
> We try to be up-to-date with latest iceberg, but it usually takes some time.
>
> I'd be glad to hear other opinions.
>
> Thanks,
> Denys
