Hi folks,
First of all, thanks Peter for bringing this up! I also think option 1 is the more reasonable solution right now. We have developed many advanced Iceberg features in the Hive repo, such as merge-on-read (MoR), copy-on-write (CoW), and compaction, and these features are coupled with the Hive core code base. The Hive runtime/connector in the Iceberg repo cannot easily support these advanced features. So in the long term, dropping the Hive runtime from the Iceberg repo and maintaining it in the Hive repo is more sensible.

BTW, I have done some work on upgrading Iceberg in the Hive repo, such as HIVE-28495. We often backport hive-iceberg related commits from the Iceberg repo to the Hive repo. What I noticed is that the iceberg-catalog module in the Hive repo (the equivalent of hive-metastore in the Iceberg repo) rarely changes. As Denys said above, we could potentially drop it from the Hive repo and maybe rename it to `hive-catalog` in Iceberg. I think it makes more sense to keep the Hive catalog in one place. But I am not sure whether the Hive catalog will become coupled with Hive core code when we develop upcoming advanced features. If it does, it would be better for it to stay in the Hive repo. Folks who know more about catalogs may be able to give more context.

About the Hive (Hive 4) test integration in the Iceberg repo: in general, I think we can keep some basic Hive 4 tests in the Iceberg repo, as this not only makes Iceberg core more stable, but also ensures that Hive 4's Iceberg runtime does not get broken over time. I have seen that the Trino repo does some Spark integration testing (https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java). Maybe we can consider this approach.

Thanks,
Butao Zhang

---- Replied Message ----
| From | Wing Yew Poon<wyp...@cloudera.com.INVALID> |
| Date | 11/26/2024 05:50 |
| To | <dev@iceberg.apache.org> |
| Cc | <d...@hive.apache.org> |
| Subject | Re: [DISCUSS] Hive Support |

For the Hive runtime, would it be feasible for the Hive community to contribute a suite of tests to the Iceberg repo that can be run with dependencies from the latest Hive release (Hive 4.x), and then update them from time to time as appropriate? The purpose of this suite would be to test integration of Iceberg core with the Hive runtime. Perhaps the existing tests in the mr and hive3 modules could be a starting point, or you might decide on different tests altogether.

The development of the Hive runtime would then continue as now in the Hive repo, but you gain better assurance of compatibility with ongoing Iceberg development, with a relatively small maintenance burden in Iceberg.

On Mon, Nov 25, 2024 at 11:56 AM Ayush Saxena <ayush...@gmail.com> wrote:

Hi Peter,

Thanks for bringing this to our attention.

From my side, I have a say only on the code that resides in the Hive repository. I am okay with the first approach, as we are already following it for the most part. Whether Iceberg keeps or drops the code shouldn’t have much impact on us. (I don't think I have a say on that either) That said, it would be helpful if they continue running tests against the latest stable Hive releases to ensure that any changes don’t unintentionally break something for Hive, which would be beyond our control.

Regarding having a separate code repository for the connectors, I believe the challenges would outweigh the benefits. As mentioned, the initial workload would be significant, but more importantly, maintaining a regular cadence of releases would be even more difficult. I don’t see a large pool of contributors specifically focused on this area who could take ownership and drive releases for a single repository.
Additionally, the ASF doesn’t officially allow repo-level committers or PMC members who could be recruited solely to manage one repository. Given these constraints, I suggest dropping this idea for now.

Best,
Ayush

On Tue, 26 Nov 2024 at 01:05, Denys Kuzmenko <dkuzme...@apache.org> wrote:
>
> Hi Peter,
>
> Thanks for bringing it up!
>
> I think that option 1 is the only viable solution here (remove the
> hive-runtime from the iceberg repo). Main reason: lack of reviewers for
> things other than Spark.
>
> Note: need to double check, but I am pretty sure there is no difference
> between Hive `iceberg-catalog` and iceberg's `hive-metastore`, so we could
> potentially drop it from Hive repo and maybe rename to `hive-catalog` in
> iceberg?
>
> Supporting one more connector repo seems like an overhead: need to setup
> infra, CI, have active contributors/release managers. Later probably is the
> reason why we still haven't moved HMS into a separate repo.
>
> Having iceberg connector in Hive gives us more flexibility and ownership of
> that component, doesn't block an active development.
> We try to be up-to-date with latest iceberg, but it usually takes some time.
>
> I'd be glad to hear other opinions.
>
> Thanks,
> Denys