Hi Everyone,

Thank you, Peter, for the discussion!
I’m also leaning toward option one. However, given that Apache Iceberg is designed to be engine-agnostic, I believe we should continue maintaining a Hive Iceberg runtime test suite with the latest version of Hive in the Iceberg repository. This will help identify any changes that could break Hive compatibility early on. So I agree with Ayush, Denys, and Butao on option 1. I think options 2 and 3 would be difficult, as they would require a significant amount of time and effort from the community to maintain.

Thanks,
Simhadri G

On Tue, Nov 26, 2024, 7:50 AM Butao Zhang <butaozha...@163.com> wrote:

> Hi folks,
>
> Firstly, thanks Peter for bringing it up! I also think option 1 is the more reasonable solution right now, as we have developed lots of advanced Iceberg features in the Hive repo, such as MoR, CoW, and compaction, and these features are coupled with the Hive core code base. The Hive runtime/connector in the Iceberg repo cannot easily support these advanced features. So in the long term, dropping the Hive runtime from the Iceberg repo and maintaining it in the Hive repo is more sensible.
>
> BTW, I have done some work on upgrading Iceberg in the Hive repo, like HIVE-28495. We often backport Hive-Iceberg related commits from the Iceberg repo to the Hive repo. What I have noticed is that the iceberg-catalog module in the Hive repo (equivalent to hive-metastore in the Iceberg repo) rarely changes. As Denys said above, we could potentially drop it from the Hive repo and maybe rename it to `hive-catalog` in Iceberg. I think it makes more sense to keep the Hive catalog in one place. But I am not sure whether the Hive catalog will become coupled with Hive core code when developing some upcoming advanced features. If it is coupled with Hive core code, it is better for it to stay in the Hive repo. Folks who know more about catalogs can give more context.
>
> About the Hive (Hive 4) test integration in the Iceberg repo: in general, I think we can keep some basic Hive 4 tests in the Iceberg repo, as this not only makes Iceberg core more stable, but also ensures that Hive 4's Iceberg runtime will not be broken over time. I have seen that the Trino repo does some Spark integration testing (https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java). Maybe we can consider this approach.
>
> Thanks,
> Butao Zhang
>
> ---- Replied Message ----
> From: Wing Yew Poon <wyp...@cloudera.com.INVALID>
> Date: 11/26/2024 05:50
> To: <dev@iceberg.apache.org>
> Cc: <d...@hive.apache.org>
> Subject: Re: [DISCUSS] Hive Support
>
> For the Hive runtime, would it be feasible for the Hive community to contribute a suite of tests to the Iceberg repo that can be run with dependencies from the latest Hive release (Hive 4.x), and then update them from time to time as appropriate? The purpose of this suite would be to test the integration of Iceberg core with the Hive runtime. Perhaps the existing tests in the mr and hive3 modules could be a starting point, or you might decide on different tests altogether.
> The development of the Hive runtime would then continue as now in the Hive repo, but you gain better assurance of compatibility with ongoing Iceberg development, with a relatively small maintenance burden in Iceberg.
>
> On Mon, Nov 25, 2024 at 11:56 AM Ayush Saxena <ayush...@gmail.com> wrote:
>
>> Hi Peter,
>>
>> Thanks for bringing this to our attention.
>>
>> From my side, I have a say only on the code that resides in the Hive repository. I am okay with the first approach, as we are already following it for the most part. Whether Iceberg keeps or drops the code shouldn't have much impact on us (I don't think I have a say on that either). That said, it would be helpful if they continue running tests against the latest stable Hive releases to ensure that any changes don't unintentionally break something for Hive, which would be beyond our control.
>>
>> Regarding having a separate code repository for the connectors, I believe the challenges would outweigh the benefits. As mentioned, the initial workload would be significant, but more importantly, maintaining a regular cadence of releases would be even more difficult. I don't see a large pool of contributors specifically focused on this area who could take ownership and drive releases for a single repository. Additionally, the ASF doesn't officially allow repo-level committers or PMC members who could be recruited solely to manage one repository. Given these constraints, I suggest dropping this idea for now.
>>
>> Best,
>> Ayush
>>
>> On Tue, 26 Nov 2024 at 01:05, Denys Kuzmenko <dkuzme...@apache.org> wrote:
>> >
>> > Hi Peter,
>> >
>> > Thanks for bringing it up!
>> >
>> > I think that option 1 is the only viable solution here (remove the hive-runtime from the Iceberg repo). Main reason: lack of reviewers for things other than Spark.
>> >
>> > Note: I need to double-check, but I am pretty sure there is no difference between Hive's `iceberg-catalog` and Iceberg's `hive-metastore`, so we could potentially drop it from the Hive repo and maybe rename it to `hive-catalog` in Iceberg?
>> >
>> > Supporting one more connector repo seems like an overhead: we would need to set up infra and CI, and have active contributors and release managers. The latter is probably the reason why we still haven't moved HMS into a separate repo.
>> >
>> > Having the Iceberg connector in Hive gives us more flexibility and ownership of that component, and doesn't block active development.
>> > We try to stay up to date with the latest Iceberg, but it usually takes some time.
>> >
>> > I'd be glad to hear other opinions.
>> >
>> > Thanks,
>> > Denys
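
To make the test-suite idea discussed above concrete, the following is a minimal sketch of the kind of Hive 4 smoke test that could live in the Iceberg repo, in the spirit of Trino's TestIcebergSparkCompatibility. It assumes the build can provision or reach a Hive 4.x HiveServer2 with Iceberg support; the JDBC URL, class name, and table name are illustrative placeholders, not existing fixtures in either repo.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class TestHive4IcebergCompatibility {

  // Hypothetical endpoint of a Hive 4.x HiveServer2 provisioned by the test harness.
  private static final String HIVE_JDBC_URL = "jdbc:hive2://localhost:10000/default";

  @Test
  public void testCreateInsertAndReadIcebergTable() throws Exception {
    // The Hive JDBC driver normally self-registers; loading it explicitly keeps the sketch self-contained.
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    try (Connection conn = DriverManager.getConnection(HIVE_JDBC_URL);
         Statement stmt = conn.createStatement()) {
      // Hive 4 DDL: STORED BY ICEBERG creates an Iceberg-backed table through the Hive runtime.
      stmt.execute("DROP TABLE IF EXISTS iceberg_compat_smoke");
      stmt.execute("CREATE TABLE iceberg_compat_smoke (id INT, data STRING) STORED BY ICEBERG");

      // Write through Hive's Iceberg integration and read the row back to verify round-tripping.
      stmt.execute("INSERT INTO iceberg_compat_smoke VALUES (1, 'a')");
      try (ResultSet rs = stmt.executeQuery(
          "SELECT id, data FROM iceberg_compat_smoke ORDER BY id")) {
        assertTrue(rs.next());
        assertEquals(1, rs.getInt("id"));
        assertEquals("a", rs.getString("data"));
      }
    }
  }
}

A suite along these lines would exercise only the public Hive SQL surface, so it could be updated against each new Hive 4.x release without depending on Hive-internal test utilities.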