For the Hive runtime, would it be feasible for the Hive community to
contribute a suite of tests to the Iceberg repo that can be run with
dependencies from the latest Hive release (Hive 4.x), and then update them
from time to time as appropriate? The purpose of this suite would be to
test integration of Iceberg core with the Hive runtime. Perhaps the
existing tests in the mr and hive3 modules could be a starting point, or
you might decide on different tests altogether.
The development of the Hive runtime would then continue as it does now in
the Hive repo, but you would gain better assurance of compatibility with
ongoing Iceberg development, with a relatively small maintenance burden in
Iceberg.



On Mon, Nov 25, 2024 at 11:56 AM Ayush Saxena <ayush...@gmail.com> wrote:

> Hi Peter,
>
> Thanks for bringing this to our attention.
>
> From my side, I have a say only on the code that resides in the Hive
> repository. I am okay with the first approach, as we are already
> following it for the most part. Whether Iceberg keeps or drops the
> code shouldn’t have much impact on us. (I don't think I have a say on
> that either.) That said, it would be helpful if they continue running
> tests against the latest stable Hive releases to ensure that any
> changes don’t unintentionally break something for Hive, which would be
> beyond our control.
>
> Regarding having a separate code repository for the connectors, I
> believe the challenges would outweigh the benefits. As mentioned, the
> initial workload would be significant, but more importantly,
> maintaining a regular cadence of releases would be even more
> difficult. I don’t see a large pool of contributors specifically
> focused on this area who could take ownership and drive releases for a
> single repository. Additionally, the ASF doesn’t officially allow
> repo-level committers or PMC members who could be recruited solely to
> manage one repository. Given these constraints, I suggest dropping
> this idea for now.
>
> Best,
> Ayush
>
> On Tue, 26 Nov 2024 at 01:05, Denys Kuzmenko <dkuzme...@apache.org> wrote:
> >
> > Hi Peter,
> >
> > Thanks for bringing it up!
> >
> > I think that option 1 is the only viable solution here (remove the
> > hive-runtime from the iceberg repo). Main reason: lack of reviewers for
> > things other than Spark.
> >
> > Note: I need to double-check, but I am pretty sure there is no
> > difference between Hive `iceberg-catalog` and Iceberg's
> > `hive-metastore`, so we could potentially drop it from the Hive repo
> > and maybe rename it to `hive-catalog` in Iceberg?
> >
> > Supporting one more connector repo seems like overhead: we would need
> > to set up infra and CI, and have active contributors and release
> > managers. The latter is probably the reason why we still haven't moved
> > HMS into a separate repo.
> >
> > Having the Iceberg connector in Hive gives us more flexibility and
> > ownership of that component, and doesn't block active development.
> > We try to stay up to date with the latest Iceberg, but it usually takes
> > some time.
> >
> > I'd be glad to hear other opinions.
> >
> > Thanks,
> > Denys
>
