I'm not sure it is a good idea to remove `hive-exec-core` completely: it is
still used today by other popular projects, including Spark and
Trino/Presto. Keeping `hive-exec-core` gives those projects the flexibility
to shade & relocate classes according to their own needs, without waiting
for new Hive releases. Hive would also need to make sure it relocates
everything properly. Otherwise, if some classes are shaded & included in
`hive-exec` but not relocated, there is no way for the other projects to
exclude them and avoid potential conflicts.
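
To make the failure mode from Dan's original mail concrete, here is a rough
sketch (a toy model, not Hive or maven-shade-plugin code). It assumes only
that the JVM links a call by method name plus exact descriptor, and it
imitates relocation by rewriting the descriptor string. The class name
`RelocationSketch` and its helpers are made up for illustration; the package
and method names are the placeholders from Dan's example.

```java
import java.util.Set;

public class RelocationSketch {

    // Rewrites class references in a JVM method descriptor the way a
    // relocation rule would (hypothetical helper, string-level only).
    static String relocate(String descriptor, String fromPkg, String toPkg) {
        String from = "L" + fromPkg.replace('.', '/') + "/";
        String to = "L" + toPkg.replace('.', '/') + "/";
        return descriptor.replace(from, to);
    }

    // Toy stand-in for method resolution: a call links only if the loaded
    // class declares a method with the same name and the same descriptor.
    static boolean resolves(Set<String> declaredMethods, String name, String descriptor) {
        return declaredMethods.contains(name + descriptor);
    }

    public static void main(String[] args) {
        String original = "(Lcom/library/alib/SomeClass;)V";
        String relocated = relocate(original,
                "com.library.alib", "org.apache.hive.com.library.alib");

        // B as shipped in the original b.jar vs. B as rewritten inside the fat jar.
        Set<String> unshadedB = Set.of("someMethod" + original);
        Set<String> shadedB = Set.of("someMethod" + relocated);

        // hive-exec's call site refers to the relocated descriptor.
        System.out.println("call site descriptor: " + relocated);
        System.out.println("links against shaded B:   "
                + resolves(shadedB, "someMethod", relocated));
        System.out.println("links against unshaded B: "
                + resolves(unshadedB, "someMethod", relocated));
    }
}
```

The last lookup is the case that would surface as a NoSuchMethodError at
runtime when the unrelocated b.jar is loaded first. Relocating B as well
would give the two copies distinct class names, so they no longer collide.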

Chao

On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich <k...@rxd.hu> wrote:

> Hey
>
> On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> > Indeed this may lead to binary incompatibility problems such as the one
> > you mentioned. If I understood correctly the problem you cite comes up if
> > library B in this case is not relocated. If Hive systematically relocates
> > shaded deps do you think there will still be binary incompatibility
> > issues?
> >
> > If the relocating solution works, I would personally prefer going down
> > this path instead of introducing an entirely new module just for the sake
> > of dependency management. Most of the time when there are problems with
> > shading the answer comes from relocating the problematic dependencies and
> > people are more or less accustomed to this route.
>
> I totally agree with you Stamatis - with the addition that we should work
> together with the owners of other projects to help them use the correct
> artifact to gain access to Hive's internal parts.
> I've opened HIVE-25531 to remove the core classified artifact - and ensure
> that we will be uncovering and fixing future issues with the hive-exec
> artifact.
>
> cheers,
> Zoltan
>
>
> >
> > Best,
> > Stamatis
> >
> > On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi <fdan...@cloudera.com.invalid> wrote:
> >
> >> Dear Hive developers,
> >>
> >> I am Dan from the Oozie team and I would like to bring up the
> >> hive-exec.jar vs. hive-exec-core.jar topic.
> >> The reason is that, as far as we understand, the official
> >> recommendation from the Hive team is to use the hive-exec.jar artifact.
> >>
> >> However, in Oozie that can end up in a binary incompatibility.
> >>
> >> The reason for that is:
> >>
> >>    * Let's say library A is included in the fat Jar.
> >>
> >>    * And library B which is using library A is also included in the fat
> >>      Jar.
> >>
> >>    * Let's also say that library A's com.library.alib package is
> >>      relocated to org.apache.hive.com.library.alib,
> >>      meaning the com.library.alib.SomeClass becomes
> >>      org.apache.hive.com.library.alib.SomeClass
> >>
> >>    * So if B has a method like public void
> >>      someMethod(com.library.alib.SomeClass) then the signature of this
> >>      method will be changed to:
> >>      public void someMethod(org.apache.hive.com.library.alib.SomeClass)
> >>
> >>    * If Oozie is also using B directly, we'll have b.jar on our
> >>      classpath, but with the unchanged signature. When hive-exec tries
> >>      to invoke someMethod, then depending on whether our b.jar or
> >>      hive-exec is loaded first, we can end up with a NoSuchMethodError
> >>      if hive-exec tries to pass an
> >>      org.apache.hive.com.library.alib.SomeClass instance to the
> >>      someMethod which was loaded from the original b.jar.
> >>
> >> Hence in Oozie a long time ago (OOZIE-2621
> >> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
> >> made to use the hive-exec-core Jar.
> >>
> >> Now since the shading process actually removes those dependencies from
> >> the hive-exec pom which are included in the fat Jar, we manually had to
> >> add some dependencies to Oozie to compensate for this.
> >> However these dependencies are not used by Oozie directly and with the
> >> growing features of hive-exec we had to repeat the same process
> >> over and over, which is hard to maintain.
> >>
> >> Today I'm writing to you to propose a long-term solution where basically
> >> nothing would change in the generated hive artifacts and poms, and at the
> >> same time we wouldn't have to manually declare dependencies in Oozie which
> >> are not explicitly used by us.
> >>
> >> The solution:
> >>
> >>   1. We would create a new module named hive-exec-dependencies which
> >>      would be a pom-packaging module without any Java source files.
> >>   2. All the dependencies declared in hive-exec would be moved to
> >>      hive-exec-dependencies.
> >>   3. We would make the hive-exec-dependencies module the parent of
> >>      hive-exec and with this hive-exec would still have access to the
> >>      same dependencies as before.
> >>   4. The maven shade plugin would still strip the dependencies from the
> >>      generated hive-exec pom which are included in the fat Jar.
> >>   5. And with a small maven plugin we'd change hive-exec's parent back
> >>      from hive-exec-dependencies to the root hive project in the
> >>      generated hive-exec pom file.
> >>
> >> I have a change ready locally and it works as described above.
> >>
> >> With this on the Oozie side we could add a dependency on
> >> hive-exec-dependencies and hence all the required libraries which are
> >> included in the fat Jar would be pulled into Oozie.
> >> The next time a new dependency would be added to hive-exec-dependencies,
> >> the Oozie build would pull it in automatically without us having to
> >> explicitly declare it.
> >>
> >> Please let me know what you think.
> >>
> >> Best,
> >> Dan
> >>
> >
>
