I'm not sure it is a good idea to remove `hive-exec-core` completely: it is still used today by other popular projects, including Spark and Trino/Presto. Keeping `hive-exec-core` gives those projects more flexibility to shade & relocate classes according to their needs, without waiting for new Hive releases. Hive would also need to make sure it relocates everything properly. Otherwise, if some classes are shaded & included in `hive-exec` but not relocated, the other projects have no way to exclude them and avoid potential conflicts.
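When chasing this kind of conflict, it helps to know which jar a given class was actually loaded from. Below is a minimal sketch using only the JDK; the class name `ClassOrigin` and the helper `locationOf` are made up for illustration and are not part of Hive, Spark, or Trino.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of any project mentioned here): reports
// where the JVM loaded a class from, e.g. to see whether a class came from
// the hive-exec fat jar or from a standalone library jar.
final class ClassOrigin {

    // Returns the code source location of the class, or a placeholder when
    // the JVM does not expose one (as is typical for bootstrap classes).
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null
                ? "(bootstrap or unknown)"
                : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + locationOf(Class.forName(name)));
    }
}
```

Running it on the application classpath with the name of a class that is suspected to be duplicated shows which copy wins on that particular classpath ordering.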
Chao

On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich <k...@rxd.hu> wrote:

> Hey
>
> On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> > Indeed this may lead to binary incompatibility problems as the one you
> > mentioned. If I understood correctly the problem you cite comes up if
> > library B in this case is not relocated. If Hive systematically relocates
> > shaded deps do you think there will still be binary incompatibility issues?
> >
> > If the relocating solution works, I would personally prefer going down this
> > path instead of introducing an entirely new module just for the sake of
> > dependency management. Most of the time when there are problems with
> > shading the answer comes from relocating the problematic dependencies and
> > people are more or less accustomed with this route.
>
> I totally agree with you Stamatis - with the addition that we should work
> together with the owners of other projects to help them use the correct
> artifact to gain access to Hive's internal parts.
> I've opened HIVE-25531 to remove the core classified artifact - and ensure
> that we will be uncovering and fixing future issues with the hive-exec
> artifact.
>
> cheers,
> Zoltan
>
> > Best,
> > Stamatis
> >
> > On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi <fdan...@cloudera.com.invalid>
> > wrote:
> >
> >> Dear Hive developers,
> >>
> >> I am Dan from the Oozie team and I would like to bring up the
> >> hive-exec.jar vs. hive-exec-core.jar topic.
> >> The reason for that is because as far as we understand the official
> >> recommendation from the Hive team is to use the hive-exec.jar artifact.
> >>
> >> However in Oozie that can end-up in a binary incompatibility.
> >>
> >> The reason for that is:
> >>
> >>   * Let's say library A is included in the fat Jar.
> >>
> >>   * And library B which is using library A is also included in the fat Jar.
> >>
> >>   * Let's also say that library A's com.library.alib package is
> >>     relocated to org.apache.hive.com.library.alib,
> >>     meaning the com.library.alib.SomeClass becomes
> >>     org.apache.hive.com.library.alib.SomeClass
> >>
> >>   * So if B has a method like public void
> >>     someMethod(com.library.alib.SomeClass) then the signature of this
> >>     method will be changed to:
> >>     public void someMethod(org.apache.hive.com.library.alib.SomeClass)
> >>
> >>   * If Oozie is also using B directly meaning we'll have b.jar on our
> >>     classpath, but with the unchanged signature,
> >>     so when hive-exec tries to invoke someMethod then depending on
> >>     whether b.jar coming from us will be loaded first or hive-exec will,
> >>     we can end-up with a NoSuchMethodError if hive-exec tries to pass an
> >>     org.apache.hive.com.library.alib.SomeClass instance to the
> >>     someMethod which was loaded from the original b.jar.
> >>
> >> Hence in Oozie a long time ago (OOZIE-2621
> >> <https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
> >> made to use the hive-exec-core Jar.
> >>
> >> Now since the shading process actually removes those dependencies from
> >> the hive-exec pom which are included in the fat Jar, we manually had to
> >> add some dependencies to Oozie to compensate this.
> >> However these dependencies are not used by Oozie directly and with the
> >> growing features of hive-exec we had to repeat the same process
> >> over-and-over which is a bit unmaintainable.
> >>
> >> Today I'm writing to you to propose a long-term solution where basically
> >> nothing would change in the generated hive artifacts, poms and the same
> >> time we wouldn't have to manually declare dependencies in Oozie which
> >> are not explicitly used by us.
> >>
> >> The solution:
> >>
> >>  1. We would create a new module named hive-exec-dependencies which
> >>     would be a pom-packaging module without any Java source files.
> >>  2. All the dependencies declared in hive-exec would be moved to
> >>     hive-exec-dependencies.
> >>  3. We would make the hive-exec-dependencies module the parent of
> >>     hive-exec and with this hive-exec would still have access to the
> >>     same dependencies as before.
> >>  4. The maven shade plugin would still strip the dependencies from the
> >>     generated hive-exec pom which are included in the fat Jar.
> >>  5. And with a small maven plugin we'd change hive-exec's parent back
> >>     from hive-exec-dependencies to the root hive project in the
> >>     generated hive-exec pom file.
> >>
> >> I have a change ready locally and it works as described above.
> >>
> >> With this on the Oozie side we could add a dependency on
> >> hive-exec-dependencies and hence all the required libraries which are
> >> included in the fat Jar would be pulled into Oozie.
> >> The next time a new dependency would be added to hive-exec-dependencies,
> >> the Oozie build would pull it in automatically without us having to
> >> explicitly declare it.
> >>
> >> Please let me know what you think.
> >>
> >> Best,
> >> Dan
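
For readers who want to see the library A / library B scenario from Dan's message in miniature, here is a rough sketch. It does not run the shade plugin or touch real jars: it uses made-up nested classes as stand-ins for the original and the relocated `SomeClass`, and a reflective lookup as an analogue of the method resolution the JVM performs at link time. All class and method names in the block are hypothetical.

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins for the classes in the example; none of these
// names exist in Hive or Oozie, and no real relocation is performed.
final class RelocationDemo {

    // Plays the role of com.library.alib.SomeClass (library A, original package).
    static final class OriginalSomeClass {}

    // Plays the role of org.apache.hive.com.library.alib.SomeClass (relocated copy).
    static final class RelocatedSomeClass {}

    // Plays the role of the unmodified b.jar on the Oozie classpath: its
    // method still takes the original, non-relocated parameter type.
    static final class OriginalB {
        public void someMethod(OriginalSomeClass arg) {}
    }

    // Returns true if `owner` has a public someMethod taking exactly `paramType`.
    // A failed lookup here is the reflective analogue of the NoSuchMethodError
    // raised when a call site compiled against the relocated signature is
    // linked against the original class.
    static boolean hasSomeMethod(Class<?> owner, Class<?> paramType) {
        try {
            Method m = owner.getMethod("someMethod", paramType);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Caller compiled against the original signature: lookup succeeds.
        System.out.println(hasSomeMethod(OriginalB.class, OriginalSomeClass.class));  // prints "true"
        // Caller rewritten to the relocated signature: lookup fails.
        System.out.println(hasSomeMethod(OriginalB.class, RelocatedSomeClass.class)); // prints "false"
    }
}
```

The second lookup fails because the parameter type is part of the method's identity, so a B whose signature was not rewritten cannot satisfy a caller that was.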