If you already have a solution in place, feel free to create a Jira & PR with it. However, third-party dependencies present significant challenges: different Hadoop versions bring their own sets of third-party libraries, which can conflict with the versions Hive uses. A prime example is Guava: Hadoop upgraded Guava after 3.1.x, but Hive couldn't follow suit. Hadoop eventually shaded Guava in 3.3.x, which is why we aligned with that version.
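To make that concrete: from 3.3.x onward, Hadoop depends on a relocated Guava from the hadoop-thirdparty project rather than plain com.google.guava, roughly along these lines (the version below is illustrative, not tied to a specific Hadoop release):

    <!-- Hadoop 3.3.x pulls Guava in via the hadoop-thirdparty
         relocation instead of com.google.guava:guava -->
    <dependency>
      <groupId>org.apache.hadoop.thirdparty</groupId>
      <artifactId>hadoop-shaded-guava</artifactId>
      <version>1.1.1</version> <!-- illustrative version -->
    </dependency>
    <!-- Its classes live under
         org.apache.hadoop.thirdparty.com.google.common, so they no
         longer collide with whatever Guava version Hive ships. -->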
One potential improvement could be to switch to using hadoop-client-api, hadoop-client-runtime, and hadoop-client-minicluster instead of directly specifying the Hadoop dependencies. These artifacts shade most of the third-party libraries, which may help minimize conflicts. Spark, for example, already uses them [1]; a rough sketch of what that could look like follows the quoted mail below.

As for releasing separate binaries for different Hadoop versions, I don't think that's feasible. However, users are free to build their own versions from the source tarball we provide, using -Dhadoop.version=X. The actual release is the source code; the binaries are just convenience binaries.

That said, I don't believe supporting the 2.x Hadoop line would be easy, or even possible, at this point, but we could perhaps attempt it for 3.x.

-Ayush

[1] https://github.com/apache/spark/blob/6734d4883e76b82249df5c151d42bc83173f4122/pom.xml#L1401-L1424

On Wed, 9 Oct 2024 at 17:32, lisoda <lis...@yeah.net> wrote:

> Hi team,
>
> I would like to discuss the issue of running Hive4 in Hadoop environments
> below version 3.3.6. Currently, a large number of Hive users are still on
> older environments such as Hadoop 2.6/2.7/3.1.1. To be honest, upgrading
> Hadoop is a challenging task. We cannot force users to upgrade their
> Hadoop cluster versions just to use Hive4. To encourage these potential
> users to adopt Hive4, we need to provide a general solution that allows
> Hive4 to run on older Hadoop versions (at the very least, we need to
> address the compatibility issues with Hadoop 3.1.0).
>
> The general plan is as follows: in both the Hive and Tez projects, in
> addition to the existing tar packages, we should also provide tar packages
> that bundle recent Hadoop dependencies. Through configuration, users can
> then avoid picking up any jar dependencies from the Hadoop cluster. In
> this way, users can launch Tez tasks on older Hadoop clusters using only
> the bundled Hadoop dependencies.
>
> This is how Spark does it, and it is also the main reason why users are
> more likely to adopt Spark as a SQL engine. Spark not only provides tar
> packages without Hadoop dependencies but also tar packages with Hadoop 3
> or Hadoop 2 built in, so users can upgrade to a new version of Spark
> without upgrading their Hadoop version.
>
> We have implemented such a plan in our production environment and have
> successfully run Hive 4.0.0 and Hive 4.0.1 in an HDP 3.1.0 environment;
> they are currently working well.
>
> Based on this successful experience, I believe we should provide tar
> packages with all Hadoop dependencies built in. At the very least, we
> should document that users can run Hive4 on older Hadoop versions in
> this way.
>
> However, my idea may not be fully mature, so I would like to hear what
> others think. It would be great if more people could join in and discuss
> this topic.
>
> Thanks,
> LISODA.
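For reference, a rough sketch of the hadoop-client-* switch suggested above, modelled loosely on the Spark pom in [1]; the scopes here are assumptions on my side, not a tested configuration for Hive:

    <!-- Replace direct hadoop-common / hadoop-hdfs / hadoop-mapreduce-*
         dependencies with the shaded client artifacts, which bundle and
         relocate most of Hadoop's third-party libraries. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client-api</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client-runtime</artifactId>
      <version>${hadoop.version}</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client-minicluster</artifactId>
      <version>${hadoop.version}</version>
      <scope>test</scope>
    </dependency>

Building against a different Hadoop line would then remain a matter of overriding the property, e.g. mvn clean install -DskipTests -Dhadoop.version=X, as with the source builds mentioned above.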
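And, if I read the quoted plan correctly, the "configuration files" part presumably boils down to a tez-site.xml fragment along these lines; the tarball path and name are hypothetical, so take this as a sketch of the mechanism rather than lisoda's exact setup:

    <!-- Point Tez at a full tarball on HDFS that bundles its own
         Hadoop jars (path and file name below are made up). -->
    <property>
      <name>tez.lib.uris</name>
      <value>hdfs:///apps/tez/tez-with-hadoop.tar.gz</value>
    </property>
    <!-- Keep the cluster's own Hadoop jars off the task classpath. -->
    <property>
      <name>tez.use.cluster.hadoop-libs</name>
      <value>false</value>
    </property>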