You can submit a request for an account here:
https://selfserve.apache.org/confluence-account.html

Please add some relevant information when submitting the request so that I
can identify that it is you & approve it. Once it is approved, let us know,
and one of us will grant that account write access to the Hive space.

-Ayush

On Thu, 24 Oct 2024 at 12:16, lisoda <lis...@yeah.net> wrote:

> Hello Ayush.
>
> It looks like I don't have access to the wiki (I tried to log in using my
> Jira account), and I can't find an entry point on the page to request an
> account. Can you tell me how to apply for one?
> Also, if I am unable to apply for an account, how should I provide the
> relevant information?
>
> Best
> Lisoda
>
>
>
>
>
> On 2024-10-22 13:01:41, "Ayush Saxena" <ayush...@gmail.com> wrote:
>
> Sorry for coming back late. I don't think there should be any problem with
> this approach if things are working fine.
>
> I think it doesn't require any code changes. Do you plan to contribute the
> steps or details around the approach via the wiki or similar? If yes, you
> can share your wiki id if you already have one & I can give you the
> permissions for the Hive space.
>
> I started an initiative [1] for 4.0.x documentation, though I couldn't
> spend enough time on it. A page under Installing Hive could be a good place
> to keep this, or let me know if you or other folks following along have any
> other ideas or plans.
>
> -Ayush
>
>
> [1] https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0
>
> On 12 Oct 2024, at 8:57 PM, lisoda <lis...@yeah.net> wrote:
>
>
> Hello Sir.
> I agree with your comments related to Hadoop 2, and I don't actually intend
> to support it. We just need to support Hadoop 3.x and we're good to go.
>
> On a lower version of Hadoop 3, this is what we do:
> 1. Download the Hadoop binaries separately (a higher version, for example
> 3.3.6), and set HADOOP_HOME in Hive to the directory where the higher
> version of Hadoop is stored.
> 2. Package Tez with all the dependencies and native libs (including the
> ones required for Hadoop).
> 3. In tez-site.xml, specify that Tez will use only the jar packages in its
> own lib folder, and not any Hadoop-related dependencies from the cluster.
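
Roughly, steps 1 and 3 above look like this in shell form. The install path,
Tez version, and HDFS location of the Tez tarball are illustrative
assumptions, not the exact values from our environment:

```shell
# Step 1 (sketch): point Hive at a separately downloaded, newer Hadoop.
# /opt/hadoop-3.3.6 is an assumed install path.
export HADOOP_HOME=/opt/hadoop-3.3.6

# Step 3 (sketch): write a tez-site.xml that makes Tez use only the jars
# bundled in its own tarball and ignore the cluster's Hadoop libraries.
TEZ_CONF_DIR=/tmp/tez-conf   # assumed config directory
mkdir -p "$TEZ_CONF_DIR"
cat > "$TEZ_CONF_DIR/tez-site.xml" <<'EOF'
<configuration>
  <property>
    <!-- Assumed HDFS path to the self-contained Tez tarball from step 2. -->
    <name>tez.lib.uris</name>
    <value>hdfs:///apps/tez/tez-0.10.4-with-deps.tar.gz</value>
  </property>
  <property>
    <!-- Do not add the cluster's Hadoop jars to the Tez classpath. -->
    <name>tez.use.cluster.hadoop-libs</name>
    <value>false</value>
  </property>
</configuration>
EOF
```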
>
> With the above steps, we are currently running Hive 4.0.1 + Tez 0.10.4 on
> HDP 3.1.0 (Hadoop 3.1.1). They work fine.
>
> This is the solution we are currently using; do you see any problems with
> it? If there are none, can we extend it to all Hive users?
>
> Tks.
> LiSoDa.
>
>
>
> On 2024-10-12 14:27:32, "Ayush Saxena" <ayush...@gmail.com> wrote:
>
> If you already have a solution in place, feel free to create a Jira & PR
> with it. However, third-party dependencies present significant challenges.
> Different versions of Hadoop bring their own set of third-party libraries,
> which can cause compatibility issues with the versions used by Hive. A
> prime example is Guava: while Hadoop upgraded Guava in versions post-3.1.x,
> Hive couldn’t follow suit. Hadoop eventually shaded Guava in 3.3.x, which
> is why we aligned with that version.
>
> One potential improvement could be to switch to using hadoop-client-api,
> hadoop-client-runtime, and hadoop-client-minicluster instead of directly
> specifying the Hadoop dependencies. These artifacts shade most of the
> third-party libraries, which may help minimize conflicts. Spark, for
> example, already uses them [1].
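
For illustration, using the shaded clients in a Maven build would look
roughly like the fragment below. The `${hadoop.version}` property name is an
assumption, and in practice the change would touch many modules, not a
single pom:

```xml
<!-- Sketch only: shaded Hadoop client artifacts instead of direct
     hadoop-common / hadoop-hdfs / etc. dependencies. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>${hadoop.version}</version>
  <scope>runtime</scope>
</dependency>
<dependency>
  <!-- Test-only replacement for the unshaded minicluster classes. -->
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-minicluster</artifactId>
  <version>${hadoop.version}</version>
  <scope>test</scope>
</dependency>
```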
>
> As for releasing separate binaries for different Hadoop versions, I don't
> think that's feasible. However, users are free to build their own versions
> from the source tarball we provide, using -Dhadoop.version=X. The actual
> release is the source code; the binaries are just a convenience.
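
As a concrete sketch, a build against a specific Hadoop line could look like
the commands below. The tarball name and both version numbers are
illustrative, and the exact Maven profile flags may differ per release, so
the recipe only assembles and prints the commands rather than running them:

```shell
# Sketch: build Hive from the source tarball against the cluster's Hadoop.
HIVE_VERSION=4.0.1      # illustrative Hive release
HADOOP_VERSION=3.1.1    # the cluster's Hadoop line

# Printed rather than executed, since tarball names and profiles vary.
echo "tar xzf apache-hive-${HIVE_VERSION}-src.tar.gz"
echo "cd apache-hive-${HIVE_VERSION}-src"
echo "mvn clean package -DskipTests -Pdist -Dhadoop.version=${HADOOP_VERSION}"
```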
>
> That said, I don't believe supporting the 2.x Hadoop line would be easy,
> or even possible, at this point, but maybe we can attempt it for 3.x.
>
> -Ayush
>
> [1]
> https://github.com/apache/spark/blob/6734d4883e76b82249df5c151d42bc83173f4122/pom.xml#L1401-L1424
>
> On Wed, 9 Oct 2024 at 17:32, lisoda <lis...@yeah.net> wrote:
>
>> HI TEAM.
>>
>> I would like to discuss with everyone the issue of running Hive4 in
>> Hadoop environments below version 3.3.6. Currently, a large number of Hive
>> users are still on older environments such as Hadoop 2.6/2.7/3.1.1. To be
>> honest, upgrading Hadoop is a challenging task, and we cannot force users
>> to upgrade their Hadoop clusters just to use Hive4. To encourage these
>> potential users to adopt Hive4, we need to provide a general solution that
>> allows Hive4 to run on older Hadoop versions (at the very least, we need
>> to address the compatibility issues with Hadoop 3.1.0).
>> The general plan is as follows: in both the Hive and Tez projects, in
>> addition to the existing tar packages, we should also provide tar packages
>> that include recent Hadoop dependencies. Through the configuration files,
>> users can then avoid picking up any jar dependencies from the Hadoop
>> cluster. In this way, users can launch Tez tasks on older Hadoop clusters
>> using only the built-in Hadoop dependencies.
>> This is how Spark does it, which is also the main reason why users are
>> more likely to adopt Spark as a SQL engine. Spark not only provides tar
>> packages without Hadoop dependencies but also provides tar packages with
>> built-in Hadoop 3 and Hadoop 2. Users can upgrade to a new version of Spark
>> without upgrading the Hadoop version.
>> We have implemented such a plan in our production environment, where we
>> have successfully run Hive 4.0.0 and Hive 4.0.1 on HDP 3.1.0. They are
>> currently working well.
>> Based on our successful experience, I believe it is necessary for us to
>> provide tar packages with all Hadoop dependencies built in. At the very
>> least, we should document that users can successfully run Hive4 on
>> low-version Hadoop in this way.
>> However, my idea may not be mature enough, so I would like to know what
>> others think. It would be great if someone could participate in this topic
>> and discuss it.
>>
>>
>> TKS.
>> LISODA.
>>
>>
