Hi Lisoda,

I just gave you permissions to modify the wiki. Please check that everything
works for you and let us know if you encounter any issues.

Best,
Stamatis

On Tue, Nov 12, 2024 at 2:56 AM lisoda <lis...@yeah.net> wrote:

> Hello Sir.
> I've checked my permissions and I still don't have write access to the HIVE
> space.
> If you have time, could you help me get the wiki write permission?
> Thank you.
>
> Lisoda.
>
> On 2024-10-24 21:25:28, "Ayush Saxena" <ayush...@gmail.com> wrote:
>
> You can submit a request for an account here:
> https://selfserve.apache.org/confluence-account.html
>
> Please add some relevant information while submitting the request so that
> I can identify that it is you & approve it. Once approved, let us know, and
> one of us will grant you write access to the Hive space for that account.
>
> -Ayush
>
> On Thu, 24 Oct 2024 at 12:16, lisoda <lis...@yeah.net> wrote:
>
>> Hello Ayush.
>>
>> It looks like I don't have access to the wiki (I tried to log in using my
>> Jira account), and I can't find an entry point on the page to request an
>> account. Can you tell me how to apply for one?
>> Also, if I am unable to apply for an account, how do I provide the
>> relevant information?
>>
>> Best
>> Lisoda
>>
>>
>> On 2024-10-22 13:01:41, "Ayush Saxena" <ayush...@gmail.com> wrote:
>>
>> Sorry for coming back late. I don’t think there should be any problem with
>> this approach if things are working fine.
>>
>> I think it doesn’t require any code changes. Do you plan to contribute
>> the steps or details of the approach via the wiki or similar? If yes, you
>> can share the details of your wiki id if you already have one & I can give
>> you the permissions for the Hive space.
>>
>> I started an initiative [1] for documentation around 4.0.x, though I
>> couldn’t spend enough time on it. A page under Installing Hive could be a
>> good place to keep this, or let me know if you or other folks following
>> this thread have any other ideas or plans.
>>
>> -Ayush
>>
>>
>> [1] https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0
>>
>> On 12 Oct 2024, at 8:57 PM, lisoda <lis...@yeah.net> wrote:
>>
>> Hello Sir.
>> I agree with your comments related to Hadoop 2, and I don't actually
>> intend to support it. We just need to support Hadoop 3.x and we're good to
>> go.
>>
>> On a lower version of Hadoop 3, this is what we do:
>> 1. Download the Hadoop binaries separately (a higher version, e.g. 3.3.6),
>> and set HADOOP_HOME in Hive to the directory where the higher version of
>> Hadoop is stored.
>> 2. Package Tez with all its dependencies and native libs (including the
>> ones required for Hadoop).
>> 3. In tez-site.xml, specify that Tez will only use the jars in its own lib
>> folder, and not any Hadoop-related dependencies from the cluster (a sketch
>> of the relevant settings follows).
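>>
>> A minimal sketch of what steps 1 and 3 might look like; the paths and
>> version numbers are assumptions for illustration, not our exact setup:
>>
>>   # hive-env.sh: point Hive at the separately downloaded Hadoop
>>   export HADOOP_HOME=/opt/hadoop-3.3.6
>>
>>   <!-- tez-site.xml: ship Tez's own tarball (with its bundled Hadoop
>>        jars) and do not pick up the cluster's Hadoop libraries -->
>>   <property>
>>     <name>tez.lib.uris</name>
>>     <value>hdfs:///apps/tez/tez-0.10.4-with-deps.tar.gz</value>
>>   </property>
>>   <property>
>>     <name>tez.use.cluster.hadoop-libs</name>
>>     <value>false</value>
>>   </property>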
>>
>> With the above steps, we are currently running Hive 4.0.1 + Tez 0.10.4 on
>> HDP 3.1.0 (Hadoop 3.1.1), and they work fine.
>>
>> This is the solution we are currently using; do you see any problems with
>> it? If there are no problems with it, can we extend it to all of Hive's
>> users?
>>
>> Tks.
>> LiSoDa.
>>
>>
>> On 2024-10-12 14:27:32, "Ayush Saxena" <ayush...@gmail.com> wrote:
>>
>> If you already have a solution in place, feel free to create a Jira & PR
>> with it. However, third-party dependencies present significant challenges.
>> Different versions of Hadoop bring their own set of third-party libraries,
>> which can cause compatibility issues with the versions used by Hive. A
>> prime example is Guava: while Hadoop upgraded Guava in versions post-3.1.x,
>> Hive couldn’t follow suit. Hadoop eventually shaded Guava in 3.3.x, which
>> is why we aligned with that version.
>>
>> One potential improvement could be to switch to using hadoop-client-api,
>> hadoop-client-runtime, and hadoop-client-minicluster instead of directly
>> specifying the Hadoop dependencies. These artifacts shade most of the
>> third-party libraries, which may help minimize conflicts. Spark, for
>> example, already uses them [1].
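>>
>> A hedged sketch of what that swap might look like in a pom.xml; the
>> ${hadoop.version} property name is an assumption here:
>>
>>   <dependency>
>>     <groupId>org.apache.hadoop</groupId>
>>     <artifactId>hadoop-client-api</artifactId>
>>     <version>${hadoop.version}</version>
>>   </dependency>
>>   <dependency>
>>     <groupId>org.apache.hadoop</groupId>
>>     <artifactId>hadoop-client-runtime</artifactId>
>>     <version>${hadoop.version}</version>
>>     <scope>runtime</scope>
>>   </dependency>
>>   <dependency>
>>     <groupId>org.apache.hadoop</groupId>
>>     <artifactId>hadoop-client-minicluster</artifactId>
>>     <version>${hadoop.version}</version>
>>     <scope>test</scope>
>>   </dependency>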
>>
>> As for releasing separate binaries for different Hadoop versions, I don't
>> think that’s feasible. However, users are free to build their own versions
>> from the source tarball we provide, using -Dhadoop.version=X. The actual
>> release is the source code; the binaries are just convenience binaries.
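>>
>> For example, something like this should work (a sketch; the exact profile
>> and Hadoop version are illustrative):
>>
>>   # build a Hive distribution against a specific Hadoop version
>>   mvn clean package -Pdist -DskipTests -Dhadoop.version=3.1.1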
>>
>> That said, I don’t believe supporting the 2.x Hadoop line would be easy,
>> or even possible, at this point, but maybe we can attempt it for 3.x.
>>
>> -Ayush
>>
>> [1]
>> https://github.com/apache/spark/blob/6734d4883e76b82249df5c151d42bc83173f4122/pom.xml#L1401-L1424
>>
>> On Wed, 9 Oct 2024 at 17:32, lisoda <lis...@yeah.net> wrote:
>>
>>> Hi team,
>>>
>>> I would like to discuss with everyone the issue of running Hive 4 in
>>> Hadoop environments below version 3.3.6. Currently, a large number of Hive
>>> users are still using low-version environments such as Hadoop
>>> 2.6/2.7/3.1.1. To be honest, upgrading Hadoop is a challenging task. We
>>> cannot force users to upgrade their Hadoop cluster versions just to use
>>> Hive 4. In order to encourage these potential users to adopt Hive 4, we
>>> need to provide a general solution that allows Hive 4 to run on
>>> low-version Hadoop (at the very least, we need to address the
>>> compatibility issues with Hadoop 3.1.0).
>>> The general plan is as follows: in both the Hive and Tez projects, in
>>> addition to the existing tar packages, we should also provide tar
>>> packages that bundle higher-version Hadoop dependencies. Through
>>> configuration, users can then avoid using any jar dependencies from the
>>> Hadoop cluster. In this way, users can launch Tez tasks on low-version
>>> Hadoop clusters using only the bundled Hadoop dependencies.
>>> This is how Spark does it, which is also a main reason why users are
>>> more likely to adopt Spark as a SQL engine. Spark not only provides tar
>>> packages without Hadoop dependencies but also tar packages with
>>> built-in Hadoop 3 and Hadoop 2. Users can upgrade to a new version of
>>> Spark without upgrading their Hadoop version.
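>>>
>>> As a concrete illustration, Spark's download page offers both flavors;
>>> the Hive artifact names below are hypothetical, just to show the idea:
>>>
>>>   spark-3.5.3-bin-hadoop3.tgz                     # Hadoop 3 bundled
>>>   spark-3.5.3-bin-without-hadoop.tgz              # bring your own Hadoop
>>>
>>>   apache-hive-4.0.1-bin.tar.gz                    # current artifact
>>>   apache-hive-4.0.1-bin-with-hadoop-3.3.6.tar.gz  # proposed (hypothetical)
>>>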
>>> We have implemented such a plan in our production environment, and we
>>> have successfully run Hive 4.0.0 and Hive 4.0.1 in an HDP 3.1.0
>>> environment. They are currently working well.
>>> Based on our successful experience, I believe it is necessary for us to
>>> provide tar packages with all Hadoop dependencies built in. At the very
>>> least, we should document how users can successfully run Hive 4 on
>>> low-version Hadoop in this way.
>>> However, my idea may not be fully mature, so I would like to know what
>>> others think. It would be great if someone could join this topic and
>>> discuss it.
>>>
>>>
>>> TKS.
>>> LISODA.
>>>
>>>
