I should check the details and feasibility myself, but it sounds fine to me if it doesn't require a lot of extra effort.
On Tue, 5 Feb 2019 at 4:15 AM, Xiao Li <gatorsm...@gmail.com> wrote:

Yes. When our support/integration with Hive 2.x becomes stable, we can do it in the Hadoop 2.x profile too, if needed. The whole point of the proposal is to minimize risk and ensure release stability and quality.

Hyukjin Kwon <gurwls...@gmail.com> wrote on Mon, Feb 4, 2019, 12:01 PM:

Xiao, to check that I understood correctly, do you mean the following?

1. Use our fork with the Hadoop 2.x profile for now, and use Hive 2.x with the Hadoop 3.x profile.
2. Build another, newer version of the thrift server based on Hive 2.x(?) on the Spark side.
3. Target a complete, gradual transition to Hive 2.x later in the future.

Xiao Li <gatorsm...@gmail.com> wrote on Tue, Feb 5, 2019, 1:16 AM:

To reduce the impact and risk of upgrading the Hive execution JARs, we can upgrade the built-in Hive to 2.x only when using the Hadoop 3.x profile. Support for Hadoop 3 will still be experimental in our next release. That means the impact and risk are minimal for most users, who are still on the Hadoop 2.x profile.

The code changes in the Spark thrift server are massive. They are risky and hard to review. The original code of our Spark thrift server is from hive-service 1.2.1. To reduce the risk of the upgrade, we can inline the new version. In the future, we can get rid of the thrift server completely and build our own high-performance JDBC server.

Does this proposal sound good to you?

In the last two weeks, Yuming was trying out this proposal. He is now on vacation; in China, today is already the Lunar New Year, so I would not expect him to reply to this email in the next 7 days.

Cheers,

Xiao

Sean Owen <sro...@gmail.com> wrote on Mon, Feb 4, 2019, 7:56 AM:

I was unclear from this thread what the objection to these PRs is:

https://github.com/apache/spark/pull/23552
https://github.com/apache/spark/pull/23553

Would we like to specifically discuss whether to merge these or not? I hear support for it, and concerns about continuing to support Hive too, but I wasn't clear whether those concerns specifically argue against these PRs.

On Fri, Feb 1, 2019 at 2:03 PM Felix Cheung <felixcheun...@hotmail.com> wrote:

What's the update and next step on this?

We have real users getting blocked by this issue.

________________________________
From: Xiao Li <gatorsm...@gmail.com>
Sent: Wednesday, January 16, 2019 9:37 AM
To: Ryan Blue
Cc: Marcelo Vanzin; Hyukjin Kwon; Sean Owen; Felix Cheung; Yuming Wang; dev
Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

Thanks for your feedback!

Working with Yuming to reduce the risk to stability and quality. Will keep you posted when the proposal is ready.

Cheers,

Xiao

Ryan Blue <rb...@netflix.com> wrote on Wed, Jan 16, 2019, 9:27 AM:

+1 for what Marcelo and Hyukjin said.

In particular, I agree that we can't expect Hive to release a version that is now more than 3 years old just to solve a problem for Spark. Maybe that would have been a reasonable ask instead of publishing a fork years ago, but I think this is now Spark's problem.

--
Ryan Blue
Software Engineer
Netflix

On Tue, Jan 15, 2019 at 9:02 PM Marcelo Vanzin <van...@cloudera.com> wrote:

+1 to that. HIVE-16391 by itself means we're giving up things like Hadoop 3, and we're also putting the burden on the Hive folks to fix a problem that we created.

The current PR is basically a Spark-side fix for that bug. It does mean also upgrading Hive (which gives us Hadoop 3, yay!), but I think it's really the right path to take here.

--
Marcelo

On Tue, Jan 15, 2019 at 6:32 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:

Resolving HIVE-16391 means Hive would release a 1.2.x version that contains the fixes from our Hive fork (correct me if I am mistaken).

To be honest, and as a personal opinion, that basically asks Hive to take care of Spark's dependency. Hive looks to be moving ahead to 3.1.x, and no one would use a newer 1.2.x release. In practice, Spark doesn't make 1.6.x releases anymore either, for instance.

Frankly, my impression is that this is our mistake to fix. Since the Spark community is big enough, I think we should try to fix it ourselves first. I am not saying upgrading is the only way to get through this, but I think we should at least try first and see what comes next.

It does sound riskier to upgrade it on our side, but I think it's worth checking and trying to see whether it's possible. I think upgrading the dependency is a more standard approach than keeping the fork or asking the Hive side to release another 1.2.x.

If we fail to upgrade it for critical or unavoidable reasons, yes, we could find an alternative, but that basically means we're going to stay on 1.2.x for a long time (say, until Spark 4.0.0?).

I know this has become somewhat sensitive, but to be honest with myself, I think we should give it a try.
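
Concretely, the profile-based idea Xiao describes boils down to selecting the built-in Hive execution JARs according to which Hadoop profile is active. Below is a minimal sbt-style Scala sketch of that idea; Spark's actual build uses Maven profiles, and the "hadoop.profile" property, artifact coordinates, and version strings here are illustrative assumptions rather than the project's real build configuration.

    // Hypothetical sketch: choose the built-in Hive execution dependency based on
    // the active Hadoop profile, so only the experimental Hadoop 3.x build picks up
    // Hive 2.x while the Hadoop 2.x build keeps the existing fork.
    // The "hadoop.profile" property and the coordinates/versions are illustrative.
    val hadoopProfile = sys.props.getOrElse("hadoop.profile", "2.7")
    val useUpgradedHive = hadoopProfile.startsWith("3")

    // Hadoop 3.x profile: upgraded Hive; Hadoop 2.x profile: the published fork.
    val hiveGroup   = if (useUpgradedHive) "org.apache.hive" else "org.spark-project.hive"
    val hiveVersion = if (useUpgradedHive) "2.3.4" else "1.2.1.spark2"

    libraryDependencies += hiveGroup % "hive-exec" % hiveVersion % "provided"

Under a scheme like this, users on the default Hadoop 2.x profile would see no dependency change at all, which is the low-risk property the proposal is after.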