I should check the details and feasibility myself, but it sounds fine to me if it doesn't require a lot of extra effort.
On Tue, 5 Feb 2019 at 4:15 AM, Xiao Li <gatorsm...@gmail.com> wrote:

Yes. When our support/integration with Hive 2.x becomes stable, we can do it in the Hadoop 2.x profile too, if needed. The whole point of the proposal is to minimize risk and ensure release stability and quality.

Hyukjin Kwon <gurwls...@gmail.com> wrote on Mon, Feb 4, 2019, 12:01 PM:

Xiao, to check that I understood correctly, do you mean the following?

1. Use our fork with the Hadoop 2.x profile for now, and use Hive 2.x with the Hadoop 3.x profile.
2. Build another, newer version of the thrift server based on Hive 2.x(?) on the Spark side.
3. Target a complete, gradual transition to Hive 2.x later in the future.

Xiao Li <gatorsm...@gmail.com> wrote on Tue, Feb 5, 2019, 1:16 AM:

To reduce the impact and risk of upgrading the Hive execution JARs, we can upgrade the built-in Hive to 2.x only when using the Hadoop 3.x profile. Support for Hadoop 3 will still be experimental in our next release. That means the impact and risk are minimal for most users, who are still on the Hadoop 2.x profile.

The code changes in the Spark thrift server are massive. They are risky and hard to review. The original code of our Spark thrift server is from hive-service 1.2.1. To reduce the risk of the upgrade, we can inline the new version. In the future, we can get rid of the thrift server completely and build our own high-performance JDBC server.

Does this proposal sound good to you?

In the last two weeks, Yuming was trying out this proposal. He is now on vacation; in China, today is already the Lunar New Year, so I would not expect him to reply to this email in the next 7 days.

Cheers,

Xiao

Sean Owen <sro...@gmail.com> wrote on Mon, Feb 4, 2019, 7:56 AM:

I was unclear from this thread what the objection to these PRs is:

https://github.com/apache/spark/pull/23552
https://github.com/apache/spark/pull/23553

Would we like to specifically discuss whether to merge these or not? I hear support for it, and concerns about continuing to support Hive too, but I wasn't clear whether those concerns specifically argue against these PRs.

On Fri, Feb 1, 2019 at 2:03 PM Felix Cheung <felixcheun...@hotmail.com> wrote:

What's the update and next step on this?

We have real users getting blocked by this issue.

________________________________
From: Xiao Li <gatorsm...@gmail.com>
Sent: Wednesday, January 16, 2019 9:37 AM
To: Ryan Blue
Cc: Marcelo Vanzin; Hyukjin Kwon; Sean Owen; Felix Cheung; Yuming Wang; dev
Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

Thanks for your feedback!

Working with Yuming to reduce the risk to stability and quality. Will keep you posted when the proposal is ready.

Cheers,

Xiao

Ryan Blue <rb...@netflix.com> wrote on Wed, Jan 16, 2019, 9:27 AM:

+1 for what Marcelo and Hyukjin said.

In particular, I agree that we can't expect Hive to release a version that is now more than 3 years old just to solve a problem for Spark. Maybe that would have been a reasonable ask instead of publishing a fork years ago, but I think this is now Spark's problem.

--
Ryan Blue
Software Engineer
Netflix

On Tue, Jan 15, 2019 at 9:02 PM Marcelo Vanzin <van...@cloudera.com> wrote:

+1 to that. HIVE-16391 by itself means we're giving up things like Hadoop 3, and we're also putting the burden on the Hive folks to fix a problem that we created.

The current PR is basically a Spark-side fix for that bug. It does mean also upgrading Hive (which gives us Hadoop 3, yay!), but I think it's really the right path to take here.

--
Marcelo

On Tue, Jan 15, 2019 at 6:32 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:

Resolving HIVE-16391 means Hive would release a 1.2.x version that contains the fixes from our Hive fork (correct me if I am mistaken).

To be honest, and as a personal opinion, that basically asks Hive to take care of Spark's dependency. Hive looks to be moving ahead to 3.1.x, and no one would use a newer 1.2.x release. In practice, Spark doesn't make 1.6.x releases anymore either, for instance.

Frankly, my impression is that this is our mistake to fix. Since the Spark community is big enough, I think we should try to fix it ourselves first. I am not saying upgrading is the only way to get through this, but I think we should at least try first and see what comes next.

It does sound riskier to upgrade it on our side, but I think it's worth checking and trying to see whether it's possible. I think upgrading the dependency is a more standard approach than keeping the fork or asking the Hive side to release another 1.2.x.

If we fail to upgrade it for critical or unavoidable reasons, yes, we could find an alternative, but that basically means we're going to stay on 1.2.x for a long time (say, until Spark 4.0.0?).

I know this has become somewhat sensitive, but to be honest with myself, I think we should give it a try.
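
Concretely, the profile-based idea Xiao describes boils down to selecting the built-in Hive execution JARs according to which Hadoop profile is active. Below is a minimal sbt-style Scala sketch of that idea; Spark's actual build uses Maven profiles, and the "hadoop.profile" property, artifact coordinates, and version strings here are illustrative assumptions rather than the project's real build configuration.

    // Hypothetical sketch: choose the built-in Hive execution dependency based on
    // the active Hadoop profile, so only the experimental Hadoop 3.x build picks up
    // Hive 2.x while the Hadoop 2.x build keeps the existing fork.
    // The "hadoop.profile" property and the coordinates/versions are illustrative.
    val hadoopProfile = sys.props.getOrElse("hadoop.profile", "2.7")
    val useUpgradedHive = hadoopProfile.startsWith("3")

    // Hadoop 3.x profile: upgraded Hive; Hadoop 2.x profile: the published fork.
    val hiveGroup   = if (useUpgradedHive) "org.apache.hive" else "org.spark-project.hive"
    val hiveVersion = if (useUpgradedHive) "2.3.4" else "1.2.1.spark2"

    libraryDependencies += hiveGroup % "hive-exec" % hiveVersion % "provided"

Under a scheme like this, users on the default Hadoop 2.x profile would see no dependency change at all, which is the low-risk property the proposal is after.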