Re: Time to Remove Hive-on-Spark

Peter Vary Tue, 12 Apr 2022 07:12:38 -0700

+1 from my side too.

I have created a PR against the current branch.
It still needs some work, and as many reviews as possible, because it is quite
big and I might have made some mistakes:
https://issues.apache.org/jira/browse/HIVE-26134
https://github.com/apache/hive/pull/3201


Thanks,
Peter

On Thu, 10 Feb 2022 at 17:43, Zoltan Haindrich <[email protected]> wrote:

> Hey,
>
> I think there is no real interest in this feature; we don't have
> users/contributors backing it - the last development was around October 2018,
> and there have been only ~2 bugfix commits since then. We should stop
> carrying dead weight. Another 2 weeks have gone by since Stamatis reminded us
> that after 1.5 years(!) nothing has changed.
>
> +1 on removing it
>
> cheers,
> Zoltan
>
> you may inspect some of the recent changes with:
> git log -c `find . -type f -path '**/spark/**'|grep -v xml|grep -v properties|grep -v q.out`
>
>
> On 1/28/22 2:32 PM, Stamatis Zampetakis wrote:
> > Hi team,
> >
> > Almost one year has passed since the last exchange in this discussion and
> > if I am not wrong there has been no effort to revive Hive-on-Spark. To be
> > more precise, I don't think I have seen any Spark related JIRA for quite
> > some time now and although I don't want to rush into conclusions, there
> > does not seem to be any community member involved in maintaining or adding
> > new features in this part of the code.
> >
> > Keeping dead code in the repository does not do any good to the project and
> > puts a non-negligible burden on future maintainers.
> >
> > Clearly, we cannot make a new Hive release where a major feature is
> > completely untested, so either someone commits to re-enable/fix the
> > respective tests soon or we move forward with the work started by David and
> > drop support for Hive-on-Spark.
> >
> > I would like to ask the community if there is anyone who can take up this
> > maintenance task and enable/fix Spark related tests in the next month or so?
> >
> > Best,
> > Stamatis
> >
> > On Sat, Feb 27, 2021 at 4:17 AM Edward Capriolo <[email protected]>
> > wrote:
> >
> >> I do not know how it works for most of the world. But in Cloudera, where the
> >> Tez options were never popular, Hive-on-Spark represents a solid way to get
> >> things done for small datasets with lower latency.
> >>
> >> As for the Spark adoption: a while ago I came up with some ways to
> >> make Hive more Spark-like. One of them was that I found a way to make "compile"
> >> a Hive keyword so folks could build UDFs on the fly. It was such an
> >> uphill climb. Folks found a way to make it disabled by default for security.
> >> Then later, when things moved from CLI to Beeline, it was like the ONLY thing
> >> that I found not ported. It was extremely frustrating.
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Mon, Jul 27, 2020 at 3:19 PM David <[email protected]> wrote:
> >>
> >>> Hello  Xuefu,
> >>>
> >>> I am not part of the Cloudera Hive product team, though I volunteer to
> >>> work on small projects from time to time.  Perhaps someone from that team
> >>> can chime in with some of their thoughts, but personally, I think that in
> >>> the long run, there will be more of a merge between Hive-on-Spark and other
> >>> Spark-native offerings.  I'm not sure what the differentiation will be
> >>> going forward.  With that said, are there any developers on this mailing
> >>> list who are willing to take on the maintenance effort of keeping HoS
> >>> moving forward?
> >>>
> >>> http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/
> >>>
> >>>
> >>> https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/config-sts.html
> >>>
> >>>
> >>> Thanks.
> >>>
> >>> On Thu, Jul 23, 2020 at 12:35 PM Xuefu Zhang <[email protected]> wrote:
> >>>
> >>>> Previous reasoning seemed to suggest a lack of user adoption. Now we are
> >>>> concerned about ongoing maintenance effort. Both are valid considerations.
> >>>> However, I think we should have ways to find out the answers. Therefore, I
> >>>> suggest the following be carried out:
> >>>>
> >>>> 1. Send out the proposal (removing Hive on Spark) to users including
> >>>> [email protected] and get their feedback.
> >>>> 2. Ask if any developers on this mailing list are willing to take on the
> >>>> maintenance effort.
> >>>>
> >>>> I'm concerned about user impact because I can still see issues being
> >>>> reported on HoS from time to time. I'm more concerned about the future of
> >>>> Hive if we narrow Hive neutrality on execution engines, which will possibly
> >>>> force more Hive users to migrate to other alternatives such as Spark SQL,
> >>>> which is already eroding Hive's user base.
> >>>>
> >>>> Being open and neutral used to be Hive's most admired strengths.
> >>>>
> >>>> Thanks,
> >>>> Xuefu
> >>>>
> >>>>
> >>>> On Wed, Jul 22, 2020 at 8:46 AM Alan Gates <[email protected]> wrote:
> >>>>
> >>>>> An important point here is I don't believe David is proposing to remove
> >>>>> Hive on Spark from the 2 or 3 lines, but only from trunk.  Continuing to
> >>>>> support it in existing 2 and 3 lines makes sense, but since no one has
> >>>>> maintained it on trunk for some time and it does not work with many of the
> >>>>> newer features it should be removed from trunk.
> >>>>>
> >>>>> Alan.
> >>>>>
> >>>>> On Tue, Jul 21, 2020 at 4:10 PM Chao Sun <[email protected]> wrote:
> >>>>>
> >>>>>> Thanks David. FWIW Uber is still running Hive on Spark (2.3.4) on a very
> >>>>>> large scale in production right now and I don't think we have any plan to
> >>>>>> change it soon.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Jul 21, 2020 at 11:28 AM David <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> Thanks for the feedback.
> >>>>>>>
> >>>>>>> Just a quick recap: I did propose this @dev and I received unanimous +1's
> >>>>>>> from the community.  After a couple months, I created the PR.
> >>>>>>>
> >>>>>>> Certainly open to discussion, but there hasn't been any discussion thus far
> >>>>>>> because there have been no objections until this point.
> >>>>>>>
> >>>>>>> HoS has low adoption, heavy technical debt, and the manner in which its
> >>>>>>> build process is set up is impeding some other work that is not even related
> >>>>>>> to HoS.
> >>>>>>>
> >>>>>>> We can deprecate in Hive 3.x and remove in Hive 4.x.  The plan would be to
> >>>>>>> use Tez moving forward.
> >>>>>>>
> >>>>>>> My point about the vendor's move to Tez is that HoS adoption is very low,
> >>>>>>> it's only going lower, and while I don't know the specifics of it, there
> >>>>>>> must be some migration plan in place there (i.e., it must be possible to do
> >>>>>>> it already).
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> David
> >>>>>>>
> >>>>>>> On Tue, Jul 21, 2020 at 12:23 PM Xuefu Zhang <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hi David,
> >>>>>>>>
> >>>>>>>> While a vendor may not support a component in an open source project,
> >>>>>>>> removing it or not is a decision by and for the community. I certainly
> >>>>>>>> understand that the vendor you mentioned has contributed a great deal
> >>>>>>>> (including my personal effort while working there), but it's not up to the
> >>>>>>>> vendor to make a call like what is proposed here.
> >>>>>>>>
> >>>>>>>> As a community, we should have gone through a thorough discussion and
> >>>>>>>> reached a consensus before actually making such a big change, in my
> >>>>>>>> opinion.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Xuefu
> >>>>>>>>
> >>>>>>>> On Tue, Jul 21, 2020 at 8:49 AM David <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Hey,
> >>>>>>>>>
> >>>>>>>>> Thanks for the input.
> >>>>>>>>>
> >>>>>>>>> FYI. Cloudera (Cloudera + Hortonworks) have removed HoS from their latest
> >>>>>>>>> offering.
> >>>>>>>>>
> >>>>>>>>> "Tez is now the only supported execution engine, existing queries that
> >>>>>>>>> change execution mode to Spark or MapReduce within a session, for example,
> >>>>>>>>> fail."
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> https://docs.cloudera.com/cdp/latest/upgrade-post/topics/ug_hive_configuration_changes.html
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> So I don't know who will be supporting this feature moving forward, but
> >>>>>>>>> there has been a lot of work done to make this change as painless as
> >>>>>>>>> possible.  Simply setting the engine to 'tez' and removing the HoS-related
> >>>>>>>>> settings should address many use cases.
> >>>>>>>>>
> >>>>>>>>> Thanks.
> >>>>>>>>>
> >>>>>>>>> On Tue, Jul 21, 2020 at 11:36 AM Xuefu Z <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Sorry for chiming in late. However, I don't think we should remove Hive on
> >>>>>>>>>> Spark just because of a technical problem. This is rather a big decision
> >>>>>>>>>> that we need to be careful about. There are users that will be left high
> >>>>>>>>>> and dry by this move.
> >>>>>>>>>>
> >>>>>>>>>> If the community decides to desupport and eventually remove it, I think we
> >>>>>>>>>> need to have a due process. We also need a deprecation plan if that's what
> >>>>>>>>>> we decide to do. Before that, I'm -1 on this proposal.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Xuefu
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Jul 21, 2020 at 7:57 AM David <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hello Team,
> >>>>>>>>>>>
> >>>>>>>>>>> https://github.com/apache/hive/pull/1285
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks.
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jun 3, 2020 at 11:49 PM Gopal V <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> +1
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>> Gopal
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 6/3/20 7:48 PM, Jesus Camacho Rodriguez wrote:
> >>>>>>>>>>>>> +1
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Jesús
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Jun 3, 2020 at 1:58 PM Alan Gates <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> +1.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Alan.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Jun 3, 2020 at 1:40 PM Prasanth Jayachandran
> >>>>>>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> +1
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jun 3, 2020, at 1:38 PM, Ashutosh Chauhan <[email protected]> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> +1
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Jun 3, 2020 at 1:23 PM David Mollitor <[email protected]> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hello Gang,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I have spent some time working on upgrading Avro (far less than others):
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HIVE-21737
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> This should be a relatively easy thing to do, but is blocked by
> >>>>>>>>>>>>>>>>> Hive-on-Spark.  HoS has a weird thing where it downloads some
> >>>>>>>>>>>>>>>>> cloud-storage-hosted file of Spark-Hadoop as part of its maven run.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Since HoS is not going to receive updates from the major vendors, is it
> >>>>>>>>>>>>>>>>> time to simply remove it?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Tests are currently disabled:
> >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HIVE-23137
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Xuefu Zhang
> >>>>>>>>>>
> >>>>>>>>>> "In Honey We Trust!"
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
