Previous reasoning seemed to suggest a lack of user adoption. Now we are concerned about ongoing maintenance effort. Both are valid considerations. However, I think we should have ways to find out the answers. Therefore, I suggest the following be carried out:
1. Send out the proposal (removing Hive on Spark) to users including u...@hive.apache.org and get their feedback.
2. Ask if any developers on this mailing list are willing to take on the maintenance effort.

I'm concerned about user impact because I can still see issues being reported on HoS from time to time. I'm more concerned about the future of Hive if we narrow Hive's neutrality on execution engines, which could push more Hive users to migrate to alternatives such as Spark SQL, which is already eroding Hive's user base. Being open and neutral used to be among Hive's most admired strengths.

Thanks,
Xuefu

On Wed, Jul 22, 2020 at 8:46 AM Alan Gates <alanfga...@gmail.com> wrote:

> An important point here is I don't believe David is proposing to remove Hive on Spark from the 2 or 3 lines, but only from trunk. Continuing to support it in existing 2 and 3 lines makes sense, but since no one has maintained it on trunk for some time and it does not work with many of the newer features it should be removed from trunk.
>
> Alan.
>
> On Tue, Jul 21, 2020 at 4:10 PM Chao Sun <sunc...@apache.org> wrote:
>
> > Thanks David. FWIW Uber is still running Hive on Spark (2.3.4) on a very large scale in production right now and I don't think we have any plan to change it soon.
> >
> > On Tue, Jul 21, 2020 at 11:28 AM David <dam6...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > Thanks for the feedback.
> > >
> > > Just a quick recap: I did propose this @dev and I received unanimous +1's from the community. After a couple months, I created the PR.
> > >
> > > Certainly open to discussion, but there hasn't been any discussion thus far because there have been no objections until this point.
> > >
> > > HoS has low adoption, heavy technical debt, and the manner in which its build process is set up is impeding some other work that is not even related to HoS.
> > >
> > > We can deprecate in Hive 3.x and remove in Hive 4.x. The plan would be to use Tez moving forward.
> > >
> > > My point about the vendor's move to Tez is that HoS adoption is very low, it's only going lower, and while I don't know the specifics of it, there must be some migration plan in place there (i.e., it must be possible to do it already).
> > >
> > > Thanks,
> > > David
> > >
> > > On Tue, Jul 21, 2020 at 12:23 PM Xuefu Zhang <xu...@apache.org> wrote:
> > >
> > > > Hi David,
> > > >
> > > > While a vendor may not support a component in an open source project, removing it or not is a decision by and for the community. I certainly understand that the vendor you mentioned has contributed a great deal (including my personal effort while working there), but it's not up to the vendor to make a call like what is proposed here.
> > > >
> > > > As a community, we should have gone through a thorough discussion and reached a consensus before actually making such a big change, in my opinion.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > > On Tue, Jul 21, 2020 at 8:49 AM David <dam6...@gmail.com> wrote:
> > > >
> > > > > Hey,
> > > > >
> > > > > Thanks for the input.
> > > > >
> > > > > FYI. Cloudera (Cloudera + Hortonworks) have removed HoS from their latest offering.
> > > > >
> > > > > "Tez is now the only supported execution engine, existing queries that change execution mode to Spark or MapReduce within a session, for example, fail."
> > > > >
> > > > > https://docs.cloudera.com/cdp/latest/upgrade-post/topics/ug_hive_configuration_changes.html
> > > > >
> > > > > So I don't know who will be supporting this feature moving forward, but there has been a lot of work done to make this change as painless as possible. Simply setting the engine to 'tez' and removing the HoS-related settings should address many use cases.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > On Tue, Jul 21, 2020 at 11:36 AM Xuefu Z <usxu...@gmail.com> wrote:
> > > > >
> > > > > > Sorry for chiming in late. However, I don't think we should remove Hive on Spark just because of a technical problem. This is rather a big decision that we need to be careful about. There are users that will be left high and dry by this move.
> > > > > >
> > > > > > If the community decides to desupport and eventually remove it, I think we need to have a due process. We also need a deprecation plan if that's what we decide to do. Before that, I'm -1 on this proposal.
> > > > > >
> > > > > > Thanks,
> > > > > > Xuefu
> > > > > >
> > > > > > On Tue, Jul 21, 2020 at 7:57 AM David <dam6...@gmail.com> wrote:
> > > > > >
> > > > > > > Hello Team,
> > > > > > >
> > > > > > > https://github.com/apache/hive/pull/1285
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > On Wed, Jun 3, 2020 at 11:49 PM Gopal V <gop...@apache.org> wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Gopal
> > > > > > > >
> > > > > > > > On 6/3/20 7:48 PM, Jesus Camacho Rodriguez wrote:
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > > > -Jesús
> > > > > > > > >
> > > > > > > > > On Wed, Jun 3, 2020 at 1:58 PM Alan Gates <alanfga...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > >> +1.
> > > > > > > > >>
> > > > > > > > >> Alan.
> > > > > > > > >>
> > > > > > > > >> On Wed, Jun 3, 2020 at 1:40 PM Prasanth Jayachandran
> > > > > > > > >> <pjayachand...@cloudera.com.invalid> wrote:
> > > > > > > > >>
> > > > > > > > >>> +1
> > > > > > > > >>>
> > > > > > > > >>>> On Jun 3, 2020, at 1:38 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:
> > > > > > > > >>>>
> > > > > > > > >>>> +1
> > > > > > > > >>>>
> > > > > > > > >>>> On Wed, Jun 3, 2020 at 1:23 PM David Mollitor <dam6...@gmail.com> wrote:
> > > > > > > > >>>>
> > > > > > > > >>>>> Hello Gang,
> > > > > > > > >>>>>
> > > > > > > > >>>>> I have spent some time working on upgrading Avro (far less than others):
> > > > > > > > >>>>>
> > > > > > > > >>>>> https://issues.apache.org/jira/browse/HIVE-21737
> > > > > > > > >>>>>
> > > > > > > > >>>>> This should be a relatively easy thing to do, but is blocked by Hive-on-Spark. HoS has a weird thing where it downloads some cloud-storage-hosted file of Spark-Hadoop as part of its maven run.
> > > > > > > > >>>>>
> > > > > > > > >>>>> Since HoS is not going to receive updates from the major vendors, is it time to simply remove it?
> > > > > > > > >>>>>
> > > > > > > > >>>>> Tests are currently disabled:
> > > > > > > > >>>>> https://issues.apache.org/jira/browse/HIVE-23137
> > > > > > > > >>>>>
> > > > > > > > >>>>> Thanks.
> > > > > >
> > > > > > --
> > > > > > Xuefu Zhang
> > > > > >
> > > > > > "In Honey We Trust!"