[DISCUSS] Re: deprecating MR in the first release of Hive 2.0

Thejas Nair Thu, 22 Oct 2015 14:38:48 -0700

(Adding [DISCUSS] to subject to bring it to attention of wider audience.)

+1 Given how much investment is going into the Tez and Spark execution
modes, it makes sense to convey that better to the user community and
recommend the use of the new modes over MR. Users who choose those
modes are going to get a better experience, and it will help to improve
the overall perception of Hive.


Once most users have moved to the new modes, we can start looking into
removing MR support (though that is likely to take a while).


On Wed, Oct 21, 2015 at 9:44 PM, Sergey Shelukhin
<ser...@hortonworks.com> wrote:
> We have discussed the removal of hadoop-1 and MR support in the Hive 2 line
> in the past.
> Hadoop-1 removal seems to be non-controversial and on track; before we cut 
> the first release of Hive 2, I propose we deprecate MR.
>
> Tez and Spark engines provide vast perf improvements over MR.
> Execution optimization work by most contributors has for a long time been
> done for these engines and is not portable to MR, so MR is languishing
> further.
> At the same time, supporting the additional code has other development costs
> for new features and bug fixes, plus we have to run tests for it, both in
> Apache and for local changes, and deploy the code.
>
> However, MR is hard to remove. Plus, it may provide a baseline for some bugs 
> in other engines (which is not bulletproof since MR logic can be incorrect), 
> or to mock during perf benchmarks.
>
> Therefore, I propose that for now we add deprecation warnings suggesting the 
> other alternatives:
>
>   *   to Hive configuration documentation.
>   *   to Hive wiki.
>   *   to release notes on Hive 2.
>   *   in Beeline and CLI when using MR.
>
> Additionally, I propose we remove Minimr test driver from HiveQA runs for 
> master.
>
> What do you think?
