Re: Usage of Hadoop 2.2.0

Maximilian Michels Fri, 04 Sep 2015 02:05:08 -0700

+1 for dropping the Hadoop 2.2.0 binary and source compatibility. The
release is hardly used and complicates the important high-availability
changes in Flink.


On Fri, Sep 4, 2015 at 9:33 AM, Stephan Ewen <se...@apache.org> wrote:
> I am good with that as well. Mind that we are not only dropping a binary
> distribution for Hadoop 2.2.0, but also the source compatibility with 2.2.0.
>
>
>
> Let's also reconfigure Travis to test:
>
>  - Hadoop1
>  - Hadoop 2.3
>  - Hadoop 2.4
>  - Hadoop 2.6
>  - Hadoop 2.7
>
>
> On Fri, Sep 4, 2015 at 6:19 AM, Chiwan Park <chiwanp...@apache.org> wrote:
>>
>> +1 for dropping Hadoop 2.2.0
>>
>> Regards,
>> Chiwan Park
>>
>> > On Sep 4, 2015, at 5:58 AM, Ufuk Celebi <u...@apache.org> wrote:
>> >
>> > +1 to what Robert said.
>> >
>> > On Thursday, September 3, 2015, Robert Metzger <rmetz...@apache.org>
>> > wrote:
>> > I think most cloud providers have moved beyond Hadoop 2.2.0.
>> > Google's Click-To-Deploy is on 2.4.1
>> > AWS EMR is on 2.6.0
>> >
>> > The situation for the distributions seems to be the following:
>> > MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
>> > CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)
>> >
>> > HDP 2.0 (October 2013) uses 2.2.0
>> > HDP 2.1 (April 2014) already uses 2.4.0
>> >
>> > So both vendors and cloud providers are multiple releases away from
>> > Hadoop 2.2.0.
>> >
>> > Spark does not offer a binary distribution lower than 2.3.0.
>> >
>> > In addition to that, I don't think that the HDFS client in 2.2.0 is
>> > really usable in production environments. Users were reporting
>> > ArrayIndexOutOfBounds exceptions for some jobs, and I also ran into
>> > these exceptions sometimes.
>> >
>> > The easiest approach to resolve this issue would be (a) dropping the
>> > support for Hadoop 2.2.0.
>> > An alternative approach (b) would be:
>> >  - ship a binary version for Hadoop 2.3.0
>> >  - make the source of Flink still compatible with 2.2.0, so that users
>> > can compile a Hadoop 2.2.0 version if needed.
>> >
>> > I would vote for approach (a).
>> >
>> >
>> > On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann <trohrm...@apache.org>
>> > wrote:
>> > While working on high availability (HA) for Flink's YARN execution I
>> > stumbled across some limitations with Hadoop 2.2.0. From version 2.2.0 to
>> > 2.3.0, Hadoop introduced new functionality which is required for an
>> > efficient HA implementation. Therefore, I was wondering whether there is
>> > actually a need to support Hadoop 2.2.0. Is Hadoop 2.2.0 still actively
>> > used by someone?
>> >
>> > Cheers,
>> > Till
>> >
>>
>>
>>
>>
>>
>
